more like this generated query

2015-04-27 Thread alxsss
Hello,


I am using solr-4.10.4 with mlt. I noticed that mlt constructs a query that is 
missing some words. For example, for a doc with 
title: Jennnifer Lopez 
keywords: Jennifer, concert, Hollywood


the parsedquery generated by mlt for this doc is title:lopez 
keywords:jennifer keywords:concert keywords:hollywood.
It seems to me that there must be title:jennifer, too.


For another doc that has only a title, the mlt-generated query includes 
keywords:famili. This doc has family in the title.


Any ideas what is wrong here?


Thanks.
Alex.
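For background on how a field term can drop out of an mlt query: Lucene's MoreLikeThis keeps a source-document term only if it clears frequency thresholds (exposed in Solr as mlt.mintf and mlt.mindf), and keywords:famili is simply the stemmed form of family. Below is a toy model of the threshold filtering with made-up frequencies — an illustration of the idea, not Solr's actual code:

```java
import java.util.*;

public class MltTermSelectionSketch {
    // Toy model of MoreLikeThis term selection: a term from the source doc
    // is kept only if it clears minimum term-frequency and document-frequency
    // thresholds (mlt.mintf / mlt.mindf). A term that fails a threshold
    // silently drops out of the generated query, which is one way a field
    // value can be "missing" from the parsed query.
    static List<String> selectTerms(Map<String, Integer> termFreqs,
                                    Map<String, Integer> docFreqs,
                                    int minTf, int minDf) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            if (e.getValue() >= minTf && df >= minDf) kept.add(e.getKey());
        }
        Collections.sort(kept);
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical frequencies: the rare (here, misspelled) term appears
        // in only one document, so mlt.mindf=5 filters it out.
        Map<String, Integer> tf = Map.of("jennnifer", 1, "lopez", 1);
        Map<String, Integer> df = Map.of("jennnifer", 1, "lopez", 40);
        System.out.println(selectTerms(tf, df, 1, 5)); // [lopez]
    }
}
```

If the title term really is rare in the index (note the spelling above), the default mindf-style cutoff alone would explain its absence.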








Re: snapinstaller does not start newSearcher

2015-03-04 Thread alxsss
I have used snapshotter api and modified snapinstaller script, so that it
successfully grabs the snapshot folder and updates index folder in slave.
However, it fails to open newSearcher. 
It simply sends a commit command to the slave, but the hasUncommittedChanges
function returns false.
That is the reason the new searcher is not opened.

Reloading collection picks up changes.

Could reloading return no results for queries that were sent during this
process?

Thanks.
Alex.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/snapinstaller-does-not-start-newSearcher-tp4188449p4191069.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: snapinstaller does not start newSearcher

2015-02-24 Thread alxsss
Hello,

We cannot use replication with the current architecture, so we decided to use 
snapshotter with snapinstaller.

Here is the full stack trace

8937 [coreLoadExecutor-5-thread-3] INFO  
org.apache.solr.core.CachingDirectoryFactory  – Closing directory: 
/home/solr/solr-4.10.1/solr/example/solr/product/data
8938 [coreLoadExecutor-5-thread-3] ERROR org.apache.solr.core.CoreContainer  – 
Error creating core [product]: Error opening new searcher
org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.init(SolrCore.java:873)
at org.apache.solr.core.SolrCore.init(SolrCore.java:646)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:255)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:249)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
at org.apache.solr.core.SolrCore.init(SolrCore.java:845)
... 9 more
Caused by: java.nio.file.NoSuchFileException: 
/home/solr/solr-4.10.1/solr/example/solr/product/data/index/segments_4
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:176)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:334)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:196)
at 
org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:198)
at 
org.apache.lucene.store.Directory.openChecksumInput(Directory.java:113)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:341)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:454)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:906)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:752)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:450)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:792)
at 
org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:77)
at 
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
at 
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279)
at 
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1528)
... 11 more
8943 [main] INFO  org.apache.solr.servlet.SolrDispatchFilter  – 
user.dir=/home/solr/solr-4.10.1/solr/example
8943 [main] INFO  org.apache.solr.servlet.SolrDispatchFilter  – 
SolrDispatchFilter.init() done
8982 [main] INFO  org.eclipse.jetty.server.AbstractConnector  – Started 
SocketConnector@0.0.0.0:8983

Thanks.
Alex.

 

 

 

-Original Message-
From: Shalin Shekhar Mangar shalinman...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Feb 24, 2015 12:13 am
Subject: Re: snapinstaller does not start newSearcher


Do you mean the snapinstaller (bash) script? Those are legacy scripts. It's
been a long time since they were tested. The ReplicationHandler is the
recommended way to set up replication. If you want to take a snapshot then
the replication handler has an HTTP based API which lets you do that.

In any case, do you have the full stack trace for that exception? There
should be another cause nested under it.

On Tue, Feb 24, 2015 at 12:47 PM, alx...@aim.com wrote:

 Hello,

 I am using the latest solr (solr trunk). I run snapinstaller, and see that
 it copies the snapshot to the index folder but changes are not picked up,
 and the logs in the slave after running snapinstaller are

 44302 [qtp1312571113-14] INFO  org.apache.solr.update.UpdateHandler  – start
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 44303 [qtp1312571113-14] INFO  org.apache.solr.update.UpdateHandler  – No
 uncommitted changes. Skipping IW.commit.
 44304 [qtp1312571113-14] INFO  org.apache.solr.core.SolrCore  –
 SolrIndexSearcher has not 

snapinstaller does not start newSearcher

2015-02-23 Thread alxsss
Hello,

I am using the latest solr (solr trunk). I run snapinstaller and see that it 
copies the snapshot to the index folder, but changes are not picked up, and

the logs in the slave after running snapinstaller are

44302 [qtp1312571113-14] INFO  org.apache.solr.update.UpdateHandler  – start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
44303 [qtp1312571113-14] INFO  org.apache.solr.update.UpdateHandler  – No 
uncommitted changes. Skipping IW.commit.
44304 [qtp1312571113-14] INFO  org.apache.solr.core.SolrCore  – 
SolrIndexSearcher has not changed - not re-opening: 
org.apache.solr.search.SolrIndexSearcher
44305 [qtp1312571113-14] INFO  org.apache.solr.update.UpdateHandler  – 
end_commit_flush
44305 [qtp1312571113-14] INFO  
org.apache.solr.update.processor.LogUpdateProcessor  – [product] webapp=/solr 
path=/update params={} {commit=} 0 57

Restarting solr  gives

 Error creating core [product]: Error opening new searcher
org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.init(SolrCore.java:873)
at org.apache.solr.core.SolrCore.init(SolrCore.java:646)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:255)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:249)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
at org.apache.solr.core.SolrCore.init(SolrCore.java:845)
... 9 more

Any idea what causes this issue?

Thanks in advance.
Alex.



custom sorting of search result

2014-11-03 Thread alxsss
Hello,


We need to order solr search results according to specific rules. 


I will explain with an example. Let's say solr returns 1000 results for the query 
sport. 
These results must be divided into three buckets according to rules that come 
from a database. 
Then one doc must be chosen from each bucket in turn and appended to the results 
until all buckets are empty.
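The bucket-and-drain ordering described above amounts to a round-robin merge. A minimal sketch, with bucket contents and document ids made up for illustration:

```java
import java.util.*;

public class BucketInterleaver {
    // Round-robin draining: repeatedly take one doc from each non-empty
    // bucket, in bucket order, until every bucket is empty.
    public static List<String> interleave(List<Deque<String>> buckets) {
        List<String> ordered = new ArrayList<>();
        boolean tookAny = true;
        while (tookAny) {
            tookAny = false;
            for (Deque<String> bucket : buckets) {
                if (!bucket.isEmpty()) {
                    ordered.add(bucket.pollFirst());
                    tookAny = true;
                }
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Deque<String>> buckets = new ArrayList<>();
        buckets.add(new ArrayDeque<>(List.of("a1", "a2", "a3")));
        buckets.add(new ArrayDeque<>(List.of("b1")));
        buckets.add(new ArrayDeque<>(List.of("c1", "c2")));
        System.out.println(interleave(buckets)); // [a1, b1, c1, a2, c2, a3]
    }
}
```

If the buckets are built from a post-search pass over the result ids, this reordering can live entirely in a Solr plugin (e.g. a custom SearchComponent) without touching Lucene's scoring.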


One approach was to modify/override the solr code where it gets results, sorts them, 
and returns #rows of elements.
However, from the code in the Weight.java scoreAll function we see that docs have 
only an internal document id and nothing else. 

We need the unique solr document id in order to match documents with the custom 
scoring.
We also see that the Lucene code hands those doc ids to the scoreAll function, and 
for now we do not want to modify Lucene code
and prefer to solve this issue as a Solr plugin.


Any ideas are welcome.




Thanks.
Alex. 






Re: Incorrect group.ngroups value

2014-08-25 Thread alxsss
Hi,

From the discussion it is not clear whether this is a fixable bug in the case of 
documents being in different shards. If it is fixable, could someone please 
direct me to the part of the code so that I could investigate.

Thanks.
Alex.

 

 

 

-Original Message-
From: Andrew Shumway andrew.shum...@issinc.com
To: solr-user solr-user@lucene.apache.org
Sent: Fri, Aug 22, 2014 8:15 am
Subject: RE: Incorrect group.ngroups value


The Co-location section of this document  
http://searchhub.org/2013/06/13/solr-cloud-document-routing/ 
might be of interest to you.  It mentions the need for using Solr Cloud routing 
to group documents in the same core so that grouping can work properly.

--Andrew Shumway


-Original Message-
From: Bryan Bende [mailto:bbe...@gmail.com] 
Sent: Friday, August 22, 2014 9:01 AM
To: solr-user@lucene.apache.org
Subject: Re: Incorrect group.ngroups value

Thanks Jim.

We've been using the composite id approach where we put group value as the 
leading portion of the id (i.e. groupValue!documentid), so I was expecting all 
of the documents for a given group to be in the same shard, but at least this 
gives me something to look into. I'm still suspicious of something changing 
between 4.6.1 and 4.8.1, because we've had the grouping implemented this way 
for 
a while, and only on the exact day we upgraded did someone bring this problem 
forward. I will keep investigating, thanks.


On Fri, Aug 22, 2014 at 9:18 AM, jim ferenczi jim.feren...@gmail.com
wrote:

 Hi Bryan,
 This is a known limitations of the grouping.
 https://wiki.apache.org/solr/FieldCollapsing#RequestParameters

 group.ngroups:


 *WARNING: If this parameter is set to true on a sharded environment, 
 all the documents that belong to the same group have to be located in 
 the same shard, otherwise the count will be incorrect. If you are 
 using SolrCloud https://wiki.apache.org/solr/SolrCloud, consider 
 using custom hashing*

 Cheers,
 Jim



 2014-08-21 21:44 GMT+02:00 Bryan Bende bbe...@gmail.com:

  Is there any known issue with using group.ngroups in a distributed 
  Solr using version 4.8.1 ?
 
  I recently upgraded a cluster from 4.6.1 to 4.8.1, and I'm noticing
 several
  queries where ngroups will be more than the actual groups returned 
  in the response. For example, ngroups will say 5, but then there 
  will be 3
 groups
  in the response. It is not happening on all queries, only some.
 


 


regexTransformer returns no results if there is no match

2014-08-11 Thread alxsss
Hello,

I try to construct wikipedia page url from page title using regexTransformer
with


<field column="title_underscore" regex="\s+" replaceWith="_" sourceColName="title"/>

This does not work  for titles that have no space, so title_underscore for them 
is empty.

Any ideas what is wrong here?

This is with solr-4.8.1

Thanks. Alex.
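For what it's worth, the regex itself behaves as one would want in plain Java — replaceAll with \s+ leaves a space-free title unchanged rather than empty — so the empty column points at how RegexTransformer populates the target field when the pattern finds no match, not at the pattern:

```java
import java.util.regex.Pattern;

public class UnderscoreRegexDemo {
    static final Pattern WS = Pattern.compile("\\s+");

    static String underscore(String title) {
        // replaceAll returns the input unchanged when the pattern never matches.
        return WS.matcher(title).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(underscore("Jennifer Lopez")); // Jennifer_Lopez
        System.out.println(underscore("Family"));         // Family (unchanged, not empty)
    }
}
```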


Re: group.ngroups is set to an incorrect value - specific field types

2014-06-17 Thread alxsss
Hi,


I see a similar problem in our solr application. Sometimes it gives the number in a 
group as the number of all documents. This started to happen after an upgrade from 
4.6.1 to 4.8.1.


Thanks.
Alex.



-Original Message-
From: 海老澤 志信 shinobu_ebis...@waku-2.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Jun 17, 2014 5:24 am
Subject: RE: group.ngroups is set to an incorrect value - specific field types


Hi all

Could anyone comment on my bug report?

Regards,
Ebisawa


-Original Message-
From: 海老澤 志信
Sent: Friday, June 13, 2014 7:45 PM
To: 'solr-user@lucene.apache.org'
Subject: group.ngroups is set to an incorrect value - specific field types

Hi,

I'm using Solr version 4.1.
I found a bug in group.ngroups. So could anyone kindly take a look at my
bug report?

If I specify the type Double as group.field, the value of group.ngroups is
set to an incorrect value.

[condition]
- Double is defined in group.field
- Some documents lack the field that is defined as group.field

[Sample query and Example]
---
solr/select?q=*:*&group=true&group.ngroups=true&group.field=Double_Field

* Double_Field is defined as solr.TrieDoubleField type.
---
When there are 4 documents with group.field and 6 documents without it,
the query reports 10 as group.ngroups.

But I think that group.ngroups should rightly be 5 in this case.

[Root Cause]
It seems there is a bug in the source code of Lucene.
There is a function that compares group values to decide whether two groups
contain the same group.field value; it calls MutableValueDouble.compareSameType().

See below the point which seems to be a root cause.
-
if (!exists) return -1;
if (!b.exists) return 1;
-
If exists is false, it returns -1.

But I think it should return 0 when exists and b.exists are equal.

[Similar problem]
There is a similar problem in MutableValueBool.compareSameType().
Therefore, when you group on a field of type Boolean (solr.BoolField), the
value of group.ngroups is always 0 or 1.

[Solution]
I propose the following modifications:
MutableValueDouble.compareSameType()

===
--- MutableValueDouble.java
+++ MutableValueDouble.java
@@ -54,9 +54,8 @@
 MutableValueDouble b = (MutableValueDouble)other;
 int c = Double.compare(value, b.value);
 if (c != 0) return c;
-if (!exists) return -1;
-if (!b.exists) return 1;
-return 0;
+if (exists == b.exists) return 0;
+return exists ? 1 : -1;
   }
===

I propose the following modifications: MutableValueBool.compareSameType()

===
--- MutableValueBool.java
+++ MutableValueBool.java
@@ -52,7 +52,7 @@
   @Override
   public int compareSameType(Object other) {
 MutableValueBool b = (MutableValueBool)other;
-if (value != b.value) return value ? 1 : 0;
+if (value != b.value) return value ? 1 : -1;
 if (exists == b.exists) return 0;
 return exists ? 1 : -1;
   }
===
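The effect of the proposed MutableValueDouble fix can be checked in isolation. Below is a minimal stand-in for the patched comparator — not the Lucene class itself, just its corrected compareSameType logic:

```java
public class CompareSameTypeFix {
    // Minimal stand-in for Lucene's MutableValueDouble, showing the
    // corrected compareSameType logic proposed above: two "missing"
    // values now compare equal instead of the first always losing.
    static int compare(double aValue, boolean aExists, double bValue, boolean bExists) {
        int c = Double.compare(aValue, bValue);
        if (c != 0) return c;
        if (aExists == bExists) return 0;   // both present, or both missing
        return aExists ? 1 : -1;            // an existing value sorts after a missing one
    }

    public static void main(String[] args) {
        // Two documents without the grouped field now fall into ONE group...
        System.out.println(compare(0.0, false, 0.0, false)); // 0
        // ...and the ordering stays antisymmetric, which the old code violated.
        System.out.println(compare(1.0, true, 0.0, false) == -compare(0.0, false, 1.0, true)); // true
    }
}
```

The old `if (!exists) return -1;` made every missing-value pair compare as unequal, so each fieldless document counted as its own group — exactly the 10-instead-of-5 behavior described above.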


Thanks,

Ebisawa


 


Re: how do I get search for fort st john to match ft saint john

2014-04-01 Thread alxsss
It seems to me that you are missing this line

  <filter class="solr.SynonymFilterFactory" synonyms="city_index_synonyms.txt" ignoreCase="true" expand="true"/>

under

  <analyzer type="query">

Alex.
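For reference, with expand="true" every token in a synonym group gets indexed, so any variant typed at query time can match. A toy model of that idea (illustrative only — not Lucene's SynonymFilter, and the groups below mirror the synonyms file in this thread):

```java
import java.util.*;

public class SynonymExpansionSketch {
    // Each set models one line of the synonyms file; with expand=true all
    // members of a group are indexed at the token's position.
    static final List<Set<String>> GROUPS = List.of(
            Set.of("saint", "st", "ste"),
            Set.of("fort", "ft"));

    static Set<String> indexToken(String token) {
        for (Set<String> group : GROUPS)
            if (group.contains(token)) return group; // index all variants
        return Set.of(token);
    }

    static boolean matches(String indexedToken, String queryToken) {
        return indexToken(indexedToken).contains(queryToken);
    }

    public static void main(String[] args) {
        System.out.println(matches("fort", "ft"));  // true
        System.out.println(matches("saint", "st")); // true
        System.out.println(matches("john", "jon")); // false
    }
}
```

The point of the suggestion above: if expansion happens only at index time and the query analyzer lacks the filter, the raw query token still matches one of the indexed variants; with neither side applying it, "ft" can never reach "fort".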

 

 

-Original Message-
From: solr-user solr-u...@hotmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Apr 1, 2014 5:01 pm
Subject: Re: how do I get search for fort st john to match ft saint john


Hi Eric.

Sorry, been away.  

The city_index_synonyms.txt file is pretty small as it contains just these
two lines:

saint,st,ste
fort,ft

There is nothing at all in the city_query_synonyms.txt file, and it isn't
used either.

My understanding is that solr would create the appropriate synonym entries
in the index and so treat fort and ft as equal.

If you have a simple one-line schema (that uses the type definition from my
original email) and index fort saint john, does it work for you?  i.e.
does it return results if you search for ft st john, ft saint john,
and fort st john?

My Solr 4.6.1 instance doesn't.  I am wondering if synonyms just don't work
for all/some words in a phrase.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231p4128500.html
Sent from the Solr - User mailing list archive at Nabble.com.

 


spellcheck in solr-4.6-1 distrib=true

2014-03-31 Thread alxsss
Hello,

For queries in SolrCloud and in distributed mode, solr-4.6.1 spellcheck does not 
return any suggestions, but it does in non-distributed mode.
Is this a known bug?

Thanks.
Alex.


Re: change character correspondence in icu lib

2014-02-13 Thread alxsss
I found out that the generated files are the same. I think this is because 
these lines inside the build file

  <target name="gen-utr30-data-files" depends="compile-tools">
    <java classname="org.apache.lucene.analysis.icu.GenerateUTR30DataFiles"
          dir="${utr30.data.dir}"
          fork="true"
          failonerror="true">
      <classpath>
        <path refid="icujar"/>
        <pathelement location="${build.dir}/classes/tools"/>
      </classpath>
    </java>
  </target>

  <property name="gennorm2.src.files"
      value="nfc.txt nfkc.txt nfkc_cf.txt BasicFoldings.txt DiacriticFolding.txt DingbatFolding.txt HanRadicalFolding.txt NativeDigitFolding.txt"/>
  <property name="gennorm2.tmp" value="${build.dir}/gennorm2/utr30.tmp"/>
  <property name="gennorm2.dst"
      value="${resources.dir}/org/apache/lucene/analysis/icu/utr30.nrm"/>
  <target name="gennorm2" depends="gen-utr30-data-files">
    <echo>Note that the gennorm2 and icupkg tools must be on your PATH. These tools
are part of the ICU4C package. See http://site.icu-project.org/ </echo>
    <mkdir dir="${build.dir}/gennorm2"/>
    <exec executable="gennorm2" failonerror="true">
      <arg value="-v"/>
      <arg value="-s"/>
      <arg value="${utr30.data.dir}"/>
      <arg line="${gennorm2.src.files}"/>
      <arg value="-o"/>
      <arg value="${gennorm2.tmp}"/>
    </exec>
    <!-- now convert binary file to big-endian -->
    <exec executable="icupkg" failonerror="true">
      <arg value="-tb"/>
      <arg value="${gennorm2.tmp}"/>
      <arg value="${gennorm2.dst}"/>
    </exec>
    <delete file="${gennorm2.tmp}"/>
  </target>

are not executed, and the resource files are downloaded from the internet instead.

Any ideas how to fix this issue?

Thanks.
Alex.

 

 

 

-Original Message-
From: Alexandre Rafalovitch arafa...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Wed, Feb 12, 2014 5:20 pm
Subject: Re: change character correspondence in icu lib


Not a direct answer, but the usual next question is: are you
absolutely sure you are using the right jars? Try renaming them and
restarting Solr. If it complains, you got the right ones. If not
Also, unzip those jars and see if your file made it all the way
through the build pipeline.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, Feb 13, 2014 at 8:12 AM,  alx...@aim.com wrote:
 Hello,

 I use
 icu4j-49.1.jar,
 lucene-analyzers-icu-4.6-SNAPSHOT.jar

 for one of the fields in the form

 <filter class="solr.ICUFoldingFilterFactory"/>

 I need to change one of the accent char's corresponding letter. I made 
 changes 
to this file

 lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt

 recompiled solr and lucene and replaced the above jars with new ones, but no 
change in the indexing and parsing of keywords.

 Any ideas where the appropriate change must be made?

 Thanks.
 Alex.




 



change character correspondence in icu lib

2014-02-12 Thread alxsss
Hello,

I use
icu4j-49.1.jar,
lucene-analyzers-icu-4.6-SNAPSHOT.jar

for one of the fields in the form 

<filter class="solr.ICUFoldingFilterFactory"/>

I need to change one of the accent char's corresponding letter. I made changes 
to this file

lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt

recompiled solr and lucene and replaced the above jars with new ones, but no 
change in the indexing and parsing of keywords.

Any ideas where the appropriate change must be made?

Thanks.
Alex.





Re: additional requests sent to solr

2013-08-11 Thread alxsss
Hi,

Could someone please confirm whether this must be so or whether this is a bug in SOLR.

In short, I see three log entries in SOLR for one request
http://server1:8983/solr/mycollection/select?q=alex&wt=xml&defType=edismax&facet.field=school&facet.field=company&facet=true&facet.limit=10&facet.mincount=1&qf=school_txt+company_txt+name&shards=server1:8983/solr/mycollection,server2.com:8983/solr/mycollection

for the case when facet=true.  

The third log entry looks like
INFO: [mycollection] webapp=/solr path=/select

params={facet=true&facet.mincount=1&company__terms=Google&ids=957642543183429632,957841245982425088,67612781366,56659036467,50875569066,957707339232706560,465078975511&facet.limit=10&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&school__terms=Michigan+State+University,Brigham+Young+University,Northeastern+University&q=alex&facet.field={!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company&isShard=true}
 status=0 QTime=6

where the company__terms and school__terms values are taken from the facet values for
the company and school fields.

When the data is big, this leads to a request carrying all the facet values, which
considerably slows performance. This issue is observed in distributed mode
only.

Thanks in advance.
Alex.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/additional-requests-sent-to-solr-tp4079007p4083799.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: additional requests sent to solr

2013-08-05 Thread alxsss
Hello,

I still have this issue. Basically in distributed mode, when facet is true, 
solr-4.2 issues an additional query with 
facet.field={!terms%3D$company__terms}company&isShard=true} where, for example, 
company__terms has all the values from the company facet field.

I have added terms=false to the original query sent to solr, but it did not 
help.

Does anyone have any idea how to suppress these queries?

Thanks.
Alex.


 

 

 

-Original Message-
From: alxsss alx...@aim.com
To: solr-user solr-user@lucene.apache.org
Sent: Fri, Jul 19, 2013 5:00 am
Subject: additional requests sent to solr


Hello,

I send to solr (to server1 in the cluster of two servers) the following request

http://server1:8983/solr/mycollection/select?q=alex&wt=xml&defType=edismax&facet.field=school&facet.field=company&facet=true&facet.limit=10&facet.mincount=1&qf=school_txt+company_txt+name&shards=server1:8983/solr/mycollection,server2.com:8983/solr/mycollection

I see in the logs 2 additional requests

INFO: [mycollection] webapp=/solr path=/select 
params={facet=true&f.company.facet.limit=25&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&f.school_facet.facet.limit=25&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&fl=id,score&start=0&q=alex&facet.field=school&facet.field=company&isShard=true&fsv=true}
hits=9118 status=0 QTime=72

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select 
params={facet=true&facet.mincount=1&company__terms=Google&ids=957642543183429632,957841245982425088,67612781366,56659036467,50875569066,957707339232706560,465078975511&facet.limit=10&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&school__terms=Michigan+State+University,Brigham+Young+University,Northeastern+University&q=alex&facet.field={!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company&isShard=true}
status=0 QTime=6

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select 
params={facet=true&shards=server1.prod.mylife.com:8983/solr/mycollection,server2:8983/solr/mycollection&facet.mincount=1&q=alex&facet.limit=10&qf=school_txt+company_txt+name&facet.field=school&facet.field=company&wt=xml&defType=edismax}
hits=97262 status=0 QTime=168


I can understand that the first and the third log records are related to the 
above request, but cannot understand where the second log comes from. 
I see in it company__terms and 
{!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company, which 
seem to have nothing to do with the initial request. This is solr-4.2.0.


Any ideas about it are welcome.

Thanks in advance.
Alex.

 


Re: additional requests sent to solr

2013-08-05 Thread alxsss
I care about performance. Since the data is too big, the query with terms 
becomes too long and slows performance.

bq
---
In general distributed search requires two round trips to the other shards.
---

In this case I have three queries to solr. The third one is the one with {!terms..., 
and I do not understand why it is there.

Thanks.
Alex.

 

 

 

-Original Message-
From: Erick Erickson erickerick...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Aug 5, 2013 7:10 pm
Subject: Re: additional requests sent to solr


Why do you care? Is this causing you trouble? In general distributed search
requires two round trips to the other shards. The first query gets the
top N, those are returned to the originator (just a list of IDs and sort
criteria,
often score). The originator then assembles the final top N, but then
the actual body of those documents must be fetched from the other
nodes.

Best
Erick
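The first of those round trips can be modeled as a simple merge of per-shard (id, sort-value) lists. A sketch of the idea — not SolrJ and not Solr internals, with made-up shard results:

```java
import java.util.*;
import java.util.stream.*;

public class TwoPhaseDistributedSearch {
    static final class Hit {
        final String id;
        final float score;
        Hit(String id, float score) { this.id = id; this.score = score; }
    }

    // Phase 1: each shard returns only ids and sort criteria (score here);
    // the originator merges them into a global top N. Phase 2 — fetching the
    // stored fields of the winners from the shards that own them — is the
    // second round trip and is not modeled here.
    static List<Hit> mergeTopN(List<List<Hit>> perShardTopN, int n) {
        return perShardTopN.stream()
                .flatMap(List::stream)
                .sorted(Comparator.comparingDouble((Hit h) -> h.score).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hit> shard1 = List.of(new Hit("d1", 0.9f), new Hit("d2", 0.4f));
        List<Hit> shard2 = List.of(new Hit("d3", 0.7f), new Hit("d4", 0.6f));
        List<String> ids = new ArrayList<>();
        for (Hit h : mergeTopN(List.of(shard1, shard2), 3)) ids.add(h.id);
        System.out.println(ids); // [d1, d3, d4]
    }
}
```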


On Mon, Aug 5, 2013 at 2:02 AM, alx...@aim.com wrote:

 Hello,

 I still have this issue. Basically in distributed mode, when facet is
 true, solr-4.2 issues an additional query with
 facet.field={!terms%3D$company__terms}companyisShard=true} where for
 example
 company__terms have all values from company facet field.

 I have added terms=false to the original query sent to solr, but it did
 not help.

 Does anyone has any idea how to suppress these queries.

 Thanks.
 Alex.








 -Original Message-
 From: alxsss alx...@aim.com
 To: solr-user solr-user@lucene.apache.org
 Sent: Fri, Jul 19, 2013 5:00 am
 Subject: additional requests sent to solr


 Hello,

 I send to solr (to server1 in the cluster of two servers) the following
 request

 http://server1:8983/solr/mycollection/select?q=alex&wt=xml&defType=edismax&facet.field=school&facet.field=company&facet=true&facet.limit=10&facet.mincount=1&qf=school_txt+company_txt+name&shards=server1:8983/solr/mycollection,server2.com:8983/solr/mycollection

 I see in the logs 2 additional requests

 INFO: [mycollection] webapp=/solr path=/select
 params={facet=true&f.company.facet.limit=25&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&f.school_facet.facet.limit=25&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&fl=id,score&start=0&q=alex&facet.field=school&facet.field=company&isShard=true&fsv=true}
 hits=9118 status=0 QTime=72

 Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
 INFO: [mycollection] webapp=/solr path=/select
 params={facet=true&facet.mincount=1&company__terms=Google&ids=957642543183429632,957841245982425088,67612781366,56659036467,50875569066,957707339232706560,465078975511&facet.limit=10&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&school__terms=Michigan+State+University,Brigham+Young+University,Northeastern+University&q=alex&facet.field={!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company&isShard=true}
 status=0 QTime=6

 Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
 INFO: [mycollection] webapp=/solr path=/select
 params={facet=true&shards=server1.prod.mylife.com:8983/solr/mycollection,server2:8983/solr/mycollection&facet.mincount=1&q=alex&facet.limit=10&qf=school_txt+company_txt+name&facet.field=school&facet.field=company&wt=xml&defType=edismax}
 hits=97262 status=0 QTime=168


 I can understand that the first and the third log records are related to
 the above request, but cannot understand where the second log comes from.
 I see in it company__terms and
 {!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company,
 which seem to have nothing to do with the initial request. This is
 solr-4.2.0


 Any ideas about it are welcome.

 Thanks in advance.
 Alex.




 


additional requests sent to solr

2013-07-18 Thread alxsss
Hello,

I send to solr (to server1 in the cluster of two servers) the following request

http://server1:8983/solr/mycollection/select?q=alex&wt=xml&defType=edismax&facet.field=school&facet.field=company&facet=true&facet.limit=10&facet.mincount=1&qf=school_txt+company_txt+name&shards=server1:8983/solr/mycollection,server2.com:8983/solr/mycollection

I see in the logs 2 additional requests

INFO: [mycollection] webapp=/solr path=/select 
params={facet=true&f.company.facet.limit=25&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&f.school_facet.facet.limit=25&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&fl=id,score&start=0&q=alex&facet.field=school&facet.field=company&isShard=true&fsv=true}
hits=9118 status=0 QTime=72

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select 
params={facet=true&facet.mincount=1&company__terms=Google&ids=957642543183429632,957841245982425088,67612781366,56659036467,50875569066,957707339232706560,465078975511&facet.limit=10&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&school__terms=Michigan+State+University,Brigham+Young+University,Northeastern+University&q=alex&facet.field={!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company&isShard=true}
 status=0 QTime=6

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select 
params={facet=true&shards=server1.prod.mylife.com:8983/solr/mycollection,server2:8983/solr/mycollection&facet.mincount=1&q=alex&facet.limit=10&qf=school_txt+company_txt+name&facet.field=school&facet.field=company&wt=xml&defType=edismax}
 hits=97262 status=0 QTime=168


I can understand that the first and the third log records are related to the 
above request, but cannot understand where the second log comes from. 
I see in it company__terms and 
{!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}, which 
seems to have nothing to do with the initial request. This is solr-4.2.0


Any ideas about it are welcome.

Thanks in advance.
Alex.
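For context on where that second request comes from: distributed faceting in Solr is two-phase. Each shard first returns its local top facet terms; the coordinator then refines by asking individual shards for exact counts of candidate terms they did not volunteer, which is the `{!terms%3D$company__terms}` query in the log. A toy Python model of that refinement step, with made-up shard counts (this is a sketch of the idea, not Solr's code):

```python
# Two-phase distributed faceting, as a toy model.
# Phase 1: each shard returns its local top-N facet counts.
# Phase 2 (refinement): terms that appeared in some shards' top-N but not
# others are re-queried on the missing shards for their exact counts --
# this is the extra {!terms=...} request seen in the Solr logs.
from collections import Counter

def distributed_facet(shards, top_n):
    # shards: list of Counter mapping term -> local count (toy stand-in
    # for per-shard facet results)
    phase1 = [dict(s.most_common(top_n)) for s in shards]
    candidates = set().union(*phase1)
    totals = Counter()
    for shard, local_top in zip(shards, phase1):
        for term in candidates:
            if term in local_top:
                totals[term] += local_top[term]
            else:
                # refinement request: ask this shard for the exact count
                # of a term it did not report in its own top-N
                totals[term] += shard[term]
    return totals.most_common(top_n)

shard1 = Counter({"google": 5, "apple": 3, "ibm": 1})
shard2 = Counter({"apple": 4, "ibm": 4, "google": 1})
print(distributed_facet([shard1, shard2], top_n=2))
```

Without the refinement pass, a term that barely misses one shard's top-N would get an undercounted total, which is why the coordinator issues that second, seemingly unrelated request.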


Re: document id in nutch/solr

2013-06-24 Thread alxsss
Another way of overriding nutch fields is to modify the solrindex-mapping.xml file.

hth
Alex.

 

 

 

-Original Message-
From: Jack Krupansky j...@basetechnology.com
To: solr-user solr-user@lucene.apache.org
Sent: Sun, Jun 23, 2013 12:04 pm
Subject: Re: document id in nutch/solr


Add the passthrough dynamic field to your Solr schema, and then see what 
fields get passed through to Solr from Nutch. Then, add the missing fields 
to your Solr schema and remove the passthrough.

<dynamicField name="*" type="string" indexed="true" stored="true" 
multiValued="true" />

Or, add Solr copyField directives to place fields in existing named 
fields.

Or... talk to the nutch people about how to do field name mapping on the 
nutch side of the fence.

Hold off on UUIDs until you figure all of the above out and everything is 
working without them.

-- Jack Krupansky

-Original Message- 
From: Joe Zhang
Sent: Sunday, June 23, 2013 2:35 PM
To: solr-user@lucene.apache.org
Subject: Re: document id in nutch/solr

Can somebody help with this one, please?


On Fri, Jun 21, 2013 at 10:36 PM, Joe Zhang smartag...@gmail.com wrote:

 A quite standard configuration of nutch seems to automatically map url
 to id. Two questions:

 - Where is such mapping defined? I can't find it anywhere in
 nutch-site.xml or schema.xml. The latter does define the id field as 
 well
 as its uniqueness, but not the mapping.

 - Given that nutch nutch has already defined such an id, can i ask solr to
 redefine id as UUID?
 <field name="id" type="uuid" indexed="true" stored="true" default="NEW"/>

 - This leads to a related question: do solr and nutch have to have
 IDENTICAL schema.xml?
 


 


whole index in memory

2013-05-31 Thread alxsss
Hello,

I have a solr index of size 5GB. I am thinking of increasing the cache size to 5 
GB, expecting that Solr will put the whole index into memory.

1. Will Solr indeed put the whole index into memory?
2. What are the drawbacks of this approach?

Thanks in advance.
Alex.


Re: EdgeGram filter

2013-04-23 Thread alxsss
Hi,

I was unable to find more info about 
LimitTokenCountFilterFactory
 in the Solr wiki. Is there any other place to get a thorough description of what it 
does?

Thanks.
Alex.

 

 

 

-Original Message-
From: Jack Krupansky j...@basetechnology.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Apr 23, 2013 11:36 am
Subject: Re: EdgeGram filter


Well, you could copy to another field (using copyField) and then have an 
analyzer with a LimitTokenCountFilterFactory that accepts only 1 token, and 
then apply the EdgeNGramFilter to that one token. But you would have to 
query explicitly against that other field. Since you are using dismax, you 
should be able to add that second field to the qf parameter. And then remove 
the EdgeNGramFilter from your main field.
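To illustrate the idea: EdgeNGramFilter produces prefixes per token, which is why 'con' (from 'contents') matches; keeping only the first token before gramming, as the copyField-plus-LimitTokenCountFilter approach does, changes that. A rough Python sketch of both behaviors (not Solr's implementation, just the concept):

```python
def edge_ngrams(token, min_len=2, max_len=5):
    # prefixes of a single token, like Solr's EdgeNGramFilter
    return [token[:n] for n in range(min_len, min(max_len, len(token)) + 1)]

def index_terms(text, limit=None):
    # tokenize on whitespace; optionally keep only the first `limit`
    # tokens, mimicking a LimitTokenCountFilter before the n-gram filter
    tokens = text.lower().split()
    if limit is not None:
        tokens = tokens[:limit]
    grams = []
    for t in tokens:
        grams.extend(edge_ngrams(t))
    return grams

print("con" in index_terms("difficult contents"))            # True: both tokens gram'd
print("con" in index_terms("difficult contents", limit=1))   # False: only "difficult"
```

With the one-token field, a query for 'dif' or 'di' still matches, but 'con' no longer does.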

-- Jack Krupansky

-Original Message- 
From: hassancrowdc
Sent: Tuesday, April 23, 2013 12:09 PM
To: solr-user@lucene.apache.org
Subject: EdgeGram filter

Hi,

I want to edgeNgram, let's say, this document that has 'difficult contents' so
that if I query (using dismax) q=dif it shows me this result. This is
working fine. But now if I search for q=con it gives me this document as
well. Is there any way to only show this document when I search for 'dif' or
'di'? Basically I want to edgegram 'difficultcontent', not 'difficult' and
'content'. Any help?


Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/EdgeGram-filter-tp4058337.html
Sent from the Solr - User mailing list archive at Nabble.com. 


 


Re: EdgeGram filter

2013-04-23 Thread alxsss
Hi,

I did not find any descriptions, except constructor and method names. 

Thanks.
Alex.
 

 

 

-Original Message-
From: Markus Jelsma markus.jel...@openindex.io
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Apr 23, 2013 12:08 pm
Subject: RE: EdgeGram filter


Always check the javadocs. There's a lot of info to be found there:
http://lucene.apache.org/core/4_0_0-BETA/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilterFactory.html

 
 
-Original message-
 From:alx...@aim.com alx...@aim.com
 Sent: Tue 23-Apr-2013 21:06
 To: solr-user@lucene.apache.org
 Subject: Re: EdgeGram filter
 
 Hi,
 
 I was unable to find more info about 
 LimitTokenCountFilterFactory
  in solr wiki. Is there any other place to get thorough description of what 
 it 
does?
 
 Thanks.
 Alex.
 
  
 
  
 
  
 
 -Original Message-
 From: Jack Krupansky j...@basetechnology.com
 To: solr-user solr-user@lucene.apache.org
 Sent: Tue, Apr 23, 2013 11:36 am
 Subject: Re: EdgeGram filter
 
 
 Well, you could copy to another field (using copyField) and then have an 
 analyzer with a LimitTokenCountFilterFactory that accepts only 1 token, and 
 then apply the EdgeNGramFilter to that one token. But you would have to 
 query explicitly against that other field. Since you are using dismax, you 
 should be able to add that second field to the qf parameter. And then remove 
 the EdgeNGramFilter from your main field.
 
 -- Jack Krupansky
 
 -Original Message- 
 From: hassancrowdc
 Sent: Tuesday, April 23, 2013 12:09 PM
 To: solr-user@lucene.apache.org
 Subject: EdgeGram filter
 
 Hi,
 
 I want to edgeNgram, let's say, this document that has 'difficult contents' so
 that if I query (using dismax) q=dif it shows me this result. This is
 working fine. But now if I search for q=con it gives me this document as
 well. Is there any way to only show this document when I search for 'dif' or
 'di'? Basically I want to edgegram 'difficultcontent', not 'difficult' and
 'content'. Any help?
 
 
 Thanks.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/EdgeGram-filter-tp4058337.html
 Sent from the Solr - User mailing list archive at Nabble.com. 
 
 
  
 

 


Re: solr-cloud performance decrease day by day

2013-04-19 Thread alxsss
How many segments does each shard have, and what is the reason for running 
multiple shards on one machine?

Alex.

 

 

 

-Original Message-
From: qibaoyuan qibaoy...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Fri, Apr 19, 2013 12:26 am
Subject: Re: solr-cloud performance decrease day by day


there are 6 shards and they are in one machine, and the JVM params are very big; the 
physical memory is 16GB, the total #docs is about 150k, and the index size of each 
shard is about 1GB. And there is indexing while searching; I use auto commit each 
10 min, and the data comes in at about 100 docs per minute. 


On Apr 19, 2013, at 3:17 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 Could you give more info about your index size and the technical details of
 your machine? Maybe you are indexing more data day by day and your RAM
 capacity is not enough anymore?
 
 2013/4/19 qibaoyuan qibaoy...@gmail.com
 
 Hello,
   I am using Solr 4.1.0 and I have used SolrCloud in my product. I have
 found that at first everything seems good, the search time is fast and the delay is
 low, but it becomes very slow after days. Does anyone know if there are
 some params or optimizations to use with SolrCloud?


 


Re: Spellchecker not working for Solr 4.1

2013-04-11 Thread alxsss
Inside your request handler, try to put spellcheck=true and the name of the 
spellcheck dictionary.

hth

Alex.

 

 

 

-Original Message-
From: davers dboych...@improvementdirect.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Apr 11, 2013 6:24 pm
Subject: Spellchecker not working for Solr 4.1


This is almost the exact same setup I was using in Solr 3.6; not sure why it's
not working. Here is my setup.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.7</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>
 
 
<requestHandler name="/productQuery" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">text</str>
    <str name="defType">edismax</str>
    <float name="tie">0.01</float>
    <str name="qf">
      sku^9.0 upc^9.1 uniqueid^9.0 series^2.8 productTitle^1.2
      productid^9.0 manufacturer^4.0 masterFinish^1.5 theme^1.1 categoryName^0.2
      finish^1.4
    </str>
    <str name="pf">
      text^0.2 productTitle^1.5 manufacturer^4.0 finish^1.9
    </str>
    <str name="bf">
      linear(popularity_82_i,1,2)^3.0
    </str>
    <str name="fl">
      uniqueid,productid,manufacturer
    </str>
    <str name="mm">
      3&lt;-1 5&lt;-2 6&lt;90%
    </str>
    <bool name="group">true</bool>
    <str name="group.field">groupid</str>
    <bool name="group.ngroups">true</bool>
    <int name="ps">100</int>
    <int name="qs">3</int>
    <int name="spellcheck.count">10</int>
    <bool name="spellcheck.collate">true</bool>
    <int name="spellcheck.maxCollations">10</int>
    <int name="spellcheck.maxCollationTries">100</int>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
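As an aside, the mm value in that handler decodes to 3<-1 5<-2 6<90% (the archive stripped the angle-bracket entities): queries with up to 3 optional clauses require all of them, up to 5 require all but one, up to 6 all but two, and above 6 require 90%. A hedged sketch of how such a spec can be evaluated (a simplified model, not Solr's actual min-should-match parser):

```python
def min_should_match(spec, clause_count):
    # spec like "3<-1 5<-2 6<90%": conditional rules "N<value", each
    # applying when clause_count > N; the highest matching N wins
    # (rules are assumed to be listed in ascending order of N).
    required = clause_count  # default: all optional clauses required
    for part in spec.split():
        threshold, value = part.split("<")
        if clause_count > int(threshold):
            if value.endswith("%"):
                required = clause_count * int(value[:-1]) // 100
            elif value.startswith("-"):
                required = clause_count + int(value)  # value is negative
            else:
                required = int(value)
    return max(required, 0)

for n in (2, 4, 6, 10):
    print(n, min_should_match("3<-1 5<-2 6<90%", n))
```

So a 10-term query only needs 9 terms to match, which keeps one misspelled or rare word from zeroing out the result set.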


<fieldType name="textSpell" class="solr.TextField"
    positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="([\.,;:_/\-])" replacement=" " replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="([\.,;:_/\-])" replacement=" " replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


This is what I see in my logs when I attempt a spellcheck

INFO: [productindex] webapp=/solr path=/select
params={spellcheck=false&group.distributed.first=true&tie=0.01&spellcheck.maxCollationTries=100&distrib=false&version=2&NOW=1365729795603&shard.url=solr-shard-1.sys.id.build.com:8080/solr/productindex/|solr-shard-4.sys.id.build.com:8080/solr/productindex/&fl=id,score&df=text&bf=%0a%09%09linear(popularity_82_i,1,2)^3.0%0a%09%09++&group.field=groupid&spellcheck.count=10&qs=3&spellcheck.build=true&mm=%0a%09%093<-1+5<-2+6<90%25%0a%09%09++&group.ngroups=true&spellcheck.maxCollations=10&qf=%0a%09%09sku^9.0+upc^9.1+uniqueid^9.0+series^2.8+productTitle^1.2+productid^9.0+manufacturer^4.0+masterFinish^1.5+theme^1.1+categoryName^0.2+finish^1.4%0a%09%09++&wt=javabin&spellcheck.collate=true&defType=edismax&rows=10&pf=%0a%09%09text^0.2+productTitle^1.5+manufacturer^4.0+finish^1.9%0a%09%09++&start=0&q=fuacet&group=true&isShard=true&ps=100}
status=0 QTime=13
Apr 11, 2013 6:23:15 PM
org.apache.solr.handler.component.SpellCheckComponent finishStage
INFO:
solr-shard-2.sys.id.build.com:8080/solr/productindex/|solr-shard-5.sys.id.build.com:8080/solr/productindex/
null
Apr 11, 2013 6:23:15 PM
org.apache.solr.handler.component.SpellCheckComponent finishStage
INFO:
solr-shard-3.sys.id.build.com:8080/solr/productindex/|solr-shard-6.sys.id.build.com:8080/solr/productindex/
null
Apr 11, 2013 6:23:15 PM
org.apache.solr.handler.component.SpellCheckComponent finishStage
INFO:
solr-shard-1.sys.id.build.com:8080/solr/productindex/|solr-shard-4.sys.id.build.com:8080/solr/productindex/
null




Re: Query slow with termVectors termPositions termOffsets

2013-03-25 Thread alxsss
Did index size increase after turning on termPositions and termOffsets?

Thanks.
Alex.

 

 

 

-Original Message-
From: Ravi Solr ravis...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Mar 25, 2013 8:27 am
Subject: Query slow with termVectors termPositions termOffsets


Hello,
We re-indexed our entire core of 115 docs with some of the
fields having termVectors=true termPositions=true termOffsets=true,
prior to the reindex we only had termVectors=true. After the reindex the
the query component has become very slow. I thought that adding the
termOffsets and termPositions will increase the speed, am I wrong ? Several
queries like the one shown below which used to run fine are now very slow.
Can somebody kindly clarify how termOffsets and termPositions affect query
component ?

<lst name="process"><double name="time">19076.0</double>
 <lst name="org.apache.solr.handler.component.QueryComponent"><double
name="time">18972.0</double></lst>
<lst name="org.apache.solr.handler.component.FacetComponent"><double
name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double
name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.HighlightComponent"><double
name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.StatsComponent"><double
name="time">0.0</double></lst>
<lst
name="org.apache.solr.handler.component.QueryElevationComponent"><double
name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.clustering.ClusteringComponent"><double
name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.DebugComponent"><double
name="time">104.0</double></lst>
</lst>


[#|2013-03-25T11:22:53.446-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=45;_ThreadName=httpSSLWorkerThread-9001-19;|[xxx]
webapp=/solr-admin path=/select
params={q=primarysectionnode:(/national*+OR+/health*)+OR+(contenttype:Blog+AND+subheadline:(The+Checkup+OR+Checkpoint+Washington+OR+Post+Carbon+OR+TSA+OR+College+Inc.+OR+Campus+Overload+OR+Planet+Panel+OR+The+Answer+Sheet+OR+Class+Struggle+OR+BlogPost))+OR+(contenttype:Photo+Gallery+AND+headline:day+in+photos)&start=0&rows=1&sort=displaydatetime+desc&fq=-source:(Reuters+OR+PC+World+OR+CBS+News+OR+NC8/WJLA+OR+NewsChannel+8+OR+NC8+OR+WJLA+OR+CBS)+-contenttype:(Discussion+OR+Photo)+-slug:(op-*dummy*+OR+noipad-*)+-(contenttype:Photo+Gallery+AND+headline:(Drawing+Board+OR+Drawing+board+OR+drawing+board))+headline:[*+TO+*]+contenttype:[*+TO+*]+pubdatetime:[NOW/DAY-3YEARS+TO+NOW/DAY%2B1DAY]+-headline:(Summary+Box*+OR+Video*+OR+Post+Sports+Live*)+-slug:(warren*+OR+history)+-(contenttype:Blog+AND+subheadline:(DC+Schools+Insider+OR+On+Leadership))+contenttype:Blog+-systemid:(999c7102-955a-11e2-95ca-dd43e7ffee9c+OR+72bbb724-9554-11e2-95ca-dd43e7ffee9c+OR+2d008b80-9520-11e2-95ca-dd43e7ffee9c+OR+d2443d3c-9514-11e2-95ca-dd43e7ffee9c+OR+173764d6-9520-11e2-95ca-dd43e7ffee9c+OR+0181fd42-953c-11e2-95ca-dd43e7ffee9c+OR+e6cacb96-9559-11e2-95ca-dd43e7ffee9c+OR+03288052-9501-11e2-95ca-dd43e7ffee9c+OR+ddbf020c-9517-11e2-95ca-dd43e7ffee9c)+fullbody:[*+TO+*]&wt=javabin&version=2}
hits=4985 status=0 QTime=19044 |#]

Thanks,

Ravi Kiran Bhaskar

 


Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-22 Thread alxsss
Hello,


Further investigation shows the following pattern, for both the DirectIndex and 
wordbreak spellcheckers.

Assume that in all cases there are spellchecker results when distrib=false

In distributed mode (distrib=true)
  case when matches=0
1. group=true,  no spellcheck results

2. group=false , there are spellcheck results

  case when matches > 0
1. group=true, there are spellcheck results
2. group =false, there are spellcheck results


Do these constitute a failing test case?

Thanks.
Alex.

 

 

-Original Message-
From: alxsss alx...@aim.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Mar 21, 2013 6:50 pm
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud



Hello,

I am debugging the SpellCheckComponent#finishStage. 
 
From the responses I see that not only the wordbreak but also the direct 
spellchecker fails to return some results in distributed mode. 
The request handler I was using had 

<str name="group">true</str>


So, I decided to turn off grouping and I see spellcheck results in distributed 
mode.


curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'
has no spellcheck results 
but

curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&group=false'
returns results.

So, the conclusion is that grouping causes the distributed spellchecker to fail.

Could you please point me to the class that may be responsible for this issue?

Thanks.
Alex.
 




-Original Message-
From: Dyer, James james.d...@ingramcontent.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Mar 21, 2013 11:23 am
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


The shard responses get combined in SpellCheckComponent#finishStage.  I highly 
recommend you file a JIRA bug report for this at 
https://issues.apache.org/jira/browse/SOLR.  If you write a failing unit test, 
it would make it much more likely that others would help you with a fix.  Of 
course, if you solve the issue entirely, a patch would be much appreciated.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: alx...@aim.com [mailto:alx...@aim.com]
Sent: Thursday, March 21, 2013 12:45 PM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

Hello,

We need this feature to be fixed ASAP. So, please let me know which class is 
responsible for combining spellcheck results from all shards. I will try to 
debug the code.

Thanks in advance.
Alex.







-Original Message-
From: alxsss alx...@aim.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Mar 19, 2013 11:34 am
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud


-- distributed environment.  But to nail it down, we probably need to see both
-- the applicable requestHandler /

Not sure what this is?

I have

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">spell</str>

  <!-- Multiple Spell Checkers can be declared and used by this
       component
    -->

  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">direct</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.5</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">2</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result. -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear to be considered for correction -->
    <float name="maxQueryFrequency">0.01</float>
    <!-- uncomment this to require suggestions to occur in 1% of the documents
      <float name="thresholdTokenFrequency">.01</float>
    -->
  </lst>

  <!-- a spellchecker that can break or combine words.  See "/spell" handler below for usage -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>

  <!-- a spellchecker that uses a different distance measure -->
  <!--
    <lst name="spellchecker">
      <str name="name">jarowinkler</str>
      <str name="field">spell</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">
        org.apache.lucene.search.spell.JaroWinklerDistance
      </str>
    </lst>

Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-22 Thread alxsss
Thanks.

I can fix this, but going over the code it seems it is not easy to figure out where 
the whole request and response come from.

I followed up SpellCheckComponent#finishStage and found out that 
SearchHandler#handleRequestBody calls this function. However, which part calls 
handleRequestBody and how its arguments are constructed is not clear.


Thanks.
Alex.

 

-Original Message-
From: Dyer, James james.d...@ingramcontent.com
To: solr-user solr-user@lucene.apache.org
Sent: Fri, Mar 22, 2013 2:08 pm
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


Alex,

I added your comments to SOLR-3758 
(https://issues.apache.org/jira/browse/SOLR-3758) 
, which seems to me to be the very same issue.

If you need this to work now and if you cannot devise a fix yourself, then 
perhaps a workaround is: if the query returns 0 results, re-issue the query 
with rows=0&group=false (you would omit all other optional components also). 
This will give you back just a spell check result.  I realize this is not 
optimal because it requires the overhead of issuing 2 queries, but if you do it 
only in instances where the user gets nothing (or very little) back maybe it 
would be tolerable?  Then once a viable fix is devised you can remove the 
extra code from your application.
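That fallback can live entirely in the client. A rough Python sketch of it, where solr_query is a hypothetical stand-in for the real HTTP call and the response shapes are simplified (so this is an illustration, not production client code):

```python
def search_with_spellcheck_fallback(solr_query, params):
    # First pass: the normal grouped query.
    resp = solr_query(params)
    matches = resp.get("grouped", {}).get("site", {}).get("matches", 0)
    suggestions = resp.get("spellcheck", {}).get("suggestions", {})
    if matches == 0 and not suggestions:
        # Workaround for grouped distributed queries losing spellcheck
        # output: re-issue with rows=0 and grouping off, keeping only
        # the spellcheck component's result.
        retry = dict(params, rows=0, group="false")
        resp["spellcheck"] = solr_query(retry).get("spellcheck", {})
    return resp

# A fake stand-in for the HTTP call, reproducing the reported behaviour:
# grouped queries come back with zero matches and no suggestions.
def fake_solr(params):
    if params.get("group") == "false":
        return {"spellcheck": {"suggestions": {"paulusoles": ["paul u soles"]}}}
    return {"grouped": {"site": {"matches": 0}},
            "spellcheck": {"suggestions": {}}}

result = search_with_spellcheck_fallback(fake_solr, {"q": "paulusoles", "group": "true"})
print(result["spellcheck"]["suggestions"])
```

The second query is only paid when the user would otherwise see nothing, which keeps the overhead confined to the failure case.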

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: alx...@aim.com [mailto:alx...@aim.com]
Sent: Friday, March 22, 2013 12:53 PM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

Hello,


Further investigation shows the following pattern, for both the DirectIndex and 
wordbreak spellcheckers.

Assume that in all cases there are spellchecker results when distrib=false

In distributed mode (distrib=true)
  case when matches=0
1. group=true,  no spellcheck results

2. group=false , there are spellcheck results

  case when matches > 0
1. group=true, there are spellcheck results
2. group =false, there are spellcheck results


Do these constitute a failing test case?

Thanks.
Alex.





-Original Message-
From: alxsss alx...@aim.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Mar 21, 2013 6:50 pm
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud



Hello,

I am debugging the SpellCheckComponent#finishStage.

From the responses I see that not only the wordbreak but also the direct
spellchecker fails to return some results in distributed mode.
The request handler I was using had

<str name="group">true</str>


So, I decided to turn off grouping and I see spellcheck results in distributed
mode.


curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'
has no spellcheck results
but

curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&group=false'
returns results.

So, the conclusion is that grouping causes the distributed spellchecker to fail.

Could you please point me to the class that may be responsible for this issue?

Thanks.
Alex.





-Original Message-
From: Dyer, James james.d...@ingramcontent.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Mar 21, 2013 11:23 am
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


The shard responses get combined in SpellCheckComponent#finishStage.  I highly
recommend you file a JIRA bug report for this at 
https://issues.apache.org/jira/browse/SOLR.  If you write a failing unit test,
it would make it much more likely that others would help you with a fix.  Of
course, if you solve the issue entirely, a patch would be much appreciated.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: alx...@aim.com [mailto:alx...@aim.com]
Sent: Thursday, March 21, 2013 12:45 PM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

Hello,

We need this feature to be fixed ASAP. So, please let me know which class is
responsible for combining spellcheck results from all shards. I will try to
debug the code.

Thanks in advance.
Alex.







-Original Message-
From: alxsss alx...@aim.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Mar 19, 2013 11:34 am
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud


-- distributed environment.  But to nail it down, we probably need to see both
-- the applicable requestHandler /

Not sure what this is?

I have

 searchComponent name=spellcheck class=solr.SpellCheckComponent

str name=queryAnalyzerFieldTypespell/str

!-- Multiple Spell Checkers can be declared and used by this
 component
  --

!-- a spellchecker built from a field of the main index --
lst name=spellchecker
  str name=namedirect/str
  str name=fieldspell/str
  str name=classnamesolr.DirectSolrSpellChecker/str
  !-- the spellcheck distance measure used, the default is the internal
levenshtein --
  str name

Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-21 Thread alxsss
Hello,

We need this feature to be fixed ASAP. So, please let me know which class is 
responsible for combining spellcheck results from all shards. I will try to 
debug the code.

Thanks in advance.
Alex.

 

 

 

-Original Message-
From: alxsss alx...@aim.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Mar 19, 2013 11:34 am
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud


-- distributed environment.  But to nail it down, we probably need to see both
-- the applicable requestHandler /

Not sure what this is?

I have

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">spell</str>

  <!-- Multiple Spell Checkers can be declared and used by this
       component
    -->

  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">direct</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.5</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">2</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result. -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear to be considered for correction -->
    <float name="maxQueryFrequency">0.01</float>
    <!-- uncomment this to require suggestions to occur in 1% of the documents
      <float name="thresholdTokenFrequency">.01</float>
    -->
  </lst>

  <!-- a spellchecker that can break or combine words.  See "/spell" handler below for usage -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>

  <!-- a spellchecker that uses a different distance measure -->
  <!--
    <lst name="spellchecker">
      <str name="name">jarowinkler</str>
      <str name="field">spell</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">
        org.apache.lucene.search.spell.JaroWinklerDistance
      </str>
    </lst>
  -->
  <!-- a spellchecker that use an alternate comparator

       comparatorClass be one of:
        1. score (default)
        2. freq (Frequency first, then score)
        3. A fully qualified class name
    -->
  <!--
    <lst name="spellchecker">
      <str name="name">freq</str>
      <str name="field">lowerfilt</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="comparatorClass">freq</str>
    -->

  <!-- A spellchecker that reads the list of words from a file -->
  <!--
    <lst name="spellchecker">
      <str name="classname">solr.FileBasedSpellChecker</str>
      <str name="name">file</str>
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="spellcheckIndexDir">spellcheckerFile</str>
    </lst>
  -->
</searchComponent>
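For intuition, the wordbreak spellchecker's two modes amount to: breakWords splits a query term where both halves are known index terms, and combineWords joins adjacent query terms whose concatenation is a known term. A toy single-split sketch against a hypothetical vocabulary (WordBreakSolrSpellChecker itself is more general, e.g. multiple splits per term):

```python
def break_word(term, vocab, max_changes=10):
    # try every single split point; both sides must be known terms
    out = []
    for i in range(1, len(term)):
        left, right = term[:i], term[i:]
        if left in vocab and right in vocab:
            out.append((left, right))
            if len(out) >= max_changes:
                break
    return out

def combine_words(terms, vocab):
    # join each adjacent pair whose concatenation is a known term
    return [terms[i] + terms[i + 1]
            for i in range(len(terms) - 1)
            if terms[i] + terms[i + 1] in vocab]

vocab = {"web", "search", "websearch"}
print(break_word("websearch", vocab))           # [('web', 'search')]
print(combine_words(["web", "search"], vocab))  # ['websearch']
```

Since every candidate has to be verified against the index terms, these checks run per shard, which is why the merge of shard results in finishStage matters so much for this component.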


The spell field in our schema is called spell and its type is also called spell.
Here are the requests:


 curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'
<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">32</int>
  <lst name="params">
    <str name="indent">true</str>
    <str name="shards.qt">testhandler</str>
    <str name="q">paulusoles</str>
    <str name="distrib">false</str>
    <str name="rows">10</str>
  </lst>
</lst>
<lst name="grouped">
  <lst name="site">
    <int name="matches">0</int>
    <int name="ngroups">0</int>
    <arr name="groups"/>
  </lst>
</lst>
<lst name="highlighting"/>
<lst name="spellcheck">
  <lst name="suggestions"/>
</lst>
</response>




 curl 
'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'
<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">26</int>
  <lst name="params">
    <str name="indent">true</str>
    <str name="shards.qt">testhandler</str>
    <str name="q">paulusoles</str>
    <str name="distrib">false</str>
    <str name="rows">10</str>
  </lst>
</lst>
<lst name="grouped">
  <lst name="site">
    <int name="matches">0</int>
    <int name="ngroups">0</int>
    <arr name="groups"/>
  </lst>
</lst>
<lst name="highlighting"/>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="paulusoles">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">11</int>
      <arr name="suggestion">
        <str>paul u soles</str>
      </arr>
    </lst>
    <str name="collation">(paul u soles)</str>

Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-21 Thread alxsss

Hello,

I am debugging the SpellCheckComponent#finishStage. 
 
From the responses I see that not only the wordbreak but also the direct 
spellchecker fails to return some results in distributed mode. 
The request handler I was using had 

<str name="group">true</str>


So, I decided to turn off grouping and I see spellcheck results in distributed 
mode.


curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'
has no spellcheck results 
but

curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&group=false'
returns results.

So, the conclusion is that grouping causes the distributed spellchecker to fail.

Could you please point me to the class that may be responsible for this issue?

Thanks.
Alex.
 




-Original Message-
From: Dyer, James james.d...@ingramcontent.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Mar 21, 2013 11:23 am
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


The shard responses get combined in SpellCheckComponent#finishStage .  I highly 
recommend you file a JIRA bug report for this at 
https://issues.apache.org/jira/browse/SOLR 
.  If you write a failing unit test, it would make it much more likely that 
others would help you with a fix.  Of course, if you solve the issue entirely, 
a 
patch would be much appreciated.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: alx...@aim.com [mailto:alx...@aim.com]
Sent: Thursday, March 21, 2013 12:45 PM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

Hello,

We need this feature to be fixed ASAP. So, please let me know which class is 
responsible for combining spellcheck results from all shards. I will try to 
debug the code.

Thanks in advance.
Alex.







-Original Message-
From: alxsss alx...@aim.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Mar 19, 2013 11:34 am
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud


-- distributed environment.  But to nail it down, we probably need to see both
-- the applicable requestHandler /

Not sure what this is?

I have

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

    <str name="queryAnalyzerFieldType">spell</str>

    <!-- Multiple Spell Checkers can be declared and used by this
         component
      -->

    <!-- a spellchecker built from a field of the main index -->
    <lst name="spellchecker">
      <str name="name">direct</str>
      <str name="field">spell</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <!-- the spellcheck distance measure used, the default is the internal
           levenshtein -->
      <str name="distanceMeasure">internal</str>
      <!-- minimum accuracy needed to be considered a valid spellcheck
           suggestion -->
      <float name="accuracy">0.5</float>
      <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
      <int name="maxEdits">2</int>
      <!-- the minimum shared prefix when enumerating terms -->
      <int name="minPrefix">1</int>
      <!-- maximum number of inspections per result. -->
      <int name="maxInspections">5</int>
      <!-- minimum length of a query term to be considered for correction -->
      <int name="minQueryLength">4</int>
      <!-- maximum threshold of documents a query term can appear to be
           considered for correction -->
      <float name="maxQueryFrequency">0.01</float>
      <!-- uncomment this to require suggestions to occur in 1% of the documents
        <float name="thresholdTokenFrequency">.01</float>
      -->
    </lst>

    <!-- a spellchecker that can break or combine words.  See "/spell" handler
         below for usage -->
    <lst name="spellchecker">
      <str name="name">wordbreak</str>
      <str name="classname">solr.WordBreakSolrSpellChecker</str>
      <str name="field">spell</str>
      <str name="combineWords">true</str>
      <str name="breakWords">true</str>
      <int name="maxChanges">10</int>
    </lst>

    <!-- a spellchecker that uses a different distance measure -->
    <!--
       <lst name="spellchecker">
         <str name="name">jarowinkler</str>
         <str name="field">spell</str>
         <str name="classname">solr.DirectSolrSpellChecker</str>
         <str name="distanceMeasure">
           org.apache.lucene.search.spell.JaroWinklerDistance
         </str>
       </lst>
     -->

    <!-- a spellchecker that use an alternate comparator

         comparatorClass be one of:
          1. score (default)
          2. freq (Frequency first, then score)
          3. A fully qualified class name
      -->
    <!--
       <lst name="spellchecker">
         <str name="name">freq</str>
         <str name="field">lowerfilt</str>
         <str name="classname">solr.DirectSolrSpellChecker</str>
         <str name="comparatorClass">freq</str>
      -->

    <!-- A spellchecker that reads the list of words from a file -->
    <!--
       <lst name="spellchecker">
         <str name="classname">solr.FileBasedSpellChecker</str>
         <str name="name">file</str>
         <str name="sourceLocation">spellings.txt</str>
         <str name="characterEncoding">UTF-8</str>

Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-19 Thread alxsss
Hello,

I was testing my custom testhandler. The direct spellchecker also was not working 
in cloud mode. After I added 

  <arr name="last-components">
     <str>spellcheck</str>
   </arr> 

to the /select requestHandler it worked, except for the wordbreak spellchecker. I have 
added shards.qt=testhandler to the curl request, but it did not solve the issue.

Thanks.
Alex.

 

 

 

-Original Message-
From: Dyer, James james.d...@ingramcontent.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Mar 19, 2013 10:30 am
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


Mark,

I wasn't sure if Alex is actually testing /select, or if the problem is just 
coming up in /testhandler.  Just wanted to verify that before we get into bug 
reports.

DistributedSpellCheckComponentTest does have 1 little Word Break test scenario 
in it, so we know WordBreakSolrSpellChecker at least works some of the time in 
a 
Distributed environment :) .  Ideally, we should probably use a random test for 
stuff like this as adding a bunch of test scenarios would make this 
already-slower-than-molasses test even slower.  On the other hand, we want to 
test as many possibilities as we can.  Based on DSCCT and it being so 
superficial, I really can't vouch too much for my spell check enhancements 
working as well with shards as they do with a single index.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Tuesday, March 19, 2013 11:49 AM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

My first thought too, but then I saw that he had the spell component in both 
his custom testhandler and the /select handler, so I'd expect that to work as well.

- Mark

On Mar 19, 2013, at 12:18 PM, Dyer, James james.d...@ingramcontent.com 
wrote:

 Can you try including in your request the shards.qt parameter?  In your 
case, I think you should set it to testhandler.  See 
http://wiki.apache.org/solr/SpellCheckComponent?highlight=%28shards\.qt%29#Distributed_Search_Support
 
for a brief discussion.
 
 James Dyer
 Ingram Content Group
 (615) 213-4311
 
 
 -Original Message-
 From: alx...@aim.com [mailto:alx...@aim.com] 
 Sent: Monday, March 18, 2013 4:07 PM
 To: solr-user@lucene.apache.org
 Subject: strange behaviour of wordbreak spellchecker in solr cloud
 
 Hello,
 
 I try to use the wordbreak spellchecker in solr-4.2 with the cloud feature. We have 
two servers with one shard on each of them.
 
 curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10'
 curl 'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10'
 
 does not return any results in the spellchecker. However, if I specify 
distrib=false, only one of these has spellchecker results.
 
 curl 
 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&distrib=false'
 
 returns no spellchecker results, but
 
 curl 
 'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&distrib=false'
 returns spellchecker results.
 
 
 My testhandler and select handlers are as follows
 
 
 <requestHandler name="/testhandler" class="solr.SearchHandler">
 <lst name="defaults">
 <str name="defType">edismax</str>
 <str name="echoParams">explicit</str>
 <float name="tie">0.01</float>
 <str name="qf">host^30  content^0.5 title^1.2 </str>
 <str name="pf">site^25 content^10 title^22</str>
 <str name="fl">url,id,title</str>
 <!-- <str name="mm">2<-1 5<-3 6<90%</str> -->
 <str name="mm">3<-1 5<-3 6<90%</str>
 <int name="ps">1</int>
 
 <str name="hl">true</str>
 <str name="hl.fl">content</str>
 <str name="f.content.hl.fragmenter">regex</str>
 <str name="hl.fragsize">165</str>
 <str name="hl.fragmentsBuilder">default</str>
 
 
 <str name="spellcheck.dictionary">direct</str>
 <str name="spellcheck.dictionary">wordbreak</str>
 <str name="spellcheck">on</str>
 <str name="spellcheck.collate">true</str>
 <str name="spellcheck.onlyMorePopular">false</str>
 <str name="spellcheck.count">2</str>
 
 </lst>
 
 <arr name="last-components">
 <str>spellcheck</str>
 </arr>
 
 </requestHandler>
 
 
  <requestHandler name="/select" class="solr.SearchHandler">
    <!-- default values for query parameters can be specified, these
         will be overridden by parameters in the request
      -->
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
       <!-- <str name="df">text</str> -->
     </lst>
    <!-- In addition to defaults, "appends" params can be specified
         to identify values which should be appended to the list of
         multi-val params from the query (or the existing defaults).
      -->
    <!-- In this example, the param "fq=instock:true" would be appended to
         any query time fq params the user may specify, as a mechanism for
         partitioning the index, independent of any user selected filtering
         that may also be desired (perhaps as a result of faceted searching).
 
         NOTE: there is *absolutely* nothing a client can do to prevent these
         "appends" values from being used, so don't use this mechanism
         unless you are sure you always want it.
      -->

Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-19 Thread alxsss
-- distributed environment.  But to nail it down, we probably need to see both
-- the applicable requestHandler /

Not sure what this is?

I have

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

    <str name="queryAnalyzerFieldType">spell</str>

    <!-- Multiple Spell Checkers can be declared and used by this
         component
      -->

    <!-- a spellchecker built from a field of the main index -->
    <lst name="spellchecker">
      <str name="name">direct</str>
      <str name="field">spell</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <!-- the spellcheck distance measure used, the default is the internal
           levenshtein -->
      <str name="distanceMeasure">internal</str>
      <!-- minimum accuracy needed to be considered a valid spellcheck
           suggestion -->
      <float name="accuracy">0.5</float>
      <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
      <int name="maxEdits">2</int>
      <!-- the minimum shared prefix when enumerating terms -->
      <int name="minPrefix">1</int>
      <!-- maximum number of inspections per result. -->
      <int name="maxInspections">5</int>
      <!-- minimum length of a query term to be considered for correction -->
      <int name="minQueryLength">4</int>
      <!-- maximum threshold of documents a query term can appear to be
           considered for correction -->
      <float name="maxQueryFrequency">0.01</float>
      <!-- uncomment this to require suggestions to occur in 1% of the documents
        <float name="thresholdTokenFrequency">.01</float>
      -->
    </lst>

    <!-- a spellchecker that can break or combine words.  See "/spell" handler
         below for usage -->
    <lst name="spellchecker">
      <str name="name">wordbreak</str>
      <str name="classname">solr.WordBreakSolrSpellChecker</str>
      <str name="field">spell</str>
      <str name="combineWords">true</str>
      <str name="breakWords">true</str>
      <int name="maxChanges">10</int>
    </lst>

    <!-- a spellchecker that uses a different distance measure -->
    <!--
       <lst name="spellchecker">
         <str name="name">jarowinkler</str>
         <str name="field">spell</str>
         <str name="classname">solr.DirectSolrSpellChecker</str>
         <str name="distanceMeasure">
           org.apache.lucene.search.spell.JaroWinklerDistance
         </str>
       </lst>
     -->

    <!-- a spellchecker that use an alternate comparator

         comparatorClass be one of:
          1. score (default)
          2. freq (Frequency first, then score)
          3. A fully qualified class name
      -->
    <!--
       <lst name="spellchecker">
         <str name="name">freq</str>
         <str name="field">lowerfilt</str>
         <str name="classname">solr.DirectSolrSpellChecker</str>
         <str name="comparatorClass">freq</str>
      -->

    <!-- A spellchecker that reads the list of words from a file -->
    <!--
       <lst name="spellchecker">
         <str name="classname">solr.FileBasedSpellChecker</str>
         <str name="name">file</str>
         <str name="sourceLocation">spellings.txt</str>
         <str name="characterEncoding">UTF-8</str>
         <str name="spellcheckIndexDir">spellcheckerFile</str>
       </lst>
      -->
  </searchComponent>


The spell field in our schema is called "spell" and its type is also called "spell".
Here are the requests:


 curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'
<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">32</int>
  <lst name="params">
    <str name="indent">true</str>
    <str name="shards.qt">testhandler</str>
    <str name="q">paulusoles</str>
    <str name="distrib">false</str>
    <str name="rows">10</str>
  </lst>
</lst>
<lst name="grouped">
  <lst name="site">
    <int name="matches">0</int>
    <int name="ngroups">0</int>
    <arr name="groups"/>
  </lst>
</lst>
<lst name="highlighting"/>
<lst name="spellcheck">
  <lst name="suggestions"/>
</lst>
</response>




 curl 
'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&distrib=false'
<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">26</int>
  <lst name="params">
    <str name="indent">true</str>
    <str name="shards.qt">testhandler</str>
    <str name="q">paulusoles</str>
    <str name="distrib">false</str>
    <str name="rows">10</str>
  </lst>
</lst>
<lst name="grouped">
  <lst name="site">
    <int name="matches">0</int>
    <int name="ngroups">0</int>
    <arr name="groups"/>
  </lst>
</lst>
<lst name="highlighting"/>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="paulusoles">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">11</int>
      <arr name="suggestion">
        <str>paul u soles</str>
      </arr>
    </lst>
    <str name="collation">(paul u soles)</str>
  </lst>
</lst>
</response>

No distrib param

curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'
<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">24</int>
  <lst name="params">
    <str name="indent">true</str>
    <str name="shards.qt">testhandler</str>
    <str name="q">paulusoles</str>
    <str name="distrib">false</str>
    <str name="rows">10</str>
  

Re: structure of solr index

2013-03-18 Thread alxsss

 
--- So, search time is in no way impacted by the existence or non-existence of 
stored values




What about memory? Would it require more memory in order to keep the same 
QTime as in the case of indexed-only fields?
For example, in the indexed-only case the index size is 5GB, the average QTime 
is 0.1 sec and the memory is 10G. 
In the case when the same fields are indexed and stored the index size is 50GB. Will 
the QTime be 0.1s plus the time for extracting the stored fields?

Another scenario is to store the fields in hbase or cassandra, have only the indexed 
fields in Solr, and after getting the id field from Solr extract the stored values from 
hbase or cassandra. Will this setup be faster than the one with the stored fields 
in Solr?

Thanks.
Alex.
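The second scenario can be sketched as follows (a hypothetical Python outline; `fetch_ids_from_solr` and `fetch_docs_from_store` are stand-ins for a real Solr request with fl=id and a batch HBase/Cassandra lookup):

```python
def fetch_ids_from_solr(query):
    # Stand-in: real code would hit /select?q=...&fl=id and parse the
    # response, returning only the ranked document ids.
    return ["doc1", "doc7"]

def fetch_docs_from_store(ids):
    # Stand-in: real code would do a batch get against HBase/Cassandra.
    store = {"doc1": {"title": "A"}, "doc7": {"title": "B"}}
    return [store[i] for i in ids]

def search(query):
    ids = fetch_ids_from_solr(query)   # Solr returns ids only, so responses stay small
    return fetch_docs_from_store(ids)  # stored fields come from the key/value store

print(search("jennifer"))
```

Whether this beats Solr's own stored fields depends on the extra network round trip versus the smaller Solr index, so it is worth benchmarking rather than assuming.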

 

-Original Message-
From: Jack Krupansky j...@basetechnology.com
To: solr-user solr-user@lucene.apache.org
Sent: Sat, Mar 16, 2013 9:53 am
Subject: Re: structure of solr index


Search depends only on the index. But... returning field values for each 
of the matched documents does require access to the stored values. So, 
search time is in no way impacted by the existence or non-existence of 
stored values, but total query processing time would of course include both 
search time and the time to access and format the stored field values.

-- Jack Krupansky

-Original Message- 
From: alx...@aim.com
Sent: Saturday, March 16, 2013 12:48 PM
To: solr-user@lucene.apache.org
Subject: Re: structure of solr index

Hi,

So, will search time be the same for the case when fields are indexed only 
vs  the case when they are indexed and stored?



Thanks.
Alex.



-Original Message-
From: Otis Gospodnetic otis.gospodne...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Fri, Mar 15, 2013 8:09 pm
Subject: Re: structure of solr index


Hi,

I think you are asking if the original/raw content of those fields will be
read.  No, it won't, not for the search itself.  If you want to
retrieve/return those fields then, of course, they will be read for the
documents being returned.

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Fri, Mar 15, 2013 at 2:41 PM, alx...@aim.com wrote:

 Hi,

 I wondered if solr searches on indexed fields only or on entire index? In
 more detail, let say I have fields id,  title and content, all  indexed,
 stored. Will a search send all these fields to memory or only indexed part
 of these fields?

 Thanks.
 Alex.






 


strange behaviour of wordbreak spellchecker in solr cloud

2013-03-18 Thread alxsss
Hello,

I try to use the wordbreak spellchecker in solr-4.2 with the cloud feature. We have two 
servers with one shard on each of them.

curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10'
curl 'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10'

does not return any results in the spellchecker. However, if I specify 
distrib=false, only one of these has spellchecker results.

curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&distrib=false'

returns no spellchecker results, but

curl 
'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&distrib=false'
returns spellchecker results.


My testhandler and select handlers are as follows


<requestHandler name="/testhandler" class="solr.SearchHandler">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="echoParams">explicit</str>
<float name="tie">0.01</float>
<str name="qf">host^30  content^0.5 title^1.2 </str>
<str name="pf">site^25 content^10 title^22</str>
<str name="fl">url,id,title</str>
<!-- <str name="mm">2<-1 5<-3 6<90%</str> -->
<str name="mm">3<-1 5<-3 6<90%</str>
<int name="ps">1</int>

<str name="hl">true</str>
<str name="hl.fl">content</str>
<str name="f.content.hl.fragmenter">regex</str>
<str name="hl.fragsize">165</str>
<str name="hl.fragmentsBuilder">default</str>


<str name="spellcheck.dictionary">direct</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck">on</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.count">2</str>

</lst>

<arr name="last-components">
 <str>spellcheck</str>
</arr>

</requestHandler>


  <requestHandler name="/select" class="solr.SearchHandler">
    <!-- default values for query parameters can be specified, these
         will be overridden by parameters in the request
      -->
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
       <!-- <str name="df">text</str> -->
     </lst>
    <!-- In addition to defaults, "appends" params can be specified
         to identify values which should be appended to the list of
         multi-val params from the query (or the existing defaults).
      -->
    <!-- In this example, the param "fq=instock:true" would be appended to
         any query time fq params the user may specify, as a mechanism for
         partitioning the index, independent of any user selected filtering
         that may also be desired (perhaps as a result of faceted searching).

         NOTE: there is *absolutely* nothing a client can do to prevent these
         "appends" values from being used, so don't use this mechanism
         unless you are sure you always want it.
      -->
    <!--
       <lst name="appends">
         <str name="fq">inStock:true</str>
       </lst>
      -->
    <!-- "invariants" are a way of letting the Solr maintainer lock down
         the options available to Solr clients.  Any params values
         specified here are used regardless of what values may be specified
         in either the query, the defaults, or the appends params.

         In this example, the facet.field and facet.query params would
         be fixed, limiting the facets clients can use.  Faceting is
         not turned on by default - but if the client does specify
         facet=true in the request, these are the only facets they
         will be able to see counts for; regardless of what other
         facet.field or facet.query params they may specify.

         NOTE: there is *absolutely* nothing a client can do to prevent these
         "invariants" values from being used, so don't use this mechanism
         unless you are sure you always want it.
      -->
    <!--
       <lst name="invariants">
         <str name="facet.field">cat</str>
         <str name="facet.field">manu_exact</str>
         <str name="facet.query">price:[* TO 500]</str>
         <str name="facet.query">price:[500 TO *]</str>
       </lst>
      -->
    <!-- If the default list of SearchComponents is not desired, that
         list can either be overridden completely, or components can be
         prepended or appended to the default list.  (see below)
      -->
    <!--
       <arr name="components">
         <str>nameOfCustomComponent1</str>
         <str>nameOfCustomComponent2</str>
       </arr>
      -->
       <arr name="last-components">
         <str>spellcheck</str>
       </arr>
  </requestHandler>



Is this a bug, or does something else have to be done?


Thanks.
Alex.



Re: structure of solr index

2013-03-16 Thread alxsss
Hi,

So, will search time be the same for the case when fields are indexed only vs  
the case when they are indexed and stored?

 

 Thanks.
Alex.

 

-Original Message-
From: Otis Gospodnetic otis.gospodne...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Fri, Mar 15, 2013 8:09 pm
Subject: Re: structure of solr index


Hi,

I think you are asking if the original/raw content of those fields will be
read.  No, it won't, not for the search itself.  If you want to
retrieve/return those fields then, of course, they will be read for the
documents being returned.

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Fri, Mar 15, 2013 at 2:41 PM, alx...@aim.com wrote:

 Hi,

 I wondered if solr searches on indexed fields only or on entire index? In
 more detail, let say I have fields id,  title and content, all  indexed,
 stored. Will a search send all these fields to memory or only indexed part
 of these fields?

 Thanks.
 Alex.




 


structure of solr index

2013-03-15 Thread alxsss
Hi,

I wondered if solr searches on indexed fields only or on the entire index? In more 
detail, let's say I have the fields id, title and content, all indexed and stored. 
Will a search load all these fields into memory or only the indexed part of these 
fields? 

Thanks.
Alex.




spellchecker does not have suggestion for keywords typed through a non-whitespace delimiter

2013-03-12 Thread alxsss
Hello,

Recently we noticed that solr and its spellchecker do not return results for 
keywords typed with a non-whitespace delimiter.
A user accidentally typed "u" instead of a white space. For example, "paulusoles" 
instead of "paul soles". Solr does not return any results or spellcheck 
suggestions for the keyword "paulusoles", although it returns results for the keywords 
"paul soles", "paul", and "soles".

search.yahoo.com returns results for the keyword "paulusoles" as if it were 
given the keyword "paul soles".

Any ideas how to implement this functionality in solr?

text and spell fields are as follows;

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
        termVectors="true" termPositions="true" termOffsets="true">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0"/>
        <filter class="solr.ICUFoldingFilterFactory" />
        <filter class="solr.SnowballPorterFilterFactory" language="Spanish" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>


<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">direct</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">2</str>

This is solr -4.1.0 with cloud feature and index based dictionary.

Thanks.
Alex.
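For what it's worth, the behaviour asked about above is what WordBreakSolrSpellChecker's breakWords=true provides; conceptually it is a recursive dictionary split, which can be sketched like this (illustrative only -- Solr consults index terms, not a Python set, and applies frequency and maxChanges limits):

```python
def break_word(term, dictionary, max_parts=3):
    # Recursively split term into known dictionary words -- a naive model
    # of breaking "paulusoles" into "paul u soles".
    if term in dictionary:
        return [[term]]
    if max_parts == 1:
        return []
    results = []
    for i in range(1, len(term)):
        head = term[:i]
        if head in dictionary:
            for tail in break_word(term[i:], dictionary, max_parts - 1):
                results.append([head] + tail)
    return results

index_terms = {"paul", "soles", "u"}
print(break_word("paulusoles", index_terms))  # -> [['paul', 'u', 'soles']]
```

Note that the split only succeeds if every fragment (including the stray "u") exists as a term in the dictionary, which is why suggestions depend on what the spell field actually contains.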


Re: solr cloud index size is too big

2013-03-04 Thread alxsss
Hi,

It is the index folder. tlog is only a few MB.

I have analysed all the changes and found out that only one field in the schema was 
changed.

This field in the non-cloud setup,
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">

was changed to
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
        termVectors="true" termPositions="true" termOffsets="true">

 in the cloud setup to use fastVectorHighlighting.

Is it possible that this change could double the index size?

Thanks.
Alex.
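One way to check is to sum the index files by extension: the term vectors added by termVectors/termPositions/termOffsets live in their own files (.tvx/.tvd, plus .tvf in older codecs), so their share of the 90GB shows up directly. A small sketch (the example path is hypothetical):

```python
import os
from collections import defaultdict

def size_by_extension(index_dir):
    # Sum Lucene index file sizes per extension so term-vector files
    # (.tvx/.tvd/.tvf) can be compared against the rest of the index.
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            ext = os.path.splitext(name)[1] or name
            totals[ext] += os.path.getsize(path)
    return dict(totals)

# e.g. size_by_extension("/home/solr/example/solr/collection1/data/index")
```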

 

 

-Original Message-
From: Jan Høydahl jan@cominvent.com
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Mar 4, 2013 2:24 pm
Subject: Re: solr cloud index size is too big


Can you tell whether it's the index folder that is that large or is it 
including the tlog transaction log folder?
If you have a huge transaction log, you need to start sending hard commits more 
often during indexing to flush the tlogs.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

4. mars 2013 kl. 04:16 skrev alx...@aim.com:

 Hello,
 
 I had a non cloud collection index size around 80G for 15M documents with 
solr-4.1.0. So, I decided to use solr cloud with two shards and sent to solr 
the 
following command
 
 curl 
 'http://slave:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&maxShardsPerNode=1'
 
 I tried to set replicationFactor=0 but this command gave an error.  After 
reindexing into two separate linux boxes, with one instance of solr running in 
each of them, I see that the size of the index in each shard is 90GB versus the 
expected 40GB, although each of the shards has half (7.5M) of the documents.
 
 Any ideas what went wrong?
 
 Thanks.
 Alex.


 


Re: How do I create two collections on the same cluster?

2013-02-22 Thread alxsss
Hi,

What if you add the new collection to the solr.xml file?

Alex.

 

 

 

-Original Message-
From: Shankar Sundararaju shan...@ebrary.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Feb 21, 2013 8:51 pm
Subject: How do I create two collections on the same cluster?


I am using Solr 4.1.

I created collection1 consisting of 2 leaders and 2 replicas (2 shards) at
boot time.

After the cluster is up, I am trying to create collection2 with 2 leaders
and 2 replicas just like collection1. I am using following collections API
for that:

http://localhost:7575/solr/admin/collections?action=CREATE&name=collection2&numShards=2&replicationFactor=2&collection.configName=myconf&createNodeSet=localhost:8983_solr,localhost:7574_solr,localhost:7575_solr,localhost:7576_solr

Yes, collection2 does get created. But I see a problem - createNodeSet
parameter is not being honored. All 4 nodes are not being used to create
collection2, only 3 are being used. Is this a bug or I don't understand how
this parameter should be used?

What is the best way to create collection2? Can I specify both collections
in solr.xml in the solr home dir in all nodes and launch them? Do I have to
get the configs for collection2 uploaded to zookeeper before I launch the
nodes?

Thanks in advance.

-Shankar

-- 
Regards,
*Shankar Sundararaju
*Sr. Software Architect
ebrary, a ProQuest company
410 Cambridge Avenue, Palo Alto, CA 94306 USA
shan...@ebrary.com | www.ebrary.com | 650-475-8776 (w) | 408-426-3057 (c)

 


how to override pre and post tags when useFastVectorHighlighter is set to true

2013-02-22 Thread alxsss
Hello,

I was unable to change the pre and post tags for highlighting when 
useFastVectorHighlighter is set to true. Changing the default tags in 
solrconfig.xml works for the standard highlighter though. I searched the mailing list 
and the net with no success.
I use solr-4.1.0.

Thanks.
Alex.


Re: long QTime for big index

2013-02-14 Thread alxsss
Hi,

It is curious to know how many linux boxes you have and how many cores on 
each of them. It was my understanding that solr puts into memory all the 
documents found for a keyword, not the whole index. So, why must it be faster 
with more cores, when the number of selected documents from many separate cores 
is the same as from one core? 

Thanks.
Alex.

 

 

 

-Original Message-
From: Mou mouna...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Feb 14, 2013 2:35 pm
Subject: Re: long QTime for big index


Just to close this discussion: we solved the problem by splitting the index.
It turned out that distributed search with 12 cores is faster than
searching two cores.

All queries ,tomcat configuration, jvm configuration remain same. Now
queries are served in milliseconds.


On Thu, Jan 31, 2013 at 9:34 PM, Mou [via Lucene]
ml-node+s472066n4037870...@n3.nabble.com wrote:
 Thank you again.

 Unfortunately the index files will not fit in the RAM. I have to try using
 the document cache. I am also moving my index to SSD again; we took our index
 off when the fusion IO cards failed twice during indexing and the index was
 corrupted. Now with the bios upgrade and new driver, it is supposed to be
 more reliable.

 Also I am going to look into the client app to verify that it is making
 proper query requests.

 Surprisingly when I used a much lower value than default for
 defaultconnectionperhost and maxconnectionperhost in solrmeter , it performs
 very well, the same queries return in less than one sec . I am not sure yet,
 need to run solrmeter with different heap size , with cache and without
 cache etc.

 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/long-QTime-for-big-index-tp4037635p4040535.html
Sent from the Solr - User mailing list archive at Nabble.com.

 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread alxsss
Depending on your architecture, why not index the same data into two machines? 
One will be your prod another your backup?

Thanks.
Alex.

 

 

 

-Original Message-
From: Upayavira u...@odoko.co.uk
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Dec 20, 2012 11:51 am
Subject: Re: Pause and resume indexing on SolR 4 for backups


You're saying that there's no chance to catch it in the middle of
writing the segments file?

Having said that, the segments file is pretty small, so the chance would
be pretty slim.

Upayavira

On Thu, Dec 20, 2012, at 06:45 PM, Lance Norskog wrote:
 To be clear: 1) is fine. Lucene index updates are carefully sequenced so 
 that the index is never in a bogus state. All data files are written and 
 flushed to disk, then the segments.* files are written that match the 
 data files. You can capture the files with a set of hard links to create 
 a backup.
 
 The CheckIndex program will verify the index backup.
 java -cp yourcopy/lucene-core-SOMETHING.jar 
 org.apache.lucene.index.CheckIndex collection/data/index
 
 lucene-core-SOMETHING.jar is usually in the solr-webapp directory where 
 Solr is unpacked.
 
 On 12/20/2012 02:16 AM, Andy D'Arcy Jewell wrote:
  Hi all.
 
  Can anyone advise me of a way to pause and resume SolR 4 so I can 
  perform a backup? I need to be able to revert to a usable (though not 
  necessarily complete) index after a crash or other disaster more 
  quickly than a re-index operation would yield.
 
  I can't yet afford the extravagance of a separate SolR replica just 
  for backups, and I'm not sure if I'll ever have the luxury. I'm 
  currently running with just one node, but we are not yet live.
 
  I can think of the following ways to do this, each with various 
  downsides:
 
  1) Just backup the existing index files whilst indexing continues
  + Easy
  + Fast
  - Incomplete
  - Potential for corruption? (e.g. partial files)
 
  2) Stop/Start Tomcat
  + Easy
  - Very slow and I/O, CPU intensive
  - Client gets errors when trying to connect
 
  3) Block/unblock SolR port with IpTables
  + Fast
  - Client gets errors when trying to connect
  - Have to wait for existing transactions to complete (not sure 
  how, maybe watch socket FD's in /proc)
 
  4) Pause/Restart SolR service
  + Fast ? (hopefully)
  - Client gets errors when trying to connect
 
  In any event, the web app will have to gracefully handle 
  unavailability of SolR, probably by displaying a down for 
  maintenance message, but this should preferably be only a very short 
  amount of time.
 
  Can anyone comment on my proposed solutions above, or provide any 
  additional ones?
 
  Thanks for any input you can provide!
 
  -Andy
 
 

 




Re: Grouping performance problem

2012-07-16 Thread alxsss
This is strange. We have a data folder size of 24GB and 2GB RAM for java. We query 
with grouping, ngroups and highlighting, do not fetch all fields, and query 
time is mostly less than 1 sec; it rarely goes up to 2 sec. We use solr 3.6 and 
turned off all kinds of caching.
Maybe your problem is with caching and displaying all fields?

Hope this may help.

Alex.



-Original Message-
From: Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Jul 16, 2012 10:04 am
Subject: Re: Grouping performance problem


I have a server with 24GB RAM. I have 4 shards on it, each of them with 4GB
RAM for java:
JAVA_OPTIONS="-server -Xms4096M -Xmx4096M"
The size is about 15GB for one shard (I use an ssd disk for the index data).

Agnieszka


2012/7/16 alx...@aim.com

 What are the RAM of your server and size of the data folder?



 -Original Message-
 From: Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl
 To: solr-user solr-user@lucene.apache.org
 Sent: Mon, Jul 16, 2012 6:16 am
 Subject: Re: Grouping performance problem


 Hi Pavel,

 I tried with group.ngroups=false but didn't notice a big improvement.
 The times were still about 4000 ms. It doesn't solve my problem.
 Maybe this is because of my index type. I have millions of documents but
 only about 20 000 groups.

  Cheers
  Agnieszka

 2012/7/16 Pavel Goncharik pavel.goncha...@gmail.com

  Hi Agnieszka ,
 
  if you don't need number of groups, you can try leaving out
  group.ngroups=true param.
  In this case Solr apparently skips calculating all groups and delivers
  results much faster.
  At least for our application the difference in performance
  with/without group.ngroups=true is significant (have to say, we use
  Solr 3.6).
 
  WBR,
  Pavel
 
  On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz
  agnieszka.kukalow...@usable.pl wrote:
   Hi,
  
   Is the any way to make grouping searches more efficient?
  
   My queries look like:
  
 
/select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
  
   For index with 3 mln documents query for all docs with group=true takes
   almost 4000ms. Because queryResultCache is not used next queries take a
   long time also.
  
   When I remove group=true and leave only faceting the query for all docs
   takes much more less time: for first time ~ 700ms and next runs only
  200ms
   because of queryResultCache being used.
  
   So with group=true the query is about 20 time slower than without it.
   Is it possible or is there any way to improve performance with
 grouping?
  
   My application needs grouping feature and all of the queries use it but
  the
   performance of them is to low for production use.
  
   I use Solr 4.x from trunk
  
   Agnieszka Kukalowicz
 




 


Re: Broken pipe error

2012-07-03 Thread alxsss
I had the same problem with Jetty. It turned out that the broken pipe happens when the 
application disconnects from Jetty. In my case I was using a PHP client, and it 
had a 10 sec limit on the curl request. When Solr took more than 10 sec to 
respond, curl automatically disconnected from Jetty.
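The broken-pipe symptom can be reproduced in isolation: when the client gives up and closes its socket (as curl does when it hits its timeout), the server's later write fails. A small demonstration with plain sockets, nothing Solr-specific:

```python
import socket

# The client closes its end (like curl hitting its 10 second limit)
# while the server still has a response to write; the server's write
# then fails with a broken pipe / connection reset.
server, client = socket.socketpair()
client.close()                       # client gives up and disconnects
try:
    for _ in range(100):             # keep writing until the OS notices
        server.send(b"late response chunk")
    result = "no error"
except (BrokenPipeError, ConnectionResetError) as exc:
    result = type(exc).__name__
finally:
    server.close()
print(result)
```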

Hope this can help.

Alex.



-Original Message-
From: Jason hialo...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Jul 2, 2012 7:41 pm
Subject: Broken pipe error


Hi, all

We're independently running three search servers.
One of three servers has bigger index size and more connection users than
the others.
Except that, all configurations are same.
The problem is that the server sometimes throws a broken pipe error.
But I don't know what the problem is.
Please give some ideas.
Thanks in advance.
Jason


error message below...
===
2012-07-03 10:42:56,753 [http-8080-exec-3677] ERROR
org.apache.solr.servlet.SolrDispatchFilter - null:ClientAbortException: 
java.io.IOException: Broken pipe
at
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:358)
at
org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:432)
at
org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:309)
at
org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:288)
at
org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:98)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
at org.apache.solr.util.FastWriter.flush(FastWriter.java:115)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:402)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:279)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:470)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
at
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:732)
at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2262)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69)
at sun.nio.ch.IOUtil.write(IOUtil.java:40)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at org.apache.tomcat.util.net.NioChannel.write(NioChannel.java:116)
at
org.apache.tomcat.util.net.NioBlockingSelector.write(NioBlockingSelector.java:93)
at
org.apache.tomcat.util.net.NioSelectorPool.write(NioSelectorPool.java:156)
at
org.apache.coyote.http11.InternalNioOutputBuffer.writeToSocket(InternalNioOutputBuffer.java:460)
at
org.apache.coyote.http11.InternalNioOutputBuffer.flushBuffer(InternalNioOutputBuffer.java:804)
at
org.apache.coyote.http11.InternalNioOutputBuffer.addToBB(InternalNioOutputBuffer.java:644)
at
org.apache.coyote.http11.InternalNioOutputBuffer.access$000(InternalNioOutputBuffer.java:46)
at
org.apache.coyote.http11.InternalNioOutputBuffer$SocketOutputBuffer.doWrite(InternalNioOutputBuffer.java:829)
at
org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:126)
at
org.apache.coyote.http11.InternalNioOutputBuffer.doWrite(InternalNioOutputBuffer.java:610)
at org.apache.coyote.Response.doWrite(Response.java:560)
at
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:353)
... 25 more

--
View this message in context: 

Re: Removing old documents

2012-05-02 Thread alxsss

 

 I use jetty that comes with solr. 
I use solr's dedupe

<updateRequestProcessorChain name="dedupe">
   <processor class="solr.processor.SignatureUpdateProcessorFactory">
     <bool name="enabled">true</bool>
     <str name="signatureField">id</str>
     <bool name="overwriteDupes">true</bool>
     <str name="fields">url</str>
     <str name="signatureClass">solr.processor.Lookup3Signature</str>
   </processor>
   <processor class="solr.LogUpdateProcessorFactory" />
   <processor class="solr.RunUpdateProcessorFactory" />
 </updateRequestProcessorChain>


and because of this, the id is not the url itself but its encoded signature.
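For illustration, here is the shape of that mismatch. Note this sketch uses md5 as a stand-in; the actual Lookup3Signature algorithm is different, so the hex values will not match real Solr ids:

```python
import hashlib


def signature(field_values):
    """Deterministic hex signature over the configured fields.

    Stand-in for Solr's SignatureUpdateProcessor: the stored uniqueKey
    becomes a hash of the url field, not the url itself.
    (md5 here, NOT the real Lookup3 hash.)
    """
    h = hashlib.md5()
    for value in field_values:
        h.update(value.encode("utf-8"))
    return h.hexdigest()[:16]


url = "http://example.com/page"
doc_id = signature([url])
# A delete-by-url (what solrclean issues) cannot match this doc's id:
assert doc_id != url
```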

I see solrclean uses url to delete a document.

Is it possible that the issue is because of this mismatch?


Thanks.
Alex.


 

-Original Message-
From: Paul Libbrecht p...@hoplahup.net
To: solr-user solr-user@lucene.apache.org
Sent: Tue, May 1, 2012 11:43 pm
Subject: Re: Removing old documents


With which client?

paul


Le 2 mai 2012 à 01:29, alx...@aim.com a écrit :

 all caching is disabled and I restarted jetty. The same results.


 


Re: Removing old documents

2012-05-01 Thread alxsss
Hello,

I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/

without and with -noCommit  and restarted solr server

Log  shows that 5 documents were removed but they are still in the search 
results.
Is this a bug or is something missing?
I use nutch-1.4 and solr 3.5

Thanks.
Alex. 

 

 

 

-Original Message-
From: Markus Jelsma markus.jel...@openindex.io
To: solr-user solr-user@lucene.apache.org
Sent: Tue, May 1, 2012 7:41 am
Subject: Re: Removing old documents


Nutch 1.4 has a separate tool to remove 404 and redirects documents from your 
index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents 
in one run based on segment data.

On Tuesday 01 May 2012 16:31:47 Bai Shen wrote:
 I'm running Nutch, so it's updating the documents, but I'm wanting to
 remove ones that are no longer available.  So in that case, there's no
 update possible.
 
 On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk 
 
 mav.p...@holidaylettings.co.uk wrote:
  Not sure if there is an automatic way but we do it via a delete query and
  where possible we update doc under same id to avoid deletes.
  
  On 01/05/2012 13:43, Bai Shen baishen.li...@gmail.com wrote:
  What is the best method to remove old documents?  Things that no
  generate 404 errors, etc.
  
  Is there an automatic method or do I have to do it manually?
  
  THanks.

-- 
Markus Jelsma - CTO - Openindex

 


Re: Removing old documents

2012-05-01 Thread alxsss

 

 all caching is disabled and I restarted jetty. The same results.

Thanks.
Alex.

 

-Original Message-
From: Lance Norskog goks...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, May 1, 2012 2:57 pm
Subject: Re: Removing old documents


Maybe this is the HTTP caching feature? Solr comes with HTTP caching
turned on by default and so when you do queries and changes your
browser does not fetch your changed documents.

On Tue, May 1, 2012 at 11:53 AM,  alx...@aim.com wrote:
 Hello,

 I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/

 without and with -noCommit  and restarted solr server

 Log  shows that 5 documents were removed but they are still in the search 
results.
 Is this a bug or something is missing?
 I use nutch-1.4 and solr 3.5

 Thanks.
 Alex.







 -Original Message-
 From: Markus Jelsma markus.jel...@openindex.io
 To: solr-user solr-user@lucene.apache.org
 Sent: Tue, May 1, 2012 7:41 am
 Subject: Re: Removing old documents


 Nutch 1.4 has a separate tool to remove 404 and redirects documents from your
 index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents
 in one run based on segment data.

 On Tuesday 01 May 2012 16:31:47 Bai Shen wrote:
 I'm running Nutch, so it's updating the documents, but I'm wanting to
 remove ones that are no longer available.  So in that case, there's no
 update possible.

 On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk 

 mav.p...@holidaylettings.co.uk wrote:
  Not sure if there is an automatic way but we do it via a delete query and
  where possible we update doc under same id to avoid deletes.
 
  On 01/05/2012 13:43, Bai Shen baishen.li...@gmail.com wrote:
  What is the best method to remove old documents?  Things that no
  generate 404 errors, etc.
  
  Is there an automatic method or do I have to do it manually?
  
  THanks.

 --
 Markus Jelsma - CTO - Openindex





-- 
Lance Norskog
goks...@gmail.com

 


Re: term frequency outweighs exact phrase match

2012-04-13 Thread alxsss
Hello Hoss,

Here are the explain tags for two doc

<str name="a0127d8e70a6d523">
0.021646015 = (MATCH) sum of:
  0.021646015 = (MATCH) sum of:
0.02141003 = (MATCH) max plus 0.01 times others of:
  2.84194E-4 = (MATCH) weight(content:apache^0.5 in 3578), product of:
0.0029881175 = queryWeight(content:apache^0.5), product of:
  0.5 = boost
  4.3554416 = idf(docFreq=126092, maxDocs=3613605)
  0.0013721307 = queryNorm
0.09510804 = (MATCH) fieldWeight(content:apache in 3578), product of:
  2.236068 = tf(termFreq(content:apache)=5)
  4.3554416 = idf(docFreq=126092, maxDocs=3613605)
  0.009765625 = fieldNorm(field=content, doc=3578)
  0.021407187 = (MATCH) weight(title:apache^1.2 in 3578), product of:
0.01371095 = queryWeight(title:apache^1.2), product of:
  1.2 = boost
  8.327043 = idf(docFreq=2375, maxDocs=3613605)
  0.0013721307 = queryNorm
1.5613205 = (MATCH) fieldWeight(title:apache in 3578), product of:
  1.0 = tf(termFreq(title:apache)=1)
  8.327043 = idf(docFreq=2375, maxDocs=3613605)
  0.1875 = fieldNorm(field=title, doc=3578)
2.359865E-4 = (MATCH) max plus 0.01 times others of:
  2.359865E-4 = (MATCH) weight(content:solr^0.5 in 3578), product of:
0.004071705 = queryWeight(content:solr^0.5), product of:
  0.5 = boost
  5.9348645 = idf(docFreq=25986, maxDocs=3613605)
  0.0013721307 = queryNorm
0.05795766 = (MATCH) fieldWeight(content:solr in 3578), product of:
  1.0 = tf(termFreq(content:solr)=1)
  5.9348645 = idf(docFreq=25986, maxDocs=3613605)
  0.009765625 = fieldNorm(field=content, doc=3578)
</str><str name="d89380e313c64aa5">
0.021465056 = (MATCH) sum of:
  1.8154096E-4 = (MATCH) sum of:
6.354771E-5 = (MATCH) max plus 0.01 times others of:
  6.354771E-5 = (MATCH) weight(content:apache^0.5 in 638040), product of:
0.0029881175 = queryWeight(content:apache^0.5), product of:
  0.5 = boost
  4.3554416 = idf(docFreq=126092, maxDocs=3613605)
  0.0013721307 = queryNorm
0.021266805 = (MATCH) fieldWeight(content:apache in 638040), product of:
  1.0 = tf(termFreq(content:apache)=1)
  4.3554416 = idf(docFreq=126092, maxDocs=3613605)
  0.0048828125 = fieldNorm(field=content, doc=638040)
1.1799325E-4 = (MATCH) max plus 0.01 times others of:
  1.1799325E-4 = (MATCH) weight(content:solr^0.5 in 638040), product of:
0.004071705 = queryWeight(content:solr^0.5), product of:
  0.5 = boost
  5.9348645 = idf(docFreq=25986, maxDocs=3613605)
  0.0013721307 = queryNorm
0.02897883 = (MATCH) fieldWeight(content:solr in 638040), product of:
  1.0 = tf(termFreq(content:solr)=1)
  5.9348645 = idf(docFreq=25986, maxDocs=3613605)
  0.0048828125 = fieldNorm(field=content, doc=638040)
  0.021283515 = (MATCH) weight(content:apache solr~1^30.0 in 638040), product 
of:
0.42358932 = queryWeight(content:apache solr~1^30.0), product of:
  30.0 = boost
  10.290306 = idf(content: apache=126092 solr=25986)
  0.0013721307 = queryNorm
0.050245635 = fieldWeight(content:apache solr in 638040), product of:
  1.0 = tf(phraseFreq=1.0)
  10.290306 = idf(content: apache=126092 solr=25986)
  0.0048828125 = fieldNorm(field=content, doc=638040)
</str>
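The "max plus 0.01 times others" lines above are the DisjunctionMaxQuery combination with tie=0.01: the best-scoring field dominates, and the other matching fields contribute only a fraction of their score. A minimal sketch of that formula:

```python
def dismax_score(field_scores, tie):
    """DisjunctionMaxQuery combination: the highest-scoring field wins,
    and the remaining fields contribute only tie * their score."""
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Reproduces the first clause of the explain output above:
# title 0.021407187 and content 2.84194e-4 combine to ~0.02141003.
combined = dismax_score([0.021407187, 2.84194e-4], tie=0.01)
```

With tie=1 the combination degenerates to a plain sum (disjunction sum), which is why raising tie reduces the dominance of the single best field.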

 

 

 Although the second doc has an exact match, it is ranked after the first one, 
which does not have an exact match.

I use the following request handler

<requestHandler name="search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">host^30 content^0.5 title^1.2 anchor^1.2</str>
    <str name="pf">content^30</str>
    <str name="fl">url,id,site,title</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">1</int>
    <bool name="hl">true</bool>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="hl.fragsize">165</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.count">5</str>
    <str name="group">true</str>
    <str name="group.field">site</str>
    <str name="group.ngroups">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>


and the query is as follows 

http://localhost:8983/solr/select/?q=apache 
solr&version=2.2&start=0&rows=10&indent=on&qt=search&debugQuery=true

Thanks.
Alex.


-Original Message-
From: Chris Hostetter hossman_luc...@fucit.org
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Apr 12, 2012 7:43 pm
Subject: Re: term frequency outweighs exact phrase match



: I use solr 3.5 with edismax. I have the following issue with phrase 
: search. For example if I have three documents with content like
: 
: 1.apache apache
: 2. solr solr
: 

Re: term frequency outweighs exact phrase match

2012-04-12 Thread alxsss
In that case documents 1 and 2 will not be in the results. We need them to also be 
shown in the results, but ranked after the docs with an exact match.
I think omitting term frequency when calculating the ranking of phrase queries would 
solve this issue, but I do not see such a parameter in the configs.
I see omitTermFreqAndPositions=true, but I am not sure it is the setting I need, 
because its description is too vague.

Thanks.
Alex.


 

 

 

-Original Message-
From: Erick Erickson erickerick...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Wed, Apr 11, 2012 8:23 am
Subject: Re: term frequency outweighs exact phrase match


Consider boosting on phrase with a SHOULD clause, something
like field:apache solr^2..

Best
Erick


On Tue, Apr 10, 2012 at 12:46 PM,  alx...@aim.com wrote:
 Hello,

 I use solr 3.5 with edismax. I have the following issue with phrase search. 
For example if I have three documents with content like

 1.apache apache
 2. solr solr
 3.apache solr

 then search for apache solr displays documents in the order 1,.2,3 instead of 
3, 2, 1 because term frequency in the first and second documents is higher than 
in the third document. We want results be displayed in the order as  3,2,1 
since 
the third document has exact match.

 My request handler is as follows.

 <requestHandler name="search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">host^30 content^0.5 title^1.2</str>
    <str name="pf">host^30 content^20 title^22</str>
    <str name="fl">url,id,site,title</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">1</int>
    <bool name="hl">true</bool>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="hl.fragsize">165</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.count">5</str>
    <str name="group">true</str>
    <str name="group.field">site</str>
    <str name="group.ngroups">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
 </requestHandler>

 Any ideas how to fix this issue?

 Thanks in advance.
 Alex.

 


term frequency outweighs exact phrase match

2012-04-10 Thread alxsss
Hello,

I use solr 3.5 with edismax. I have the following issue with phrase search. For 
example if I have three documents with content like

1.apache apache
2. solr solr
3.apache solr

then search for apache solr displays documents in the order 1, 2, 3 instead of 
3, 2, 1 because term frequency in the first and second documents is higher than 
in the third document. We want results to be displayed in the order 3, 2, 1 
since the third document has the exact match.

My request handler is as follows.

<requestHandler name="search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">host^30 content^0.5 title^1.2</str>
    <str name="pf">host^30 content^20 title^22</str>
    <str name="fl">url,id,site,title</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">1</int>
    <bool name="hl">true</bool>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="hl.fragsize">165</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.count">5</str>
    <str name="group">true</str>
    <str name="group.field">site</str>
    <str name="group.ngroups">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Any ideas how to fix this issue?

Thanks in advance.
Alex.


data/index/segments_u (No such file or directory)

2012-03-19 Thread alxsss
Hello,

I have copied solr's data folder from dev linux box to prod one. When starting 
solr I get this error in prod server. In dev solr starts sucessfully. 

Caused by: java.io.FileNotFoundException: 
/home/apache-solr-3.5.0/example/solr/data/index/segments_u (No such file or 
directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.init(RandomAccessFile.java:233)
at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.init(SimpleFSDirectory.java:70)
at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.init(SimpleFSDirectory.java:97)
at 
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.init(NIOFSDirectory.java:92)
at 
org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:79)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:345)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:265)
at 
org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:79)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:754)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:462)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:405)
at 
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1092)

There is no segments_u file or folder in the dev box.

Thanks in advance.
Alex.



Re: Help with duplicate unique IDs

2012-03-02 Thread alxsss

 Take a look at <updateRequestProcessorChain name="dedupe">.
I think you must use dedupe to solve this issue.

 

 

-Original Message-
From: Thomas Dowling tdowl...@ohiolink.edu
To: solr-user solr-user@lucene.apache.org
Cc: Mikhail Khludnev mkhlud...@griddynamics.com
Sent: Fri, Mar 2, 2012 1:10 pm
Subject: Re: Help with duplicate unique IDs


Thanks.  In fact, the behavior I want is overwrite=true.  I want to be 
able to reindex documents, with the same id string, and automatically 
overwrite the previous version.
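Conceptually, overwrite=true makes an add behave like an upsert keyed on the uniqueKey field. A toy model of just the semantics (this is not Solr's implementation):

```python
class ToyIndex:
    """Toy model of uniqueKey overwrite semantics (overwrite=true)."""

    def __init__(self):
        self.docs = {}

    def add(self, doc: dict, overwrite: bool = True):
        doc_id = doc["id"]
        if not overwrite and doc_id in self.docs:
            # Real Solr with overwrite=false skips the uniqueness check
            # entirely (duplicates accumulate); the toy model just
            # refuses, to keep the example short.
            raise ValueError("duplicate id: %s" % doc_id)
        self.docs[doc_id] = doc  # same id => the old record is replaced


idx = ToyIndex()
idx.add({"id": "a1", "title": "v1"})
idx.add({"id": "a1", "title": "v2"})  # reindex: replaces the old record
```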


Thomas


On 03/02/2012 04:01 PM, Mikhail Khludnev wrote:
 Hello Tomas,

 I guess you could just specify overwrite=false
 http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22


 On Fri, Mar 2, 2012 at 11:23 PM, Thomas Dowlingtdowl...@ohiolink.eduwrote:

 In a Solr index of journal articles, I thought I was safe reindexing
 articles because their unique ID would cause the new record in the index to
 overwrite the old one. (As stated at
 http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field - right?)


 


Re: spellcheck configuration not providing suggestions or corrections

2012-02-13 Thread alxsss
you have put this

 <str name="buildOnOptimize">true</str>

Maybe you need to put 
<str name="buildOnCommit">true</str>

 

 Alex.

 

-Original Message-
From: Dyer, James james.d...@ingrambook.com 
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Feb 13, 2012 12:43 pm
Subject: RE: spellcheck configuration not providing suggestions or corrections


That would be it, I think.  Your request is to /select, but you've put 
spellchecking into /search.  Try /search instead.  Also, I doubt its the 
problem, but try removing the trailing CRLFs from your query.  Also, typically 
you'd still query against the main field (itemDesc in your case) and just use 
itemDescSpell from which to build your dictionary.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: geeky2 [mailto:gee...@hotmail.com] 
Sent: Monday, February 13, 2012 2:28 PM
To: solr-user@lucene.apache.org
Subject: RE: spellcheck configuration not providing suggestions or corrections

hello 

thank you for the suggestion - however this did not work.

i went in to solrconfig and change the count to 20 - then restarted the
server and then did a reimport.



is it possible that i am not firing the request handler that i think i am
firing ?


  <requestHandler name="/search"
      class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.extendedResults">true</str>
      <str name="spellcheck.count">20</str>
      <str name="echoParams">explicit</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>


query sent to server:

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemDescSpell%3Agusket%0D%0A&version=2.2&start=0&rows=10&indent=on&spellcheck=true&spellcheck.build=true

results:

<response><lst name="responseHeader"><int name="status">0</int><int
name="QTime">0</int><lst name="params"><str name="spellcheck">true</str><str
name="indent">on</str><str name="start">0</str><str name="q">itemDescSpell:gusket
</str><str name="spellcheck.build">true</str><str name="rows">10</str><str
name="version">2.2</str></lst></lst><result name="response" numFound="0"
start="0"/></response>

--
View this message in context: 
http://lucene.472066.n3.nabble.com/spellcheck-configuration-not-providing-suggestions-or-corrections-tp3740877p3741521.html
Sent from the Solr - User mailing list archive at Nabble.com.

 


Re: can solr automatically search for different punctuation of a word

2012-01-30 Thread alxsss

 Hi Chantal,

In the readme file at  solr/contrib/analysis-extras/README.txt it says to add 
the ICU library (in lib/)

Do I also need to add the dependency... and where?

Thanks.
Alex.

 

 

-Original Message-
From: Chantal Ackermann chantal.ackerm...@btelligent.de
To: solr-user solr-user@lucene.apache.org
Sent: Fri, Jan 13, 2012 1:52 am
Subject: Re: can solr automatically search for different punctuation of a word


Hi Alex,



For me, ICUFoldingFilterFactory works very well. It does lowercasing and
removes diacritics (this is what umlauts and accented letters are
called - punctuation means commas, periods, etc.). It will work for any
language, not only German. And it will also handle apostrophes as in
C'est bien.



ICU requires additional libraries in the classpath. For an in-built solr

solution have a look at ASCIIFoldingFilterFactory.



http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory







Example configuration:

<fieldType name="text_sort" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.ICUFoldingFilterFactory" />
  </analyzer>
</fieldType>
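A rough approximation of what the folding filter does, using Unicode NFKD decomposition (the real ICUFoldingFilterFactory applies the much larger UTR#30 fold set, so this is only a sketch):

```python
import unicodedata


def fold(text: str) -> str:
    """Approximate ICU folding: lowercase and strip combining marks.

    NFKD splits accented letters into a base letter plus combining
    marks; dropping the marks leaves the plain ASCII-ish base form.
    """
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


assert fold("Über") == "uber"
assert fold("C'est bien") == "c'est bien"  # apostrophes pass through
```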



And dependencies (example for Maven) in addition to solr-core:

<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-icu</artifactId>
  <version>${solr.version}</version>
  <scope>runtime</scope>
</dependency>
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-analysis-extras</artifactId>
  <version>${solr.version}</version>
  <scope>runtime</scope>
</dependency>



Cheers,

Chantal



On Fri, 2012-01-13 at 00:09 +0100, alx...@aim.com wrote:

 Hello,

 

 I would like to know if solr has a functionality to automatically search for 
 a 

different punctuation of a word. 

 For example if I if a user searches for a word Uber, and stemmer is german 

lang, then solr looks for both Uber and  Über,  like in synonyms.

 

 Is it possible to give a file with a list of possible substitutions of 
 letters 

to solr and have it search for all possible punctuations?

 

 

 Thanks.

 Alex.




 


can solr automatically search for different punctuation of a word

2012-01-12 Thread alxsss
Hello,

I would like to know if solr has a functionality to automatically search for a 
different punctuation of a word. 
For example, if a user searches for the word Uber and the stemmer is German, 
then solr looks for both Uber and Über, like in synonyms.

Is it possible to give a file with a list of possible substitutions of letters 
to solr and have it search for all possible punctuations?


Thanks.
Alex.


Re: How to apply relevant Stemmer to each document

2011-12-22 Thread alxsss
Hi Erick,

Why would querying be wrong? 

It is my understanding that if I have, let's say, 3 docs and each of them has been 
indexed with its own language stemmer, then sending a query will search all 
docs and return matching results. Let's say a query is driving and one of 
the docs has drive and was stemmed by the English stemmer; then it would return 1 
result, as opposed to 0 docs if I had applied the Russian stemmer to all docs.

Am I missing something?

Thanks.
Alex.

  

 

 

 

-Original Message-
From: Erick Erickson erickerick...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Dec 22, 2011 11:06 am
Subject: Re: How to apply relevant Stemmer to each document


Not really. And it's hard to make sense of how this would work in practice,
because stemming the document (even if you could) is only half the battle.

How would querying work then? No matter what language you used
for your stemming, it would be wrong for all the documents that used a
different stemmer (or a stemmer based on a different language).

So I wouldn't hold out too much hope here.

Best
Erick

On Wed, Dec 21, 2011 at 4:09 PM,  alx...@aim.com wrote:
 Hello,

 I would like to know if in the latest version of solr is it possible to apply 
relevant stemmer to each doc depending on its lang field.
 I searched solr-user mailing lists and fount this thread

 http://lucene.472066.n3.nabble.com/Multiplexing-TokenFilter-for-multi-language-td3235341.html

 but not sure if it was developed into a jira ticket.

 Thanks.
 Alex.



 


Re: two word phrase search using dismax

2011-12-05 Thread alxsss
Hi Eric, 

After reading more about the pf param I increased the boosts a few times, and this 
solved options 2, 3, 4 but not 1. As an example, for the phrase newspaper latimes, 
latimes.com is not even in the results, so there is nothing to boost to first 
place; and changing the mm param to <str name="mm">1&lt;-1 5&lt;-2 6&lt;90%</str> solves 
only 1 and 4 but not 2 and 3.

Thanks.
Alex.

 

 

 

-Original Message-
From: Erick Erickson erickerick...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Dec 5, 2011 5:52 am
Subject: Re: two word phrase search using dismax


Have you looked at the pf (phrase fields)
parameter of edismax?

http://wiki.apache.org/solr/DisMaxQParserPlugin#pf_.28Phrase_Fields.29

Best
Erick

On Sat, Dec 3, 2011 at 7:04 PM,  alx...@aim.com wrote:
 Hello,

 Here is my request handler

  <requestHandler name="search" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="defType">edismax</str>
     <str name="echoParams">explicit</str>
     <float name="tie">0.01</float>
     <str name="qf">site^1.5 content^0.5 title^1.2</str>
     <str name="pf">site^1.5 content^0.5 title^1.2</str>
     <str name="fl">id,title,site</str>
     <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
     <int name="ps">300</int>
     <bool name="hl">true</bool>
     <str name="q.alt">*:*</str>
     <str name="hl.fl">content</str>
     <str name="f.title.hl.fragsize">0</str>
     <str name="hl.fragsize">165</str>
     <str name="f.title.hl.alternateField">title</str>
     <str name="f.url.hl.fragsize">0</str>
     <str name="f.url.hl.alternateField">url</str>
     <str name="f.content.hl.fragmenter">regex</str>
   </lst>
  </requestHandler>

 I have made a few tests with debugQuery and realised that for two word 
phrases, solr takes the first word and gives it a score according to qf param 
then takes the second word and gives it score and etc, but not to the whole 
phrase. That is why if one of the words is in the title and one of them in the 
content then this doc is given higher score than the one that has both words in 
the content but none in the title.

 Ideally, I want to achieve the following order.
 1. If one (or both) of the words are in field site, then it must be given 
higher score.
 2. Then come docs with both words in the title.
 3. Next, docs with both words in the content.
 4. And finally docs having either of words in the title and content.

 I tried to change the mm param to <str name="mm">1&lt;-1 5&lt;-2 6&lt;90%</str>
 This allows me to achieve 1 and 4 but not 2 and 3.
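The conditional mm syntax reads as "if there are more than n optional clauses, require value". A simplified interpreter (an approximation of Solr's parser, ignoring negative percentages and whitespace edge cases) shows what each spec demands:

```python
import math


def min_should_match(spec: str, clause_count: int) -> int:
    """Simplified reading of dismax's conditional mm spec,
    e.g. "2<-1 5<-2 6<90%": below the lowest threshold all clauses
    are required; otherwise the highest matching "n<value" applies.
    A negative value means "all clauses minus that many"."""
    required = clause_count  # default: all clauses required
    best_n = -1
    for part in spec.split():
        n_str, value = part.split("<")
        n = int(n_str)
        if best_n < n < clause_count:  # condition fires when count > n
            best_n = n
            if value.endswith("%"):
                pct = int(value[:-1])
                required = int(math.floor(clause_count * pct / 100.0))
            else:
                v = int(value)
                required = clause_count + v if v < 0 else v
    return required
```

For a two-word query, "2<-1 5<-2 6<90%" requires both words to match, while "1<-1 5<-2 6<90%" requires only one, which matches the relaxation described above.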

 Thanks.
 Alex.












 -Original Message-
 From: Chris Hostetter hossman_luc...@fucit.org
 To: solr-user solr-user@lucene.apache.org
 Sent: Thu, Nov 17, 2011 2:17 pm
 Subject: Re: two word phrase search using dismax




 : After putting the same score for title and content in qf filed, docs

 : with both words in content moved to fifth place. The doc in the first,

 : third and fourth places still have only one of the words in content and

 : title. The doc in the second place has one of the words in title and

 : both words in the content but in different places not together.



 details matter -- if you send further followup mails the full details of

 your dismax options and the score explanations for debugQuery are

 necessary to be sure people understand what you are describing (a

 snapshot of reality is far more valuable than a vague description of

 reality)



 off hand what you are describing sounds correct -- this is what the

 dismax parser is really designed to do.



 even if you have given both title and content equal boosts, your title

 field is probably shorter than your content field, so words matching once

 in title are likely to score higher than the same word matching once in

 content due to length normalization -- and unless you set the tie param

 to something really high, the score contribution from the highest scoring

 field (in this case title) will be the dominant factor in the score (it's

 disjunction *max* by default ... if you make tie=1 then it's disjunction

 *sum*)



 you haven't mentioned anything about the pf param at all which i can

 only assume means you aren't using it -- the pf param is how you configure

 that scores should be increased if/when all of the words in the query

 string appear together.  I would suggest putting all of the fields in your

 qf param in your pf param as well.





 -Hoss




 
 


less search results in prod

2011-12-03 Thread alxsss
Hello,

I built the solr-3.4.0 data folder on a dev server and copied it to the prod server.
I made a search for a keyword, then modified the qf and pf params in solrconfig.xml,
searched for the same keywords, and then restored qf and pf to their original
values. Now Solr returns far fewer docs for the same keywords compared with the
dev server. I tried other keywords; the issue is the same. Copying solrconfig.xml
from the dev server changed nothing. Looking at the statistics, the numDocs and
maxDoc values are the same on both servers.







Any ideas how to debug this issue?

Thanks in advance.
Alex.


Re: two word phrase search using dismax

2011-12-03 Thread alxsss
Hello,

Here is my request handler

<requestHandler name="search" class="solr.SearchHandler">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="echoParams">explicit</str>
<float name="tie">0.01</float>
<str name="qf">site^1.5 content^0.5 title^1.2</str>
<str name="pf">site^1.5 content^0.5 title^1.2</str>
<str name="fl">id,title, site</str>
<str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
<int name="ps">300</int>
<bool name="hl">true</bool>
<str name="q.alt">*:*</str>
<str name="hl.fl">content</str>
<str name="f.title.hl.fragsize">0</str>
<str name="hl.fragsize">165</str>
<str name="f.title.hl.alternateField">title</str>
<str name="f.url.hl.fragsize">0</str>
<str name="f.url.hl.alternateField">url</str>
<str name="f.content.hl.fragmenter">regex</str>
</lst>
</requestHandler>

I have made a few tests with debugQuery and realized that for two-word phrases,
Solr takes the first word and scores it according to the qf param, then scores the
second word, and so on, but never scores the whole phrase as a unit. That is why,
if one of the words is in the title and one in the content, that doc is given a
higher score than one that has both words in the content but none in the title.

Ideally, I want to achieve the following order.
1. If one (or both) of the words appears in the site field, the doc must be given
the highest score.
2. Then come docs with both words in the title.
3. Next, docs with both words in the content.
4. And finally, docs having either of the words in the title or content.

I tried to change the mm param to <str name="mm">1&lt;-1 5&lt;-2 6&lt;90%</str>.
This achieves 1 and 4, but not 2 and 3.
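For goals 2 and 3, one knob that may help (a sketch only, not tested against this index; pf2 is an edismax parameter, so it does not apply to plain dismax): pf2 boosts documents where adjacent pairs of the query words occur together, which pushes docs containing both words above docs matching a single word.

```xml
<!-- illustrative boosts; tune against real queries -->
<str name="pf2">site^3.0 title^2.4 content^1.0</str>
```

Since the handler above already sets defType to edismax, this could go next to the existing pf entry.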

Thanks.
Alex.






 

 

 

-Original Message-
From: Chris Hostetter hossman_luc...@fucit.org
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Nov 17, 2011 2:17 pm
Subject: Re: two word phrase search using dismax




: After putting the same score for title and content in qf filed, docs 

: with both words in content moved to fifth place. The doc in the first, 

: third and fourth places still have only one of the words in content and 

: title. The doc in the second place has one of the words in title and 

: both words in the content but in different places not together.



details matter -- if you send further followup mails the full details of 

your dismax options and the score explanations for debugQuery are 

necessary to be sure people understand what you are describing (a 

snapshot of reality is far more valuable than a vague description of 

reality)



off hand what you are describing sounds correct -- this is what the 

dismax parser is really designed to do.



even if you have given both title and content equal boosts, your title 

field is probably shorter than your content field, so words matching once 

in title are likely to score higher than the same word matching once in 

content due to length normalization -- and unless you set the tie param 

to something really high, the score contribution from the highest scoring 

field (in this case title) will be the dominant factor in the score (it's 

disjunction *max* by default ... if you make tie=1 then it's disjunction 

*sum*)



you haven't mentioned anything about the pf param at all which i can 

only assume means you aren't using it -- the pf param is how you configure 

that scores should be increased if/when all of the words in the query 

string appear together.  I would suggest putting all of the fields in your 

qf param in your pf param as well.





-Hoss


 


Re: spellcheck in dismax

2011-11-22 Thread alxsss

 It seems you forgot this:
<str name="spellcheck">true</str>
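For reference, a sketch of how the defaults could look with the flag added (assuming the spellcheck search component is registered under the name "spellcheck", as in the stock example solrconfig.xml):

```xml
<lst name="defaults">
  <str name="spellcheck">true</str>
  <str name="spellcheck.onlyMorePopular">true</str>
  <str name="spellcheck.extendedResults">false</str>
  <str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
  <str>spellcheck</str>
</arr>
```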


 

 

-Original Message-
From: Ruixiang Zhang rxzh...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Nov 22, 2011 11:54 am
Subject: spellcheck in dismax


I put the following into dismax requestHandler, but no suggestion field is
returned.

<lst name="defaults">
  <str name="spellcheck.onlyMorePopular">true</str>
  <str name="spellcheck.extendedResults">false</str>
  <str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
  <str>spellcheck</str>
</arr>

But everything works if I put it as a separate requestHandler. Did I miss
something?

Thanks
Richard

 


jetty error, broken pipe

2011-11-19 Thread alxsss
Hello,

I use solr 3.4 with jetty that is included in it. Periodically, I see this 
error in the jetty output

SEVERE: org.mortbay.jetty.EofException
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
at 
org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
at 
org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:296)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:140)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
...
...
...
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at org.mortbay.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:368)
at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:129)
at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:161)
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:714)
... 25 more

2011-11-19 20:50:00.060:WARN::Committed before 500 
null||org.mortbay.jetty.EofException|?at 
org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)|?at 
org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)|?at
 org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)|?at 
sun.nio.cs.StreamEncoder.implFlush(S

I searched the web and the only advice I found is to upgrade to Jetty 6.1, but I
think the version included in Solr is already 6.1.26.

Any advise is appreciated.


Thanks.
Alex.


Re: jetty error, broken pipe

2011-11-19 Thread alxsss
I found out that the curl timeout was set to 10, so for queries taking longer than
10 sec curl was closing the connection to jetty.
I noticed that when the number of docs found is large, Solr takes about 20 sec to
return results. This is too long. I turned caching off, but it did not help.
I think Solr spends too much time finding the total number of docs. Is there a way
to turn off this count?
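As far as I know there is no parameter that skips computing the total hit count. One partial workaround (a sketch; the 5000 ms budget is arbitrary) is to cap query work with timeAllowed in the handler defaults, which returns partial results when the budget is exceeded:

```xml
<int name="timeAllowed">5000</int>
```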

Thanks.
Alex.

 

 
-Original Message-
From: Fuad Efendi f...@efendi.ca
To: solr-user solr-user@lucene.apache.org
Cc: solr-user solr-user@lucene.apache.org
Sent: Sat, Nov 19, 2011 7:24 pm
Subject: Re: jetty error, broken pipe


It's not Jetty. It is a broken TCP pipe caused by the client side. It happens when
the client closes the TCP connection.

And I even had this problem with recent Tomcat 6.


The problem disappeared after I explicitly tuned keep-alive in Tomcat and started
using a monitoring thread with HttpClient and SolrJ... 

Fuad Efendi
http://www.tokenizer.ca




Sent from my iPad

On 2011-11-19, at 9:14 PM, alx...@aim.com wrote:

 Hello,
 
 I use solr 3.4 with jetty that is included in it. Periodically, I see this 
error in the jetty output
 
 SEVERE: org.mortbay.jetty.EofException
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
at 
 org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
at 
 org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:296)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:140)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
 ...
 ...
 ...
 Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at org.mortbay.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:368)
at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:129)
at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:161)
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:714)
... 25 more
 
 2011-11-19 20:50:00.060:WARN::Committed before 500 
 null||org.mortbay.jetty.EofException|?at 
org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)|?at 
org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)|?at
 
org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)|?at 
sun.nio.cs.StreamEncoder.implFlush(S
 
 I searched web and the only advice I get is to upgrade to jetty 6.1, but I 
think the version included in solr is 6.1.26.
 
 Any advise is appreciated.
 
 
 Thanks.
 Alex.

 


Re: two word phrase search using dismax

2011-11-15 Thread alxsss
Hello,

Thanks for your letter. I investigated further and found out that title is boosted
more than content in the qf field, and the docs in the first places have one of the
words in the title but not both of them.
The doc in the first place has only one of the words in the content.
Docs with both words in the content are placed after them, around 20th place.

After giving title and content the same boost in the qf field, docs with both
words in the content moved up to fifth place. The docs in the first, third and
fourth places still have only one of the words in the content and title.
The doc in the second place has one of the words in the title and both words in
the content, but in different places, not together.

Thanks.
Alex.
 

-Original Message-
From: Michael Kuhlmann k...@solarier.de
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Nov 15, 2011 12:20 am
Subject: Re: two word phrase search using dismax


Am 14.11.2011 21:50, schrieb alx...@aim.com:
 Hello,

 I use solr3.4 and nutch 1.3. In request handler we have
 <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>

  As far as I know this means that for a two-word phrase search the match must be
100%.
 However, I noticed that in most cases documents with both words are ranked 
around 20th place.
 In the first places are documents with one of the words in the phrase.

 Any ideas why this happening and is it possible to fix it?

Hi,

are you sure that only one of the words matched in the found documents? 
Have you checked all fields that are listed in the qf parameter? And did 
you check for stemmed versions of your search terms?

If all this is true, you maybe want to give an example.

And AFAIK the mm parameter does not affect the ranking.


 


Re: how to achieve google.com like results for phrase queries

2011-11-07 Thread alxsss
Solr can also query link (url) text and rank those pages higher if we specify url in
the qf field. The only question is why it does not rank pages with both words
higher when mm is set as
1<-1. It seems to me that this is a bug.

Thanks.
Alex.

 
 

 

-Original Message-
From: Ted Dunning ted.dunn...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Sat, Nov 5, 2011 8:59 pm
Subject: Re: how to achieve google.com like results for phrase queries


Google achieves their results by using data not found in the web pages
themselves.  This additional data critically includes link text, but also
is derived from behavioral information.



On Sat, Nov 5, 2011 at 5:07 PM, alx...@aim.com wrote:

 Hi Erick,

 The term  newspaper latimes is not found in latimes.com. However,
 google places it in the first place. My guess is that mm parameter must
  not be set as 2<-1 in order to achieve google.com like ranking for
 two word phrase queries.

 My goal is to set mm parameter in such a way that latimes.com will be
 ranked in 1-3rd places and sites with both words will be placed after them.
 As I wrote in my previous letter
  setting mm as 1<-1 solves this issue partially. Problem in this case is
 that sites with both words are placed at the bottom or are not in the
 search results at all.

 Thanks.
 Alex.






 -Original Message-
 From: Erick Erickson erickerick...@gmail.com
 To: solr-user solr-user@lucene.apache.org
 Sent: Sat, Nov 5, 2011 9:01 am
 Subject: Re: how to achieve google.com like results for phrase queries


 First, the default query operator is ignored by edismax, so that's
 not doing anything.

 Why would you expect newspaper latimes to be found at all in
 latimes.com? What
 proof do you have that the two terms are even in the latimes.com
 document?

 You can look at the Query Elevation Component to force certain known
 documents to the top of the results based on the search terms, but that's
 not a very elegant solution.

 What business requirement are you trying to accomplish here? Because as
 asked, there's really not enough information to provide a meaningful
 suggestion.

 Best
 Erick

 On Thu, Nov 3, 2011 at 7:30 PM,  alx...@aim.com wrote:
  Hello,
 
  I use nutch-1.3 crawled results in solr-3.4. I noticed that for two word
 phrases like newspaper latimes, latimes.com is not in results at all.
  This may be due to the dismax def type that I use in  request handler
 
  <str name="defType">dismax</str>
  <str name="qf">url^1.5 id^1.5 content^ title^1.2</str>
  <str name="pf">url^1.5 id^1.5 content^0.5 title^1.2</str>
 
 
   with mm as
  <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
 
  However, changing it to
  <str name="mm">1&lt;-1 2&lt;-1 5&lt;-2 6&lt;90%</str>
 
  and q.op to OR or AND
 
  do not solve the problem. In this case latimes.com is ranked higher,
 but still
 is not in the first place.
  Also in this case results with both words are ranked very low, almost at
 the
 end.
 
  We need to be able to achieve the case when latimes.com is placed in
 the first
 place then results with both words and etc.
 
  Any ideas how to modify config to this end?
 
  Thanks in advance.
  Alex.
 
 





 


Re: how to achieve google.com like results for phrase queries

2011-11-05 Thread alxsss
Hi Erick,

The term newspaper latimes is not found in latimes.com. However, Google
places it in the first place. My guess is that the mm parameter must not be set as
2<-1 in order to achieve google.com like ranking for two-word phrase queries.

My goal is to set mm parameter in such a way that latimes.com will be ranked in 
1-3rd places and sites with both words will be placed after them. As I wrote in 
my previous letter
setting mm as 1<-1 solves this issue partially. The problem in this case is that
sites with both words are placed at the bottom or are not in the search results
at all.
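For reference, mm is a list of conditions of the form N<M: when the query has more than N optional clauses, M applies, and a negative M means all but that many clauses must match. So the spec discussed here reads (a sketch, shown XML-escaped as it must appear in solrconfig.xml):

```xml
<!-- 1-2 clauses: all must match; 3-5 clauses: all but one;
     6 clauses: all but two; 7 or more: 90% must match -->
<str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
```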

Thanks.
Alex.

 
 

 

-Original Message-
From: Erick Erickson erickerick...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Sat, Nov 5, 2011 9:01 am
Subject: Re: how to achieve google.com like results for phrase queries


First, the default query operator is ignored by edismax, so that's
not doing anything.

Why would you expect newspaper latimes to be found at all in
latimes.com? What
proof do you have that the two terms are even in the latimes.com document?

You can look at the Query Elevation Component to force certain known
documents to the top of the results based on the search terms, but that's
not a very elegant solution.

What business requirement are you trying to accomplish here? Because as
asked, there's really not enough information to provide a meaningful
suggestion.

Best
Erick

On Thu, Nov 3, 2011 at 7:30 PM,  alx...@aim.com wrote:
 Hello,

 I use nutch-1.3 crawled results in solr-3.4. I noticed that for two word 
phrases like newspaper latimes, latimes.com is not in results at all.
 This may be due to the dismax def type that I use in  request handler

 <str name="defType">dismax</str>
 <str name="qf">url^1.5 id^1.5 content^ title^1.2</str>
 <str name="pf">url^1.5 id^1.5 content^0.5 title^1.2</str>


  with mm as
 <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>

 However, changing it to
 <str name="mm">1&lt;-1 2&lt;-1 5&lt;-2 6&lt;90%</str>

 and q.op to OR or AND

 do not solve the problem. In this case latimes.com is ranked higher, but 
 still 
is not in the first place.
 Also in this case results with both words are ranked very low, almost at the 
end.

 We need to be able to achieve the case when latimes.com is placed in the 
 first 
place then results with both words and etc.

 Any ideas how to modify config to this end?

 Thanks in advance.
 Alex.



 
 


how to achieve google.com like results for phrase queries

2011-11-03 Thread alxsss
Hello,

I use nutch-1.3 crawled results in solr-3.4. I noticed that for two word 
phrases like newspaper latimes, latimes.com is not in results at all.
This may be due to the dismax def type that I use in  request handler 

<str name="defType">dismax</str>
<str name="qf">url^1.5 id^1.5 content^ title^1.2</str>
<str name="pf">url^1.5 id^1.5 content^0.5 title^1.2</str>


 with mm as
<str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>

However, changing it to
<str name="mm">1&lt;-1 2&lt;-1 5&lt;-2 6&lt;90%</str>

and q.op to OR or AND 

do not solve the problem. In this case latimes.com is ranked higher, but still 
is not in the first place.
Also in this case results with both words are ranked very low, almost at the 
end.

We need to be able to achieve the case when latimes.com is placed in the first 
place then results with both words and etc.

Any ideas how to modify config to this end?

Thanks in advance.
Alex.



apply filter to spell field

2011-09-27 Thread alxsss

 

 Hello,

I have implemented the spellchecker in two ways.
1. Adding a textspell type to schema.xml and making a copy field from the original
content field, which is of type text.
2. Without adding a new type or copy field, simply naming the spell field
(content) in solrconfig.xml.

I have an issue in both cases. In case 1, the data folder becomes twice as big, and
the copy field is an exact duplicate of the content field and therefore unnecessary
data.
In case 2, the suggestions are lowercased versions of the search keywords, i.e. if a
user searches for Jessica Alba, Solr suggests jessica alba.

So my question is: is it possible to resolve this issue without adding an
additional type and copy field to schema.xml?
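One pattern that may address case 1 (a sketch; the field and type names are illustrative): declare the spell copy field with stored="false". It is then indexed for the spellchecker but never stored, so it stays out of the XML feed; the indexed terms are still duplicated, but the stored copy, which is usually the bulk of the size, is not.

```xml
<field name="spell" type="textSpell" indexed="true" stored="false"/>
<copyField source="content" dest="spell"/>
```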

Thanks.
Alex.




Re: pagination with grouping

2011-09-12 Thread alxsss
Is case #2 planned for a future release?

Thanks.
Alex.

 

 


 

 

-Original Message-
From: Bill Bell billnb...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Sep 8, 2011 10:17 pm
Subject: Re: pagination with grouping


There are 2 use cases:

1. rows=10 means 10 groups.
2. rows=10 means 10 results (regardless of groups).

I thought there was a total number of groups (ngroups) or case #1.

I don't believe case #2 has been coded.
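For case #1, grouping can report the total number of groups via group.ngroups, if the Solr version in use supports it (a sketch; the field name "site" is illustrative). The response then carries an ngroups value alongside the matches count:

```xml
<bool name="group">true</bool>
<str name="group.field">site</str>
<bool name="group.ngroups">true</bool>
```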

On 9/8/11 2:22 PM, alx...@aim.com alx...@aim.com wrote:


 

 Hello,

When trying to implement pagination as in the case without grouping I see
two issues.
1. with rows=10 solr feed displays 10 groups not 10 results
2. there is no total number of results with grouping  to show the last
page.

In detail:
1. I need to display only 10 results in one page. For example if I have
group.limit=5 and the first group has 5 docs, the second 3 and the third
2, then only these 3 groups must be displayed on the first page.
Currently specifying rows=10, shows 10 groups and if we have 5 docs in
each group then in the first page we will have 50 docs.

2.I need to show the last page, for which I need total number of results
with grouping. For example if I have 5 groups with number of docs 5, 4,
3,2 1 then this total number must be 15.

Any ideas how to achieve this.

Thanks in advance.
Alex.






 


pagination with grouping

2011-09-08 Thread alxsss

 

 Hello,

When trying to implement pagination as in the case without grouping I see two 
issues.
1. with rows=10 solr feed displays 10 groups not 10 results
2. there is no total number of results with grouping  to show the last page.

In detail:
1. I need to display only 10 results in one page. For example if I have 
group.limit=5 and the first group has 5 docs, the second 3 and the third 2 then 
only these 3 groups must be displayed on the first page.
Currently specifying rows=10, shows 10 groups and if we have 5 docs in each 
group then in the first page we will have 50 docs.

2.I need to show the last page, for which I need total number of results with 
grouping. For example if I have 5 groups with number of docs 5, 4, 3,2 1 then 
this total number must be 15.

Any ideas how to achieve this.

Thanks in advance.
Alex.





grouping by alpha-numeric field

2011-09-07 Thread alxsss

 

 Hello,

I am trying to group by a field of type string. In the results I see groupValue
entries that are only parts of the group field's value.

Any ideas how to fix this.
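If the field being grouped on is actually analyzed (for example, grouping on a tokenized field or a copy of one), grouping happens per term, which would produce exactly these partial group values. A sketch of an untokenized declaration to group on instead (names illustrative):

```xml
<field name="site_exact" type="string" indexed="true" stored="true"/>
<copyField source="site" dest="site_exact"/>
```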

Thanks.
Alex.






spellchecking in nutch solr

2011-09-01 Thread alxsss


Hello,
I have tried to implement an index-based spellchecker in nutch-solr by adding a
spell field to schema.xml and making it a copy of the content field. However, this
doubled the data folder size, and the spell field, as a copy of the content field,
appears in the XML feed, which is not necessary. Is it possible to implement the
spellchecker without this issue?

Thanks.
Alex.
 


Re: how to manually add data to indexes generated by nutch-1.0 using solr

2009-05-13 Thread alxsss

 I forgot to say that when I do

curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
--data-binary '<commit waitFlush="false" waitSearcher="false"/>'

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">453</int></lst>
</response>


and a search for the added keywords gives 0 results. Does status 0 mean that the
addition was successful?

Thanks.
Alex.


 


 

-Original Message-
From: Erik Hatcher e...@ehatchersolutions.com
To: solr-user@lucene.apache.org
Sent: Tue, 12 May 2009 6:48 pm
Subject: Re: how to manually add data to indexes generated by nutch-1.0 using 
solr









send a <commit/> request afterwards, or you can add ?commit=true to the /update
request with the adds.

	Erik

On May 12, 2009, at 8:57 PM, alx...@aim.com wrote:

 Tried to add a new record using

 curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
 --data-binary '<add>
 <doc boost="2.5">
 <field name="segment">20090512170318</field>
 <field name="digest">86937aaee8e748ac3007ed8b66477624</field>
 <field name="boost">0.21189615</field>
 <field name="url">test.com</field>
 <field name="title">test test</field>
 <field name="tstamp">20090513003210909</field>
 </doc></add>'

 I get

 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">0</int><int name="QTime">71</int></lst>
 </response>

 and added records are not found in the search.

 Any ideas what went wrong?

 Thanks.
 Alex.

 -----Original Message-----
 From: alx...@aim.com
 To: solr-user@lucene.apache.org
 Sent: Mon, 11 May 2009 12:14 pm
 Subject: how to manually add data to indexes generated by nutch-1.0 using solr

 Hello,

 I had Nutch-1.0 crawl, fetch, and index a lot of files. Then I needed to
 index a few files also. But I know keywords for those files and their
 locations. I need to add them manually. I took a look at two tutorials on the
 wiki, but did not find any info about this issue.
 Is there a tutorial on a step-by-step procedure of adding data to a nutch index
 using solr manually?

 Thanks in advance.
 Alex.



 



Re: how to manually add data to indexes generated by nutch-1.0 using solr

2009-05-12 Thread alxsss

 Tried to add a new record using



 curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
--data-binary '<add>
<doc boost="2.5">
<field name="segment">20090512170318</field>
<field name="digest">86937aaee8e748ac3007ed8b66477624</field>
<field name="boost">0.21189615</field>
<field name="url">test.com</field>
<field name="title">test test</field>
<field name="tstamp">20090513003210909</field>
</doc></add>'

I get

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">71</int></lst>
</response>


and added records are not found in the search.

Any ideas what went wrong?


Thanks.
Alex.
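For what it's worth, a likely cause (an assumption, consistent with Erik's suggestion elsewhere in this thread): docs added without a commit are not yet searchable. A command sketch combining the add with a commit in one request:

```
curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" \
  --data-binary '<add><doc boost="2.5"><field name="url">test.com</field><field name="title">test test</field></doc></add>'
```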


 

-Original Message-
From: alx...@aim.com
To: solr-user@lucene.apache.org
Sent: Mon, 11 May 2009 12:14 pm
Subject: how to manually add data to indexes generated by nutch-1.0 using solr










Hello,

I had Nutch-1.0 crawl, fetch, and index a lot of files. Then I needed to

index a few files also. But I know keywords for those files and their
locations. I need to add them manually. I took a look at two tutorials on the
wiki, but did not find any info about this issue.
Is there a tutorial on a step-by-step procedure of adding data to a nutch index
using solr manually?

Thanks in advance.
Alex.



 



how to manually add data to indexes generated by nutch-1.0 using solr

2009-05-11 Thread alxsss
Hello,

I had Nutch-1.0 crawl, fetch, and index a lot of files. Then I needed to

index a few files also. But I know keywords for those files and their
locations. I need to add them manually. I took a look at two tutorials on the
wiki, but did not find any info about this issue.
Is there a tutorial on a step-by-step procedure of adding data to a nutch index
using solr manually?

Thanks in advance.
Alex.