Lucene 3.4.0 Merging

2011-09-29 Thread Ahson Iqbal
Hi

I have 3 Solr 3.4.0 indexes that I want to merge. After searching the web I found
that there are two ways to do it:

1. Using Lucene Merge tool.
2. Merging through core admin

I am using the 1st method. For this I downloaded Lucene 3.4.0, unpacked it,
and then ran the following command at the command prompt:


java -cp d:/lucene/lucene-core-3.4.0.jar:./contrib/misc/lucene-misc-3.4.0.jar org/apache/lucene/misc/IndexMergeTool ./newindex ./app1/solr/data/index ./app2/solr/data/index

but unfortunately it gives me the following exception:

Exception in thread "Main Thread" java.lang.NoClassDefFoundError: 

org/apache/lucene/misc/IndexMergeTool


Can anybody guide me about this?

Regards
Ahsan
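The NoClassDefFoundError above is usually a classpath problem: on Windows (note the d:/ path) the classpath separator is ';' rather than ':', and a space must separate the jar list from the main class name. A small Python sketch (paths taken from the message; an illustration, not a verified fix) that assembles the command with the platform's separator:

```python
import os

# Build the IndexMergeTool invocation with the platform's classpath
# separator (';' on Windows, ':' elsewhere). The wrong separator is the
# usual cause of NoClassDefFoundError in a command like the one above.
jars = ["d:/lucene/lucene-core-3.4.0.jar",
        "./contrib/misc/lucene-misc-3.4.0.jar"]
cmd = ["java", "-cp", os.pathsep.join(jars),
       "org.apache.lucene.misc.IndexMergeTool",
       "./newindex", "./app1/solr/data/index", "./app2/solr/data/index"]
print(" ".join(cmd))
```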


Re: About solr distributed search

2011-09-29 Thread Jerry Li
Hi,

I suggest you set up an environment and test it yourself; 1M documents is no problem at all.


2011/9/30 秦鹏凯 :
> Hi all,
>
> Now I'm doing research on Solr distributed search, and it
> is said that distributed search is reasonable for more than
> one million documents.
> So I want to know: does anyone have test
> results (such as time cost) comparing a single index and distributed
> search with more than one million documents? I need the test results
> urgently; thanks in advance!
>
> Best Regards,
> Pengkai



--


About solr distributed search

2011-09-29 Thread 秦鹏凯
Hi all,

Now I'm doing research on Solr distributed search, and it is said that
distributed search is reasonable for more than one million documents.
So I want to know: does anyone have test results (such as time cost)
comparing a single index and distributed search with more than one
million documents? I need the test results urgently; thanks in advance!

Best Regards,
Pengkai

Re: autosuggest combination of data from documents and popular queries

2011-09-29 Thread abhayd
anyone?

How to sort for termscomponent?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/autosuggest-combination-of-data-from-documents-and-popular-queries-tp3360657p3381201.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dismax with AND/OR combination

2011-09-29 Thread yingshou guo
I don't understand what you mean by "a translated form". The only
special symbols the dismax query parser understands are quotes, "+" and
"-" (i.e. phrase, mandatory, and prohibited semantics), something like:
"term1 term2" +term3 -term4. The dismax parser treats the other
operators as plain query text.

I guess when you switch to edismax, you'll get results.
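As a sketch (defType and qf are standard Solr request parameters; the field boosts are the ones quoted in the thread, and this is an illustration rather than a tested request), the intent of the query could be sent to the edismax parser like this:

```python
from urllib.parse import urlencode

# Express the original query for the edismax parser, which does
# understand OR and fielded clauses; qf carries the per-field boosts.
params = {
    "defType": "edismax",
    "q": '"ab sx" OR mfg:(abc OR sx) OR displayName:(abc OR sx)',
    "qf": "text^2.5 features^1.1 displayName^15.0 mfg^4.0 description^3.0",
}
query_string = urlencode(params)
print(query_string)
```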


2011/9/30 Jason Toy :
> Can dismax understand that query in a translated form?
>
>
> On Sep 29, 2011, at 10:01 PM, yingshou guo wrote:
>
>> You can't use this kind of query syntax with the dismax query parser.
>> Your query can be understood by the standard query parser or the
>> edismax query parser. The "qt" request parameter is used by Solr to
>> select the request handler plugin, not the query parser.
>>
>> Keep in mind that different query parsers understand different query
>> syntaxes.
>>
>> On Fri, Sep 30, 2011 at 8:00 AM, abhayd  wrote:
>>> hi
>>> i m using solr from trunk 4.0
>>> Also dismax is set as default qt with
>>>
>>>text^2.5 features^1.1 displayName^15.0 mfg^4.0 description^3.0
>>> 
>>>
>>> myquery is
>>> =
>>> q=+"ab sx"+OR+(mfg:abc+OR+sx)+OR+(displayName:abc+OR+sx)&qt=dismax
>>>
>>> It is not working as per my expectation .
>>>
>>> Any way to implement this with dismax or any other options? qt=standard
>>> might work but i want different score when hit is on different fields
>>>
>>> --
>>> View this message in context: 
>>> http://lucene.472066.n3.nabble.com/dismax-with-AND-OR-combination-tp3380883p3380883.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>


Re: dismax with AND/OR combination

2011-09-29 Thread Jason Toy
Can dismax understand that query in a translated form?


On Sep 29, 2011, at 10:01 PM, yingshou guo wrote:

> You can't use this kind of query syntax with the dismax query parser.
> Your query can be understood by the standard query parser or the
> edismax query parser. The "qt" request parameter is used by Solr to
> select the request handler plugin, not the query parser.
> 
> Keep in mind that different query parsers understand different query
> syntaxes.
> 
> On Fri, Sep 30, 2011 at 8:00 AM, abhayd  wrote:
>> hi
>> i m using solr from trunk 4.0
>> Also dismax is set as default qt with
>>
>>text^2.5 features^1.1 displayName^15.0 mfg^4.0 description^3.0
>> 
>> 
>> myquery is
>> =
>> q=+"ab sx"+OR+(mfg:abc+OR+sx)+OR+(displayName:abc+OR+sx)&qt=dismax
>> 
>> It is not working as per my expectation .
>> 
>> Any way to implement this with dismax or any other options? qt=standard
>> might work but i want different score when hit is on different fields
>> 
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/dismax-with-AND-OR-combination-tp3380883p3380883.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 


Re: dismax with AND/OR combination

2011-09-29 Thread yingshou guo
You can't use this kind of query syntax with the dismax query parser.
Your query can be understood by the standard query parser or the edismax
query parser. The "qt" request parameter is used by Solr to select the
request handler plugin, not the query parser.

Keep in mind that different query parsers understand different query syntaxes.

On Fri, Sep 30, 2011 at 8:00 AM, abhayd  wrote:
> hi
> i m using solr from trunk 4.0
> Also dismax is set as default qt with
>    
>        text^2.5 features^1.1 displayName^15.0 mfg^4.0 description^3.0
>     
>
> myquery is
> =
> q=+"ab sx"+OR+(mfg:abc+OR+sx)+OR+(displayName:abc+OR+sx)&qt=dismax
>
> It is not working as per my expectation .
>
> Any way to implement this with dismax or any other options? qt=standard
> might work but i want different score when hit is on different fields
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/dismax-with-AND-OR-combination-tp3380883p3380883.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


split index horizontally

2011-09-29 Thread Robert Yu
Is there an efficient way to handle my case?

Each document has several groups of fields; some of them are updated
frequently, some of them infrequently. Is it possible to maintain an
index per group but search over all of them as ONE
index?

 

To some extent, it is a three-layer document model (I think the current
one is two-layer):

document = {key: groups},...

groups = {group-name: fields},...

fields = {field-name: field-value},...

 

We could maintain an index for each group and search it like below:

   query: group-name-1:field-1:val-1 AND (
group-name-2:field-2:val-2 OR group-name-3:field-3:[min-3 TO max-3])

   return data:
group-name-1:field-1,field-2;group-name-2:field-3,field-4,...

 

Thanks,

Robert Yu
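One workaround sometimes used for this (an assumption on my part, not something Solr provides out of the box) is to flatten the group/field hierarchy into prefixed field names, so everything lives in ONE index and can be queried together:

```python
# Flatten the three-layer document model into one flat Solr document by
# prefixing each field with its group name. The names are the
# placeholders from the message above.
doc = {
    "key": "doc-1",
    "groups": {
        "group-name-1": {"field-1": "val-1"},
        "group-name-2": {"field-2": "val-2"},
    },
}
flat = {"id": doc["key"]}
for group, fields in doc["groups"].items():
    for name, value in fields.items():
        flat[f"{group}.{name}"] = value
print(flat)
```

A query like group-name-1.field-1:val-1 then works against the single flat index, at the cost of reindexing the whole document whenever any group changes.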

 



Re: dismax with AND/OR combination

2011-09-29 Thread Erick Erickson
Well, you have to tell us what you expected and what
you're seeing. Including the output with &debugQuery=on
and telling us what you disagree with would be the best
way.

You might also include your definition from your solrconfig
file. You included a fragment of it, but other parts may
have bearing.

Best
Erick

On Thu, Sep 29, 2011 at 8:00 PM, abhayd  wrote:
> hi
> i m using solr from trunk 4.0
> Also dismax is set as default qt with
>    
>        text^2.5 features^1.1 displayName^15.0 mfg^4.0 description^3.0
>     
>
> myquery is
> =
> q=+"ab sx"+OR+(mfg:abc+OR+sx)+OR+(displayName:abc+OR+sx)&qt=dismax
>
> It is not working as per my expectation .
>
> Any way to implement this with dismax or any other options? qt=standard
> might work but i want different score when hit is on different fields
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/dismax-with-AND-OR-combination-tp3380883p3380883.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


dismax with AND/OR combination

2011-09-29 Thread abhayd
hi 
i m using solr from trunk 4.0 
Also dismax is set as default qt with

text^2.5 features^1.1 displayName^15.0 mfg^4.0 description^3.0
 

myquery is
=
q=+"ab sx"+OR+(mfg:abc+OR+sx)+OR+(displayName:abc+OR+sx)&qt=dismax

It is not working as per my expectation . 

Any way to implement this with dismax or any other options? qt=standard
might work but i want different score when hit is on different fields

--
View this message in context: 
http://lucene.472066.n3.nabble.com/dismax-with-AND-OR-combination-tp3380883p3380883.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting facet counts for 10,000 most relevant hits

2011-09-29 Thread Lan
I implemented a similar feature for a categorization suggestion service. I
did the faceting in the client code, which is not exactly the best
performing but it worked very well.

It would be nice to have the Solr server do the faceting for performance.


Burton-West, Tom wrote:
> 
> If relevance ranking is working well, in theory it doesn't matter how many
> hits you get as long as the best results show up in the first page of
> results.  However, the default in choosing which facet values to show is
> to show the facets with the highest count in the entire result set.  Is
> there a way to issue some kind of a filter query or facet query that would
> show only the facet counts for the 10,000 most relevant search results?
> 
> As an example, if you search in our full-text collection for "jaguar" you
> get 170,000 hits.  If I am looking for the car rather than the OS or the
> animal, I might expect to be able to click on a facet and limit my results
> to the car.  However, facets containing the word car or automobile are not
> in the top 5 facets that we show.  If you click on "more"  you will see
> "automobile periodicals" but not the rest of the facets containing the
> word automobile .  This occurs because the facet counts are for all
> 170,000 hits.  The facet counts  for at least 160,000 irrelevant hits are
> included (assuming only the top 10,000 hits are relevant) .
> 
> What we would like to do is get the facet counts for the N most relevant
> documents and select the 5 or 30 facet values with the highest counts for
> those relevant documents.
> 
> Is this possible or would it require writing some lucene or Solr code?
> 
> Tom Burton-West
> http://www.hathitrust.org/blogs/large-scale-search
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-facet-counts-for-10-000-most-relevant-hits-tp3363459p3380852.html
Sent from the Solr - User mailing list archive at Nabble.com.


removing dynamic fields

2011-09-29 Thread zarni aung
Hi,

I've been experimenting with Solr dynamic fields.  Here is what I've
gathered based on my research.

For instance, I have a setup where I am catching undefined custom fields
this way (I am using trie types, by the way):

[dynamicField definitions omitted by the list archiver]
I am dealing with documents that may have a varying number of custom fields.
Instead of having to deal with field type changes in Solr, I decided to go
with dynamic fields.  But what I realized is that over a period of time, I
could have int1, int2, int3 fields; they could be deleted in the database,
and the Solr document deleted and re-added without the field values
for int1, int2 and int3.  I used the schema browser to inspect the fields
int1, int2 and int3: there are no docs associated with them, but the field
definitions remain.  I've tried unloading and reloading the cores and also
restarting the server, but that doesn't remove the fields.  They are only
removed when I clear everything out of the index with a "*:*" delete query.

What kind of penalty do I pay for having numerous unused fields,
especially trie fields?  I'm using them so that range queries work well.

Thanks,

Zarni


Re: Indexing geohash in solrj - Multivalued spatial search

2011-09-29 Thread Smiley, David W.

On Sep 29, 2011, at 5:10 PM, Alessandro Benedetti wrote:

> Sorry David, I probably misunderstood your reply; what do you mean?
> 
> I'm using Lucid Works Enterprise 1.8 and, as far as I know, it includes
> the geohashes patch.

Solr 3x, trunk, and I suspect Lucid Works Enterprise 2.0 (doubtful 1.8) 
support this:

That is not "the patch" -- SOLR-2155, that is the standard built-in geo support 
in Solr 3x.

The SOLR-2155 patch is a superset of (i.e. extends) that field, adding more 
capability.  Are you telling me LWE includes SOLR-2155? Can you point to proof 
of that, please?  If it did, then you wouldn't be getting that FieldCache error 
due to multi-values, since SOLR-2155 definitely supports multi-valued fields.

You may have luck putting my SOLR-2155 compiled plugin jar in LWE 1.8 according 
to the instructions I provide; I don't know.

> I have to index a multivalued location field and I have to make location
> queries on it!
> So I figured to use the geohash type ...
> Any hint about indexing and searching on a multivalued geohash field?
> 
> 
> 2011/9/29 Smiley, David W. 
> 
>> Hi Alessandro.
>> 
>> I can't think of any good reason anyone would use the geohash field type
>> that is a part of Solr today. If you are shocked I would say that, keep in
>> mind the work I've done with geohashes is an extension of what's in Solr,
>> it's not what's in Solr today. Recently I ported SOLR-2155 to Solr 3x, and
>> in a way that does NOT require that you patch Solr. I attached it to the
>> issue just now.
>> 
>> ~ David Smiley
>> 
>> On Sep 29, 2011, at 9:37 AM, Alessandro Benedetti wrote:
>> 
>>> Hi all,
>>> I have already read the topics in the mailing list that are regarding
>>> spatial search, but I haven't found an answer ...
>>> I have to index a multivalued field of type : "geohash" via solrj.
>>> Now I build a string with the lat and lon comma separated ( like
>>> 54.569468,67.58494 ) and index it in the geohash field.
>>> The indexing process seems to work , but when I try a spatial search,
>> Solr
>>> returns me an exception with fieldcache on multivalued field.
>>> 
>>> I'm using Lucid Work enterprise 1.8 , and , as I know , it should be
>>> integrated with this patch (
>>> 
>> https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994256#comment-12994256
>>> ).
>>> 
>>> Am I indexing wrong? Am I missing something?
>>> The type of my spatial field is geohash ...
>>> 
>>> Cheers
>>> 
>>> --
>>> ---
>>> Alessandro Benedetti
>>> 
>>> Sourcesense - making sense of Open Source: http://www.sourcesense.com
>> 
>> 
> 
> 
> -- 
> --
> 
> Benedetti Alessandro
> Personal Page: http://tigerbolt.altervista.org
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England



Re: Indexing geohash in solrj - Multivalued spatial search

2011-09-29 Thread Alessandro Benedetti
Sorry David, I probably misunderstood your reply; what do you mean?

I'm using Lucid Works Enterprise 1.8 and, as far as I know, it includes
the geohashes patch.
I have to index a multivalued location field and I have to make location
queries on it!
So I figured to use the geohash type ...
Any hint about indexing and searching on a multivalued geohash field?
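For background, a hedged illustration (plain Python, not the Solr/Lucene code) of what a geohash field stores: latitude/longitude are encoded by interleaving bisection bits of longitude and latitude, five bits per base-32 character, so shared string prefixes imply spatial proximity:

```python
# Minimal geohash encoder. Even-numbered bits refine longitude,
# odd-numbered bits refine latitude; every 5 bits emit one base-32 char.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=11):
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    result = []
    ch, bit, even = 0, 0, True  # start with a longitude bit
    while len(result) < precision:
        rng = lon_range if even else lat_range
        val = lon if even else lat
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch <<= 1
            rng[1] = mid
        even = not even
        bit += 1
        if bit == 5:
            result.append(BASE32[ch])
            ch, bit = 0, 0
    return "".join(result)

print(geohash_encode(57.64911, 10.40744))
```

geohash_encode(57.64911, 10.40744) yields "u4pruydqqvj", the commonly cited reference example.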


2011/9/29 Smiley, David W. 

> Hi Alessandro.
>
> I can't think of any good reason anyone would use the geohash field type
> that is a part of Solr today. If you are shocked I would say that, keep in
> mind the work I've done with geohashes is an extension of what's in Solr,
> it's not what's in Solr today. Recently I ported SOLR-2155 to Solr 3x, and
> in a way that does NOT require that you patch Solr. I attached it to the
> issue just now.
>
> ~ David Smiley
>
> On Sep 29, 2011, at 9:37 AM, Alessandro Benedetti wrote:
>
> > Hi all,
> > I have already read the topics in the mailing list that are regarding
> > spatial search, but I haven't found an answer ...
> > I have to index a multivalued field of type : "geohash" via solrj.
> > Now I build a string with the lat and lon comma separated ( like
> > 54.569468,67.58494 ) and index it in the geohash field.
> > The indexing process seems to work , but when I try a spatial search,
> Solr
> > returns me an exception with fieldcache on multivalued field.
> >
> > I'm using Lucid Work enterprise 1.8 , and , as I know , it should be
> > integrated with this patch (
> >
> https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994256#comment-12994256
> > ).
> >
> > Am I indexing wrong? Am I missing something?
> > The type of my spatial field is geohash ...
> >
> > Cheers
> >
> > --
> > ---
> > Alessandro Benedetti
> >
> > Sourcesense - making sense of Open Source: http://www.sourcesense.com
>
>


-- 
--

Benedetti Alessandro
Personal Page: http://tigerbolt.altervista.org

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Solr integration with Hbase

2011-09-29 Thread Haspadar
http://www.lilyproject.org

2011/9/29 

> Try lilyproject.com I think they do exactly what you are asking for.
>
> Sent from my iPhone
>
> On Sep 29, 2011, at 6:27 AM, Stuti Awasthi  wrote:
>
> > Hi all,
> >
> > I am newbee in Solr. I have my application on Hbase and Hadoop and I want
> to provide search functionality using Solr. I read
> http://wiki.apache.org/solr/DataImportHandler and got to know that there
> is support for SQL database.
> > My question is :
> > Is Solr is also good for NoSQL like database, especially Hbase?
> > How can we integrate Hbase and Solr. An pointers will be helpful.
> > Can we integrate Solr with Hadoop also as my documents will be in HDFS.
> >
> > Please Suggest
> >
> > Regards,
> > Stuti Awasthi
> >
> >
> > 
> > ::DISCLAIMER::
> >
> ---
> >
> > The contents of this e-mail and any attachment(s) are confidential and
> intended for the named recipient(s) only.
> > It shall not attach any liability on the originator or HCL or its
> affiliates. Any views or opinions presented in
> > this email are solely those of the author and may not necessarily reflect
> the opinions of HCL or its affiliates.
> > Any form of reproduction, dissemination, copying, disclosure,
> modification, distribution and / or publication of
> > this message without the prior written consent of the author of this
> e-mail is strictly prohibited. If you have
> > received this email in error please delete it and notify the sender
> immediately. Before opening any mail and
> > attachments please check them for viruses and defect.
> >
> >
> ---
>


Re: Solr integration with Hbase

2011-09-29 Thread pulkitsinghal
Try lilyproject.com; I think they do exactly what you are asking for.

Sent from my iPhone

On Sep 29, 2011, at 6:27 AM, Stuti Awasthi  wrote:

> Hi all,
> 
> I am newbee in Solr. I have my application on Hbase and Hadoop and I want to 
> provide search functionality using Solr. I read 
> http://wiki.apache.org/solr/DataImportHandler and got to know that there is 
> support for SQL database.
> My question is :
> Is Solr is also good for NoSQL like database, especially Hbase?
> How can we integrate Hbase and Solr. An pointers will be helpful.
> Can we integrate Solr with Hadoop also as my documents will be in HDFS.
> 
> Please Suggest
> 
> Regards,
> Stuti Awasthi
> 
> 
> 


Automate startup/shutdown of SolrCloud Shards

2011-09-29 Thread Jamie Johnson
I am trying to automate the startup/shutdown of SolrCloud shards and
have noticed a timing issue: if the server that is to bootstrap ZK with
the configs does not complete its process (i.e. there is no data at the
conf node yet), the other servers will fail to start.  An obvious
solution is to just start the Solr instance responsible for
bootstrapping first; is there some way that folks are handling this now?
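One way to serialize the startup is to poll for readiness before launching the remaining instances. A minimal sketch (the host/port are illustrative, and this only checks ZooKeeper liveness via its real "ruok" four-letter-word command; a fuller check would verify that the bootstrap node has actually uploaded the config):

```python
import socket

# Return True once ZooKeeper answers the "ruok" health check with
# "imok", which it does only when it is serving requests.
def zk_is_up(host="localhost", port=2181, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"ruok")
            return s.recv(4) == b"imok"
    except OSError:
        return False

print(zk_is_up("localhost", 1, timeout=0.5))
```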


Re: About solr distributed search

2011-09-29 Thread Gregor Kaczor

Hi Pengkai,

my experience is based on http://www.findfiles.net/ which holds >700 
million documents, each about 2 KB in size.


A single index containing that kind of data should hold fewer than 80 
million documents. If you have complex queries with lots of facets, 
sorting, or function queries, then even 50 million documents per index 
could be your upper limit. On very fast hardware with a warmed index you 
might deliver results within 1 second on average.


For documents above 5 KB in size those numbers might not necessarily be 
the same.


Try to test with your own documents by creating (NOT COPYING) them and 
indexing them in vast numbers. After every 10 million documents, test the 
average response time with caches switched off. If the average response 
time hits your threshold, then that number of documents is your limit per index.


Scaling up is no problem. AFAIK 20 to 50 indexes should be fine within a 
distributed production system.
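A back-of-the-envelope calculation from the figures above (the 50-80 million per-shard ceilings are Gregor's empirical numbers, not a Solr limit):

```python
import math

# Estimate shard count from a corpus size and a per-index document
# ceiling; 80M is the relaxed ceiling above, 50M the one for complex
# queries with facets, sorting and function queries.
def shards_needed(total_docs, per_shard_limit=80_000_000):
    return math.ceil(total_docs / per_shard_limit)

print(shards_needed(700_000_000))                # the FindFiles.net corpus
print(shards_needed(700_000_000, 50_000_000))    # complex-query ceiling
```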


Kind Regards
Gregor

On 09/29/2011 12:14 PM, Pengkai Qin wrote:

Hi all,

Now I'm doing research on Solr distributed search, and it is said that 
distributed search is reasonable for more than one million documents.
So I want to know: does anyone have test results (such as time cost) 
comparing a single index and distributed search with more than one 
million documents? I need the test results urgently; thanks in advance!


Best Regards,
Pengkai







--
How to find files on the Internet? FindFiles.net !


Solr integration with Hbase

2011-09-29 Thread Stuti Awasthi
Hi all,

I am a newbie to Solr. I have my application on HBase and Hadoop and I want to 
provide search functionality using Solr. I read 
http://wiki.apache.org/solr/DataImportHandler and learned that there is 
support for SQL databases.
My questions are:
Is Solr also good for NoSQL databases, especially HBase?
How can we integrate HBase and Solr? Any pointers will be helpful.
Can we also integrate Solr with Hadoop, as my documents will be in HDFS?

Please Suggest

Regards,
Stuti Awasthi





Re: Trouble configuring multicore / accessing admin page

2011-09-29 Thread Joshua Miller
On Sep 28, 2011, at 2:16 PM, Joshua Miller wrote:

> On Sep 28, 2011, at 2:11 PM, Jaeger, Jay - DOT wrote:
> 
>>  cores adminPath="/admij/cores"
>> 
>> Was that a cut and paste?  If so, the /admij/cores is presumably incorrect, 
>> and ought to be /admin/cores
>> 
> 
> No, that was a typo -- the config file is correct with admin/cores.  Thanks 
> for pointing out the mistake here.
> 

I was finally able to figure out the issue here.  It was the 
solr.QueryElevationComponent exception in the logs.  Once I commented out the 
related section in each core's conf/solrconfig.xml file and restarted Tomcat, I 
could then see the admin page link for each core and click through to manage 
them.

Thanks!


Josh Miller
Open Source Solutions Architect
(425) 737-2590
http://itsecureadmin.com/




Re: Query with plus sign failing

2011-09-29 Thread Erik Hatcher
Just a fact of life with the Lucene query parser.  You'll need to escape the + 
with a backslash for this to work.

Erik
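A sketch of Erik's suggestion in code (the character list follows the Lucene query parser syntax documentation; note that "&&" and "||" are strictly two-character operators, but escaping each character individually is also safe):

```python
import re

# Backslash-escape the characters the Lucene query parser treats
# specially, so a user-typed "+" (as in "Google +") reaches the parser
# as a literal term instead of a mandatory-clause operator.
def escape_lucene(text):
    return re.sub(r'([+\-&|!(){}\[\]^"~*?:\\])', r'\\\1', text)

print(escape_lucene("Google +"))
```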

On Sep 29, 2011, at 12:31 , Shawn Heisey wrote:

> The following query is failing:
> 
> ((Google +))
> 
> This is ultimately reduced to 'google' by my analysis chain, but the 
> following is in my log (3.2.0, but 3.4.0 also fails):
> 
> SEVERE: org.apache.solr.common.SolrException: 
> org.apache.lucene.queryParser.ParseException: Cannot parse '(  (Google +))': 
> Encountered " ")" ") "" at line 1, column 12.
> 
> If I change it to 'Google+' or 'Goo+gle' it works.
> 
> Below is the fieldType definition.  The pattern filter is designed to strip 
> leading/trailing punctuation characters, but leave any punctuation in the 
> middle of a term alone.  It does affect the plus sign, by reducing it to a 
> term of length zero.  The length filter then removes it at the end.  In the 
> 'Google+' variant, the pattern filter simply strips that character off and 
> the query does not fail.  Am I seeing a bug here, or problems with my 
> fieldType?
> 
>  positionIncrementGap="100">
> 
> 
>   pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
>  replacement="$2"
>  allowempty="false"
>/>
>   splitOnCaseChange="1"
>  splitOnNumerics="1"
>  stemEnglishPossessive="1"
>  generateWordParts="1"
>  generateNumberParts="1"
>  catenateWords="1"
>  catenateNumbers="1"
>  catenateAll="0"
>  preserveOriginal="1"
>/>
> 
> 
> 
> 
> 
>   pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
>  replacement="$2"
>  allowempty="false"
>/>
>   splitOnCaseChange="1"
>  splitOnNumerics="1"
>  stemEnglishPossessive="1"
>  generateWordParts="1"
>  generateNumberParts="1"
>  catenateWords="0"
>  catenateNumbers="0"
>  catenateAll="0"
>  preserveOriginal="1"
>/>
> 
> 
> 
> 
> 



PDF indexing

2011-09-29 Thread Jón Helgi Jónsson
Good day,

I'm checking whether Solr would work for indexing PDFs. My requirements are:

1) I must know which page has what contents.
2) Right-to-left search support, such as Hebrew. This has been the
trickiest part to achieve.

I also prefer to know the position of the matched contents on the page,
but I could live without it.

Any info or ideas would be greatly appreciated.

Thank you,
Jon
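For requirement 1, one common pattern (an assumption, not a Solr built-in; Solr/Tika extract a PDF as a single text blob by default) is to index each page as its own Solr document so every hit carries a page number. The extraction step stands in for whatever PDF library is used:

```python
# Turn a list of per-page text strings into one Solr document per page;
# searching then returns (pdf_id, page) pairs directly. Field names are
# illustrative, not a required schema.
def pages_to_docs(pdf_id, page_texts):
    return [
        {"id": f"{pdf_id}_page_{n}", "pdf_id": pdf_id,
         "page": n, "content": text}
        for n, text in enumerate(page_texts, start=1)
    ]

docs = pages_to_docs("report", ["first page text", "second page text"])
print(docs[1]["page"])
```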


Query with plus sign failing

2011-09-29 Thread Shawn Heisey

The following query is failing:

((Google +))

This is ultimately reduced to 'google' by my analysis chain, but the 
following is in my log (3.2.0, but 3.4.0 also fails):


SEVERE: org.apache.solr.common.SolrException: 
org.apache.lucene.queryParser.ParseException: Cannot parse '(  (Google 
+))': Encountered " ")" ") "" at line 1, column 12.


If I change it to 'Google+' or 'Goo+gle' it works.

Below is the fieldType definition.  The pattern filter is designed to 
strip leading/trailing punctuation characters, but leave any punctuation 
in the middle of a term alone.  It does affect the plus sign, by 
reducing it to a term of length zero.  The length filter then removes it 
at the end.  In the 'Google+' variant, the pattern filter simply strips 
that character off and the query does not fail.  Am I seeing a bug here, 
or problems with my fieldType?


positionIncrementGap="100">



















Re: Errors in requesthandler statistics

2011-09-29 Thread Shawn Heisey

On 9/29/2011 7:42 AM, roySolr wrote:

I have some logging by jetty. Every request looks like this:


   2011-09-29T12:28:47
   1317292127479
   18470
   org.apache.solr.core.SolrCore
   INFO
   org.apache.solr.core.SolrCore
   execute
   20
   [] webapp=/solr path=/select/
params={spellcheck=true&facet=true&sort=geodist()+asc&sfield=coord&spellcheck.q=test&facet.limit=20&version=2.2&fl=id,what,where}
hits=0 status=0 QTime=12


How can I see which request gives an error? The file has 94,000 stored
requests.


Looks like your Solr installation is probably using 
java.util.logging.XMLFormatter.  I'm using java.util.logging.FileHandler 
and below is what I get in my log for one of my request errors.  This 
comes from the search application sending "tag_id:" (without the quotes) 
to Solr.  This is an invalid query.  I don't know what happens with XML 
formatting, but it's probably similar.  Try searching all your logfiles 
for "SEVERE" or "error" strings.
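Another angle on the original question: the jetty request log quoted earlier embeds "status=NN", and a non-zero status marks a failed request, so filtering on that narrows 94,000 entries down to the offending ones (a sketch; the exact status codes depend on the servlet container setup):

```python
import re

# Keep only log lines whose embedded "status=NN" is non-zero;
# status=0 is the success case shown in the log sample above.
STATUS = re.compile(r"status=(\d+)")

def failed_requests(log_lines):
    return [line for line in log_lines
            if (m := STATUS.search(line)) and int(m.group(1)) != 0]

sample = ["... hits=0 status=0 QTime=12",
          "... hits=0 status=400 QTime=3"]
print(failed_requests(sample))
```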


Sep 24, 2011 2:56:48 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: 
org.apache.lucene.queryParser.ParseException: Cannot parse 'tag_id:': 
Encountered "" at line 1, column 7.

Was expecting one of:
"(" ...
"*" ...
 ...
 ...
 ...
 ...
"[" ...
"{" ...
 ...

at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:108)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:208)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at 
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)

at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)

at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 
'tag_id:': Encountered "" at line 1, column 7.

Was expecting one of:
"(" ...
"*" ...
 ...
 ...
 ...
 ...
"[" ...
"{" ...
 ...

at 
org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:211)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)

at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:84)

... 22 more
Caused by: org.apache.lucene.queryParser.ParseException: Encountered 
"" at line 1, column 7.

Was expecting one of:
"(" ...
"*" ...
 ...
 ...
 ...
 ...
"[" ...
"{" ...
 ...

at 
org.apache.lucene.queryParser.QueryParser.generateParseException(QueryParser.java:1818)
at 
org.apache.lucene.queryParser.QueryParser.jj_consume_token(QueryParser.java:1700)
at 
org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1327)
at 
org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1237)
at 
org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1226)
at 
org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206)

... 25 more



Re: Distributed search has problems with some field names

2011-09-29 Thread Luis Neves


Hi,

On 09/29/2011 03:10 PM, Erick Erickson wrote:

I know I've seen other anomalies with odd characters
in field names. In general, it's much safer to use
only letters, numbers, and underscores. In fact, I
even prefer lowercase letters. Since you're pretty
sure those work, why not just use them?


Yes, that's what I ended up doing, but it involved a reindex. I was 
trying to avoid that.


Thanks!

--
Luis Neves



Re: basic solr cloud questions

2011-09-29 Thread Sami Siren
2011/9/29 Yury Kats :
> True, but there is a big gap between goals and current state.
> Right now, there is distributed search, but not distributed indexing
> or auto-sharding, or auto-replication. So if you want to use the SolrCloud
> now (as many of us do), you need to do a number of things yourself,
> even if they might be done by SolrCloud automatically in the future.

There is a patch in Jira: https://issues.apache.org/jira/browse/SOLR-2355
that adds a update processor suitable for doing simple distributed
indexing with current version of Solr.

--
 Sami Siren
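
The core idea behind such a distributed-indexing processor can be sketched
independently of the patch: hash each document's unique key to pick a target
shard, so the same key always routes to the same shard. The class below is a
hypothetical illustration of that idea only, not code from SOLR-2355:

```java
// Hypothetical sketch: route a document to one of N shards by hashing its
// uniqueKey. Not the SOLR-2355 update processor itself, just the core idea.
public class ShardRouter {
    private final int numShards;

    public ShardRouter(int numShards) {
        this.numShards = numShards;
    }

    // Map a uniqueKey to a shard index in [0, numShards).
    public int shardFor(String uniqueKey) {
        // Math.abs can overflow for Integer.MIN_VALUE hash codes, so mask
        // the sign bit instead.
        return (uniqueKey.hashCode() & 0x7fffffff) % numShards;
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(3);
        int s = router.shardFor("doc-42");
        // Deterministic: the same key always routes to the same shard.
        if (s < 0 || s >= 3 || s != router.shardFor("doc-42")) {
            throw new AssertionError("routing is not deterministic");
        }
        System.out.println("doc-42 -> shard " + s);
    }
}
```

The important property is determinism: deletes and updates for a given key
must land on the same shard that holds the original document.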


Re: basic solr cloud questions

2011-09-29 Thread Darren Govoni

Agree. Thanks also for clarifying. It helps.

On 09/29/2011 08:50 AM, Yury Kats wrote:

On 9/29/2011 7:22 AM, Darren Govoni wrote:

That was kinda my point. The "new" cloud implementation
is not about replication, nor should it be. But rather about
horizontal scalability where "nodes" manage different parts
of a unified index.

It;s about many things. You stated one, but there are goals,
one of them being tolerance to node outages. In a cloud, when
one of your many nodes fail, you don't want to stop querying and
indexing. For this to happen, you need to maintain redundant copies
of the same pieces of the index, hence you need to replicate.


One of the design goals of the "new" cloud
implementation is for this to happen more or less automatically.

True, but there is a big gap between goals and current state.
Right now, there is distributed search, but not distributed indexing
or auto-sharding, or auto-replication. So if you want to use the SolrCloud
now (as many of us do), you need to do a number of things yourself,
even if they might be done by SolrCloud automatically in the future.


To me that means one does not have to manually distribute
documents or enforce replication as Yury suggests.
Replication is different to me than what was being asked.
And perhaps I misunderstood the original question.

Yury's response introduced the term "core" where the original
person was referring to "nodes". For all I know, those are two
different things in the new cloud design terminology (I believe they are).

I guess understanding "cores" vs. "nodes" vs "shards" is helpful. :)

Shard is a slice of index. Index is managed/stored in a core.
Nodes are Solr instances, usually physical machines.

Each node can host multiple shards, and each shard can consist of multiple
cores. However, all cores within the same shard must have the same content.

This is where the OP ran into the problem. The OP had 1 shard, consisting of two
cores on two nodes. Since there is no distributed indexing yet, all documents
were indexed into a single core. However, there is distributed search, therefore
queries were sent randomly to different cores of the same shard. Since one core
in the shard had documents and the other didn't, the query result was random.

To solve this problem, the OP must make sure all cores within the same shard
(be they on the same node or not) have the same content. This can currently be
achieved by:
a) setting up replication between cores: you index into one core and the other
core replicates the content
b) indexing into both cores

Hope this clarifies.
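
For option (a), the replication can be configured with Solr's built-in
ReplicationHandler. A rough solrconfig.xml sketch, where the host name, core
name and poll interval are placeholders, not values from this thread:

```xml
<!-- On the core being indexed into (master) -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On the replicating core (slave) -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/corename/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

The slave polls the master and pulls changed index files after each commit, so
both cores of the shard converge on the same content.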




Re: Indexing geohash in solrj - Multivalued spatial search

2011-09-29 Thread Smiley, David W.
Hi Alessandro.

I can't think of any good reason anyone would use the geohash field type that 
is a part of Solr today. If you are shocked I would say that, keep in mind the 
work I've done with geohashes is an extension of what's in Solr, it's not 
what's in Solr today. Recently I ported SOLR-2155 to Solr 3x, and in a way that 
does NOT require that you patch Solr. I attached it to the issue just now.

~ David Smiley

On Sep 29, 2011, at 9:37 AM, Alessandro Benedetti wrote:

> Hi all,
> I have already read the topics in the mailing list that are regarding
> spatial search, but I haven't found an answer ...
> I have to index a multivalued field of type : "geohash" via solrj.
> Now I build a string with the lat and lon comma separated ( like
> 54.569468,67.58494 ) and index it in the geohash field.
> The indexing process seems to work , but when I try a spatial search, Solr
> returns me an exception with fieldcache on multivalued field.
> 
> I'm using Lucid Work enterprise 1.8 , and , as I know , it should be
> integrated with this patch (
> https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994256#comment-12994256
> ).
> 
> Am I indexing wrong? Am I missing something?
> The type of my spatial field is geohash ...
> 
> Cheers
> 
> -- 
> ---
> Alessandro Benedetti
> 
> Sourcesense - making sense of Open Source: http://www.sourcesense.com



Re: synonym filtering at index time

2011-09-29 Thread Erick Erickson
Biggest red flag is "KeywordTokenizerFactory". You don't
say whether your input is multi-word or not, but
that tokenizer does NOT break up input, so
even the input "my watche" would not trigger a
synonym substitution. Try something like
WhitespaceTokenizer.

Second red flag. Changing your analysis chain
so radically between index and query is pretty
much guaranteed to mess you up. In your example,
all your input is reduced to "watch". But by not
having anything in your query side, "watche" will
not be matched. Either set expand="true" or
reduce the query-time matches as well.

Third red flag: Not having your ngram
stuff in place in the query is probably going
to keep your searches from matching as
you expect.
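
Putting those three points together, a symmetric fieldType could look roughly
like the sketch below. This is an assumption-laden illustration, not the
poster's actual schema: the type name and the ngram sizes are made up, and it
follows the "reduce on both sides" option rather than expand="true":

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
            maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
```

The analysis screen in the admin UI will show whether index-time and
query-time terms line up for inputs like "watche".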

Anyway, hope that helps. If this isn't relevant, you
might post examples what actually doesn't work

Best
Erick

On Wed, Sep 28, 2011 at 12:12 PM, Doug McKenzie
 wrote:
> Trying to add in synonyms at index time but it's not working as expected.
> Here's the schema and example from synonyms.txt
>
> synonyms.txt has :
> watch, watches, watche, watchs
>
> schema for the field :
>  positionIncrementGap="100">
> 
> 
> 
>  words="stopwords_en.txt" enablePositionIncrement="true"/>
>  ignoreCase="true" expand="false"/>
>  side="front"/>
> 
> 
> 
> 
> 
> 
>
> When I run analysis, the index query correctly shows watche => watch, which
> is then EdgeNGrammed
>
> My understanding of how this is meant to work is that solr will index all
> instances of 'watche' as 'watch' when expand=false
>
> This doesn't seem to be happening though. Any ideas on what I'm missing?
>
> I initially set the synonym filtering to run at query time as its user input
> however that was returning the same results so I thought it might be because
> those terms were already in the index and would therefore show up in the
> results
>
> Thanks
> Doug
>
>
> --
> Become a Firebox Fan on Facebook: http://facebook.com/firebox
> And Follow us on Twitter: http://twitter.com/firebox
>
> Firebox has been nominated for Retailer of the Year in the 2011 Stuff
> Awards. Who will win? It's up to you! Visit http://www.stuff.tv/awards and
> place your vote. We'll do a special dance if it's us.
>
> Firebox HQ is MOVING HOUSE! We're migrating from Streatham Hill to  shiny
> new digs in Shoreditch. As of 3rd October please update your records to:
> Firebox.com, 6.10 The Tea Building, 56 Shoreditch High Street, London, E1
> 6JJ
>
> Global Head Office: Firebox House, Ardwell Road, London SW2 4RT
> Firebox.com Ltd is registered in England and Wales, company number 3874477
> Registered Company Address: 41 Welbeck Street London W1G 8EA Firebox.com
>
> Any views expressed in this email are those of the individual sender, except
> where the sender expressly, and with authority, states them to be the views
> of Firebox.com Ltd.
>


Re: Solr on OC4J

2011-09-29 Thread Raja Ghulam Rasool
Just to explain a bit more,

OC4J standalone version is 10.1.3.5.0 and Solr version is 3.4.0.

Any help will  be greatly appreciated guys :)

On Thu, Sep 29, 2011 at 6:15 PM, Raja Ghulam Rasool wrote:

>  Hi all,
>
> I have installed solr on oc4j. but when i try to access the admin page it
> throws a 'StackOverflowError'
>
> Sep 28, 2011 3:35:25 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.StackOverflowError
> is there something i am doing wrong ? any tweak or config that i need to
> change ? please help me regarding this. anyone ?
> --
> Regards,
> Raja
>
>
>
>
>



-- 
Regards,
Ghulam Rasool.

Blog: http://ghulamrasool.blogspot.com
Mobile: +971506141872


Solr on OC4J

2011-09-29 Thread Raja Ghulam Rasool
Hi all,

I have installed solr on oc4j. but when i try to access the admin page it
throws a 'StackOverflowError'

Sep 28, 2011 3:35:25 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.StackOverflowError
is there something i am doing wrong ? any tweak or config that i need to
change ? please help me regarding this. anyone ?
-- 
Regards,
Raja


Re: Distributed search has problems with some field names

2011-09-29 Thread Erick Erickson
I know I've seen other anomalies with odd characters
in field names. In general, it's much safer to use
only letters, numbers, and underscores. In fact, I
even prefer lowercase letters. Since you're pretty
sure those work, why not just use them?
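
Following that convention, a dynamic field declaration without special
characters might look like this (a sketch; the type and attributes are
assumptions, not taken from the poster's schema):

```xml
<dynamicField name="distinct_*" type="string" indexed="true"
              stored="true" multiValued="true"/>
```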

Best
Erick

On Wed, Sep 28, 2011 at 6:59 AM, Luis Neves  wrote:
>
>
> Hello all,
>
> I'm experimenting with the "Distributed Search" bits in the nightly builds
> and I'm facing a problem.
>
> I have on my schema.xml some dynamic fields defined like this:
>
> 
>  multiValued="true" />
> 
>
>
> When hitting a single shard the following query works fine:
>
> http:///select?q=*:*&fl=ts,$distinct_boxes
>
> But when I add the "&distrib=true" parameter I get a NullPointerException:
>
>
> java.lang.NullPointerException
> at
> org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:1025)
> at
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:725)
> at
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:700)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:292)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1451)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> at
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:326)
> at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> at
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> at
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
> at
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>
>
>
> The "$" in "$distinct_boxes" appears to be the culprit somehow, the query:
>
> /select?q=*:*&fl=ts,distinct_boxes&distrib=true
>
> works without errors, but of course doesn't retrieve the field I want.
>
> Funnily enough when requesting the uniqueKey field there are no errors:
>
> /select?q=*:*&fl=tid,ts,$distinct_boxes&distrib=true
>
> But somehow the data from the field "$distinct_boxes" doesn't appear in the
> output.
>
> Is there some workaround? Using "fl=*" returns all the data from the fields
> that start with "$" but it severely increases the size of the response.
>
>
> --
> Luis Neves
>
>
>
>


Re: Upgrading from 3.1 to 3.4

2011-09-29 Thread Erick Erickson
They should be outlined in
CHANGES.txt if there are any. But usually
changes to minor versions don't require any
special steps...

Best
Erick

On Wed, Sep 28, 2011 at 4:14 AM, Rohit  wrote:
> I have been using solr 3.1 am planning to update to solr 3.4, whats the
> steps to be followed or anything that needs to be take care of specifically
> for the upgrade?
>
>
>
> Regards,
>
> Rohit
>
>


RE: Errors in requesthandler statistics

2011-09-29 Thread Jaeger, Jay - DOT
If you are asking how to tell which of 94000 records failed in a SINGLE HTTP 
update request, I have no idea, but I suspect that you cannot necessarily tell.

It might help if you copied and pasted what you find in the solr log for the 
failure (see my previous response for how to figure out where that might be -- 
look for a file solr0.log).

-Original Message-
From: roySolr [mailto:royrutten1...@gmail.com] 
Sent: Thursday, September 29, 2011 8:43 AM
To: solr-user@lucene.apache.org
Subject: RE: Errors in requesthandler statistics

Hi,

Thanks for your answer.

I have some logging by jetty. Every request looks like this:

<record>
  <date>2011-09-29T12:28:47</date>
  <millis>1317292127479</millis>
  <sequence>18470</sequence>
  <logger>org.apache.solr.core.SolrCore</logger>
  <level>INFO</level>
  <class>org.apache.solr.core.SolrCore</class>
  <method>execute</method>
  <thread>20</thread>
  <message>[] webapp=/solr path=/select/
params={spellcheck=true&facet=true&sort=geodist()+asc&sfield=coord&spellcheck.q=test&facet.limit=20&version=2.2&fl=id,what,where}
hits=0 status=0 QTime=12</message>
</record>

How can I see which record gives an error? The file has 94000 requests
stored.

Roy

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Errors-in-requesthandler-statistics-tp3379163p3379288.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Errors in requesthandler statistics

2011-09-29 Thread roySolr
Hi,

Thanks for your answer.

I have some logging by jetty. Every request looks like this:

<record>
  <date>2011-09-29T12:28:47</date>
  <millis>1317292127479</millis>
  <sequence>18470</sequence>
  <logger>org.apache.solr.core.SolrCore</logger>
  <level>INFO</level>
  <class>org.apache.solr.core.SolrCore</class>
  <method>execute</method>
  <thread>20</thread>
  <message>[] webapp=/solr path=/select/
params={spellcheck=true&facet=true&sort=geodist()+asc&sfield=coord&spellcheck.q=test&facet.limit=20&version=2.2&fl=id,what,where}
hits=0 status=0 QTime=12</message>
</record>

How can I see which record gives an error? The file has 94000 requests
stored.

Roy

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Errors-in-requesthandler-statistics-tp3379163p3379288.html
Sent from the Solr - User mailing list archive at Nabble.com.


Indexing geohash in solrj - Multivalued spatial search

2011-09-29 Thread Alessandro Benedetti
Hi all,
I have already read the topics in the mailing list that are regarding
spatial search, but I haven't found an answer ...
I have to index a multivalued field of type : "geohash" via solrj.
Now I build a string with the lat and lon comma separated ( like
 54.569468,67.58494 ) and index it in the geohash field.
The indexing process seems to work , but when I try a spatial search, Solr
returns me an exception with fieldcache on multivalued field.

I'm using Lucid Work enterprise 1.8 , and , as I know , it should be
integrated with this patch (
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994256#comment-12994256
).

Am I indexing wrong? Am I missing something?
The type of my spatial field is geohash ...

Cheers

-- 
---
Alessandro Benedetti

Sourcesense - making sense of Open Source: http://www.sourcesense.com


Re: How to reserve ids?

2011-09-29 Thread Erick Erickson
Hmmm, if treating them as stopwords, wouldn't
you have to list all the possible variants? E.g.
mystuff.msn.com
yourstuff.msn.com

etc? Is that sufficient or do you want
*.msn.com (which isn't legal in a stopword
file as far as I know)?

Best
Erick

On Tue, Sep 27, 2011 at 11:39 PM, Otis Gospodnetic
 wrote:
> Hi Gabriele,
>
> If you have a copy of Lucene in Action 2, that may be the easiest place to 
> read up on stopwords.  In short, when something is a stopword, it is just 
> that stopword that gets removed and thus not indexed and thus when you search 
> for it, it will not find a document that originally had that word.
>
> Otis
>
> P.S.
> Yes, reply works better. :)
> 
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>>
>>From: Gabriele Kahlout 
>>To: solr-user@lucene.apache.org; Otis Gospodnetic 
>>Sent: Tuesday, September 27, 2011 6:43 PM
>>Subject: Re: How to reserve ids?
>>
>>
>>Otis,
>>
>>I'm following up on this as solving my problem though the stopwords mechanism 
>>would be great. Do stopwords apply also to the url/id field?
>>
>>Continuing on the msn.com example, with "msn.com" as a stopword msn.com 
>>webpage may still actually be indexed if neither the title nor the body 
>>contains "msn.com". Isn't it?
>>
>>P.S.
>>I just click on 'reply to all' (or reply on the phone). If it bothers you 
>>I'll make the less lazy effort of selecting 'reply'
>>
>>
>>On Tue, Sep 27, 2011 at 6:40 PM, Otis Gospodnetic 
>> wrote:
>>
>>Gabriele,
>>>
>>>Using "msn.com" as a stopword would simply mean that msn.com would not be 
>>>indexed and therefore a search for "msn.com" would not yield results.  You 
>>>could still search for "hotmail" and it may match documents that have 
>>>"msn.com" token stored in them, even though "msn.com" is a stopword.
>>>
>>>Otis
>>>
>>>P.S.
>>>No need to CC me, I'm on the list.
>>>
>>>
>>>Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>Lucene ecosystem search :: http://search-lucene.com/
>>>
>>>

From: Gabriele Kahlout 
To: solr-user@lucene.apache.org; Otis Gospodnetic 

Sent: Tuesday, September 27, 2011 1:58 AM
Subject: Re: How to reserve ids?
>>>

I'm interested in the stopwords solution as it sounds like less work but 
i'm not sure i understand how it works. By having msn.com as a stopword it 
doesnt mean i wont get msn.com as a result for say 'hotmail'. My 
understanding is that msn.com will never make it to the similarity function 
and thus affect the score calculation. But seldom does the url anyway (in 
my searches on content)!


>>
>>
>>--
>>Regards,
>>K. Gabriele
>>
>>--- unchanged since 20/9/10 ---
>>P.S. If the subject contains "[LON]" or the addressee acknowledges the 
>>receipt within 48 hours then I don't resend the email.
>>subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) 
>>< Now + 48h) ⇒ ¬resend(I, this).
>>
>>If an email is sent by a sender that is not a trusted contact or the email 
>>does not contain a valid code then the email is not received. A valid code 
>>starts with a hyphen and ends with "X".
>>∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ 
>>L(-[a-z]+[0-9]X)).
>>
>>
>>
>>


RE: Errors in requesthandler statistics

2011-09-29 Thread Jaeger, Jay - DOT
I am not an expert, but based on my experience, the information you are looking 
for should indeed be in your logs.

There are at least three logs you might look for / at:

- An HTTP request log
- The solr log
- Logging by the application server / JVM

Some information is available at http://wiki.apache.org/solr/SolrLogging, but 
it is pretty sparse.  For Jetty, more detailed information is available at 
http://wiki.apache.org/solr/LoggingInDefaultJettySetup 

If you have an HTTP server out in front of your application server, or if your 
application server logs HTTP requests (like Jetty does, for example), you can 
spot errors via the HTTP status code returned, but there is no detail.

Otherwise errors are also logged in the Solr log.  It is (or at least can be) 
pretty detailed.  See the links above.  

We created a logging.properties entry to put the logs where we wanted them, per 
the LoggingInDefaultJettySetup link, above.
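
Such a logging.properties entry might look roughly like the sketch below. The
paths and limits are placeholders, and it assumes java.util.logging (which
Solr uses by default under Jetty):

```properties
# Route all log records to a rotating file; adjust paths and sizes as needed.
handlers = java.util.logging.FileHandler
.level = INFO
java.util.logging.FileHandler.pattern = /var/log/solr/solr%u.log
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.FileHandler.limit = 10000000
java.util.logging.FileHandler.count = 5
```

Point the JVM at it with -Djava.util.logging.config.file=/path/to/logging.properties.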

The application server log is system dependent.  For our prototype we started 
Jetty as a Windows service, and the logs end up in the jetty-service.log 
defined by the service wrapper.  For other application servers, the place, and 
what is in the log, may vary.

It is a good idea to find out where your logs live before you need them.  ;^)

JRJ

-Original Message-
From: roySolr [mailto:royrutten1...@gmail.com] 
Sent: Thursday, September 29, 2011 7:56 AM
To: solr-user@lucene.apache.org
Subject: Errors in requesthandler statistics

Hello,

I was taking a look at my Solr statistics and I see an error count of 23 in
part of the request handler. How can I see which requests return these
errors? Can I log this somewhere?

Thanks
Roy

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Errors-in-requesthandler-statistics-tp3379163p3379163.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH when using XML Files questions

2011-09-29 Thread Erick Erickson
Specific replies below, but what I'd seriously consider
is writing my own filesystem-aware hook that pushed
documents to known Solr servers rather than using
DIH to pull them. You could use the code from
FileSystemEntityProcessor as a base and go from there.
The FileSystemEntityProcessor isn't really intended to
do very complex stuff.

1> Don't think this is possible OOB. There's nothing built in to the
 DIH that puts in filesystem hooks and automatically tries to index
 it
2> Nope. DIH is pretty simple that way as per the
 FileListEntityProcessor.
3> I'm pretty sure this is irrelevant to FileSystemEntityProcessor,
 it's really used for the database importation.
4> "whatever order Java returns them in". Take a look at the
   FileListEntityProcessor code; the relevant bit is below.
   So the ordering is whatever Java does, and I don't know
   what, if any, guarantees are made.
private void getFolderFiles(File dir, final List<Map<String, Object>> fileDetails) {
  // Fetch an array of file objects that pass the filter, however the
  // returned array is never populated; accept() always returns false.
  // Rather we make use of the fileDetails array which is populated as
  // a side effect of the accept method.
  dir.list(new FilenameFilter() {
    public boolean accept(File dir, String name) {
      File fileObj = new File(dir, name);
      if (fileObj.isDirectory()) {
        if (recursive) getFolderFiles(fileObj, fileDetails);
      } else if (fileNamePattern == null) {
        addDetails(fileDetails, dir, name);
      } else if (fileNamePattern.matcher(name).find()) {
        if (excludesPattern != null && excludesPattern.matcher(name).find())
          return false;
        addDetails(fileDetails, dir, name);
      }
      return false;
    }
  });
}
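
On question 2 (deleting files after import): DIH gives you no hook for this,
but if you push documents yourself you can do the cleanup as a separate step.
A hypothetical helper, not part of Solr or DIH:

```java
import java.io.File;

// Hypothetical post-processing helper: once a file has been successfully
// pushed to Solr, delete it so it is not picked up again on the next run.
public class ImportedFileCleaner {

    // Returns true if the file is gone afterwards (deleted now, or was
    // already missing).
    public static boolean removeProcessed(File f) {
        return !f.exists() || f.delete();
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("import-", ".xml");
        if (!removeProcessed(tmp) || tmp.exists()) {
            throw new AssertionError("file was not cleaned up");
        }
        System.out.println("cleaned " + tmp.getName());
    }
}
```

Only call this after Solr has acknowledged the update, otherwise a failed
import silently loses the file.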

On Tue, Sep 27, 2011 at 4:51 PM, Gabriel Cooper  wrote:
> I'm researching using DataImportHandler to import my data files utilizing
> FileDataSource with FileListEntityProcessor and have a couple questions
> before I get started that I'm hoping you guys can assist with.
>
> 1) I would like to put a file on the local filesystem in the configured
> location and have Solr see and process the file without additional effort on
> my part.
> 1a) Is this doable in any way? From what I've seen, this is not supported
> and I must manually call a URL (e.g.
> http://foo/solr/dataimport?command=full-import).
> 1b) The manual, URL-based invocation method seems perfectly logical in a
> database-oriented world, where one might schedule an update to run regularly
> but in my case I have a couple identical indexes I load balance between and
> don't want to run the same hefty query multiple times in parallel. As such,
> I'm doing one query, writing the results to an XML file, pushing that file
> to each box, and then wanting that file processed. I'd like the process to
> be as automated as possible.
>
> 2) I would like any files processed by Solr to be deleted after they've been
> imported. I haven't seen any way to do this currently. I thought I might be
> able to subclass something, but FileListEntityProcessor, for example,
> doesn't seem to give any handles at the right time in the workflow to delete
> a file.
>
> 3) When reading the DIH documentation, I ran across this statement: "When
> delta-import command is executed, it reads the start time stored in *
> conf/dataimport.properties*. It uses that timestamp to run delta queries and
> after completion, updates the timestamp in *conf/dataimport.properties*." If
> it really does update the date to the completion date, what happens to any
> files added between the start and end dates? Are they lost?
>
> 4) For delta imports, I don't see mention of how processed files are ordered
> other than that it tries not to re-import files older than that mentioned in
> the conf/dataimport.properties file. In cases where order matters, does it
> order the files by name or creation date or ...?
>
> Thanks for any help,
>
> Gabriel.
>


Re: SolrCloud: is there a programmatic way to create an ensemble

2011-09-29 Thread Mark Miller
That's normally what you want to do - setup a separate quorum for production.

On Sep 29, 2011, at 1:36 AM, Jamie Johnson wrote:

> I'm not a solrcloud guru, but why not start your zookeeper quorum separately?
> 
> I also believe that you can specify a zoo.cfg file which will create a
> zk quorum from solr
> 
> example zoo.cfg (from
> http://zookeeper.apache.org/doc/current/zookeeperStarted.html#sc_RunningReplicatedZooKeeper)
> 
> tickTime=2000
> dataDir=/var/zookeeper
> clientPort=2181
> initLimit=5
> syncLimit=2
> server.1=zoo1:2888:3888
> server.2=zoo2:2888:3888
> server.3=zoo3:2888:3888
> 
> On Thu, Sep 29, 2011 at 12:17 AM, Pulkit Singhal
>  wrote:
>> Did you find out about this?
>> 
>> 2011/8/2 Yury Kats :
>>> I have multiple SolrCloud instances, each running its own Zookeeper
>>> (Solr launched with -DzkRun).
>>> 
>>> I would like to create an ensemble out of them. I know about -DzkHost
>>> parameter, but can I achieve the same programmatically? Either with
>>> SolrJ or REST API?
>>> 
>>> Thanks,
>>> Yury
>>> 
>> 

- Mark Miller
lucidimagination.com
2011.lucene-eurocon.org | Oct 17-20 | Barcelona


RE: About solr distributed search

2011-09-29 Thread Jaeger, Jay - DOT
I am no expert, but here is my take and our situation.

Firstly, are you asking what the minimum number of documents is before it makes 
*any* sense at all to use a distributed search, or are you asking what the 
maximum number of documents is before a distributed search is essentially 
required?  The answers would be different.  I get the feeling you are asking 
the second question, so I'll proceed under that assumption.

I expect that in part the answer is "it depends".  I expect that it is mostly a 
function of the size of the index (and the interaction between that and memory 
and search performance), which depends on both the number of documents and how 
much is stored for the documents.  It also would depend upon your update load.

If the documents are small and/or the amount of stuff you store per document is 
small , then until the number of documents and/or updates gets truly enormous a 
single machine will probably be fine.

But, if your documents (the amount stored per document) is very large, then at 
some point the index files get so large that performance on a single machine 
isn't adequate.  Alternatively, if your update load is very very large, you 
might need to spread out that load among multiple servers to handle the update 
load without crippling your ability to respond to queries.

As for a specific instance, we have a single index of 7 Million (going on 28 
Million), with maybe 512 bytes of data stored for each document, with maybe 26 
or so indexed fields (we have a *lot* of copyField operations in order to index 
the data the way we want it, yet preserve the original data to return), and did 
not need to use distributed search.

JRJ

-Original Message-
From: Pengkai Qin [mailto:qin19890...@163.com] 
Sent: Thursday, September 29, 2011 5:15 AM
To: solr-user@lucene.apache.org; d...@lucene.apache.org
Subject: About solr distributed search

Hi all,

Now I'm doing research on solr distributed search, and it is said documents 
more than one million is reasonable to use distributed search.
So I want to know, does anyone have the test result(Such as time cost) of using 
single index and distributed search of more than one million data? I need the 
test result very urgent, thanks in advance!

Best Regards,
Pengkai




Re: Solr stopword problem in Query

2011-09-29 Thread Erick Erickson
I think your problem is that you've set

omitTermFreqAndPositions="true"

It's not real clear from the Wiki page, but
the tricky little phrase

"Queries that rely on position that are issued
on a field with this option will silently fail to
find documents."

And phrase queries rely on position information

Best
Erick

On Tue, Sep 27, 2011 at 11:00 AM, Rahul Warawdekar
 wrote:
> Hi Isan,
>
> The schema.xml seems OK to me.
>
> Is "textForQuery" the only field you are searching in ?
> Are you also searching on any other non text based fields ? If yes, please
> provide schema description for those fields also.
> Also, provide your solrconfig.xml file.
>
>
> On Tue, Sep 27, 2011 at 1:12 AM, Isan Fulia wrote:
>
>> Hi Rahul,
>>
>> I also tried searching "Coke Studio MTV" but no documents were returned.
>>
>> Here is the snippet of my schema file.
>>
>>  > positionIncrementGap="100" autoGeneratePhraseQueries="true">
>>
>>      
>>        
>>
>>        >                ignoreCase="true"
>>
>>                words="stopwords_en.txt"
>>                enablePositionIncrements="true"
>>
>>                />
>>        > generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>
>>        
>>
>>        > protected="protwords.txt"/>
>>
>>        
>>      
>>
>>      
>>        
>>
>>        > synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>
>>        >                ignoreCase="true"
>>
>>                words="stopwords_en.txt"
>>                enablePositionIncrements="true"
>>
>>                />
>>        > generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>
>>        
>>
>>        > protected="protwords.txt"/>
>>
>>        
>>      
>>
>>    
>>
>>
>> *> multiValued="false"/>
>> > multiValued="false"/>
>>
>> <field name="textForQuery" type="text" multiValued="true" omitTermFreqAndPositions="true"/>
>>
>> 
>> *
>>
>>
>> Thanks,
>> Isan Fulia.
>>
>>
>> On 26 September 2011 21:19, Rahul Warawdekar > >wrote:
>>
>> > Hi Isan,
>> >
>> > Does your search return any documents when you remove the 'at' keyword
>> and
>> > just search for "Coke studio MTV" ?
>> > Also, can you please provide the snippet of schema.xml file where you
>> have
>> > mentioned this field name and its "type" description ?
>> >
>> > On Mon, Sep 26, 2011 at 6:09 AM, Isan Fulia > > >wrote:
>> >
>> > > Hi all,
>> > >
>> > > I have a text field named* textForQuery* .
>> > > Following content has been indexed into solr in field textForQuery
>> > > *Coke Studio at MTV*
>> > >
>> > > when i fired the query as
>> > > *textForQuery:("coke studio at mtv")* the results showed 0 documents
>> > >
>> > > After runing the same query in debugMode i got the following results
>> > >
>> > > <str name="rawquerystring">textForQuery:("coke studio at mtv")</str>
>> > > <str name="querystring">textForQuery:("coke studio at mtv")</str>
>> > > <str name="parsedquery">PhraseQuery(textForQuery:"coke studio ? mtv")</str>
>> > > <str name="parsedquery_toString">textForQuery:"coke studio ? mtv"</str>
>> > >
>> > > Why did the query not match any document even when there is a
>> > > document with the value of textForQuery as *Coke Studio at MTV*?
>> > > Is this because of the stopword *at* present in stopwordList?
>> > >
>> > >
>> > >
>> > > --
>> > > Thanks & Regards,
>> > > Isan Fulia.
>> > >
>> >
>> >
>> >
>> > --
>> > Thanks and Regards
>> > Rahul A. Warawdekar
>> >
>>
>>
>>
>> --
>> Thanks & Regards,
>> Isan Fulia.
>>
>
>
>
> --
> Thanks and Regards
> Rahul A. Warawdekar
>


Errors in requesthandler statistics

2011-09-29 Thread roySolr
Hello,

I was taking a look at my Solr statistics and I see an error count of 23 in
the requesthandler section. How can I see which requests return
these errors? Can I log this somewhere?

Thanks
Roy

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Errors-in-requesthandler-statistics-tp3379163p3379163.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: 32-bit to 64-bit

2011-09-29 Thread Jaeger, Jay - DOT
Are you changing just the host OS or the JVM, or both, from 32 bit to 64 bit?

If it is just the OS, the answer is definitely no, you don't need to do 
anything more than copy.

If the answer is the JVM, I *think* the answer is still no, but others more 
authoritative than I may wish to respond.

-Original Message-
From: - - [mailto:loh@hotmail.com] 
Sent: Thursday, September 29, 2011 5:15 AM
To: solr-user@lucene.apache.org
Subject: 32-bit to 64-bit


Hi,
I indexed my data on my 32-bit computer. Do I need to re-index if I upload my 
data to a 64-bit server, or would copying the data directories suffice?
Thank you.


Re: basic solr cloud questions

2011-09-29 Thread Yury Kats
On 9/29/2011 7:22 AM, Darren Govoni wrote:
> That was kinda my point. The "new" cloud implementation
> is not about replication, nor should it be. But rather about
> horizontal scalability where "nodes" manage different parts
> of a unified index. 

It's about many things. You stated one, but there are other goals,
one of them being tolerance to node outages. In a cloud, when
one of your many nodes fails, you don't want to stop querying and
indexing. For this to happen, you need to maintain redundant copies
of the same pieces of the index, hence you need to replicate.

> One of the design goals of the "new" cloud
> implementation is for this to happen more or less automatically.

True, but there is a big gap between goals and current state.
Right now, there is distributed search, but not distributed indexing
or auto-sharding, or auto-replication. So if you want to use the SolrCloud
now (as many of us do), you need do a number of things yourself,
even if they might be done by SolrCloud automatically in the future.

> To me that means one does not have to manually distribute
> documents or enforce replication as Yury suggests.
> Replication is different to me than what was being asked.
> And perhaps I misunderstood the original question.
> 
> Yury's response introduced the term "core" where the original
> person was referring to "nodes". For all I know, those are two
> different things in the new cloud design terminology (I believe they are).
> 
> I guess understanding "cores" vs. "nodes" vs "shards" is helpful. :)

A shard is a slice of the index. An index is managed/stored in a core.
Nodes are Solr instances, usually physical machines.

Each node can host multiple shards, and each shard can consist of multiple 
cores.
However, all cores within the same shard must have the same content.

This is where the OP ran into the problem. The OP had 1 shard, consisting of
two cores on two nodes. Since there is no distributed indexing yet, all
documents were indexed into a single core. However, there is distributed
search, therefore queries were sent randomly to different cores of the same
shard. Since one core in the shard had documents and the other didn't, the
query result was random.

To solve this problem, the OP must make sure all cores within the same shard
(be they on the same node or not) have the same content. This can currently
be achieved by:
a) setting up replication between cores: you index into one core and the
other core replicates the content
b) indexing into both cores
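For option (a), replication between the two cores can be set up with the stock
ReplicationHandler in each core's solrconfig.xml. A minimal sketch (host name,
core name and poll interval are illustrative, not taken from the OP's setup):

```xml
<!-- solrconfig.xml of the core being indexed into (the master) -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- make a replicable snapshot available after every commit -->
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- solrconfig.xml of the other core (the slave): it polls and pulls the index -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

With this in place you index into the master core only, and the slave core
catches up on its next poll.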

Hope this clarifies.


Re: SolrCloud: is there a programmatic way to create an ensemble

2011-09-29 Thread Yury Kats
Nope

On 9/29/2011 12:17 AM, Pulkit Singhal wrote:
> Did you find out about this?
> 
> 2011/8/2 Yury Kats :
>> I have multiple SolrCloud instances, each running its own Zookeeper
>> (Solr launched with -DzkRun).
>>
>> I would like to create an ensemble out of them. I know about -DzkHost
>> parameter, but can I achieve the same programmatically? Either with
>> SolrJ or REST API?
>>
>> Thanks,
>> Yury
>>
> 



Re: Query failing because of omitTermFreqAndPositions

2011-09-29 Thread Michael McCandless
Once a given field has omitted positions in the past, even for just
one document, it "sticks" and that field will forever omit positions.

Try creating a new index, never omitting positions from that field?
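The sticky behavior can be pictured as a per-field flag that, once set by any
document, never clears. A toy model of these semantics in plain Java (an
illustration only, not Lucene's actual code):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the per-field "sticky" omit-positions flag described above.
public class StickyOmitDemo {
    // field name -> true once ANY document has omitted positions for it
    private final Map<String, Boolean> omitted = new HashMap<>();

    public void addDocument(String field, boolean omitPositions) {
        // Logical OR: the flag can go from false to true, but never back.
        omitted.merge(field, omitPositions, (prev, cur) -> prev || cur);
    }

    public boolean hasPositions(String field) {
        return !omitted.getOrDefault(field, false);
    }

    public static void main(String[] args) {
        StickyOmitDemo index = new StickyOmitDemo();
        index.addDocument("textForQuery", false); // positions stored
        index.addDocument("textForQuery", true);  // one doc omits -> sticks
        index.addDocument("textForQuery", false); // re-enabling has no effect
        System.out.println(index.hasPositions("textForQuery")); // prints "false"
    }
}
```

This is why reindexing individual documents into the same index does not help;
only a fresh index resets the flag.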

Mike McCandless

http://blog.mikemccandless.com

On Thu, Sep 29, 2011 at 1:14 AM, Isan Fulia  wrote:
> Hi All,
>
> My schema consisted of field textForQuery which was defined as
> <field name="textForQuery" ... multiValued="true"/>
>
> After indexing 10 lakh (one million) documents I changed the field to
> <field name="textForQuery" ... multiValued="true" *omitTermFreqAndPositions="true"*/>
>
> So documents that were indexed after that omitted the position information
> of the terms.
> As a result I was not able to search text which relies on position
> information, e.g. "coke studio at mtv", even though it's present in some
> documents.
>
> So I again changed the field textForQuery to
> <field name="textForQuery" ... multiValued="true"/>
>
> But now even for newly added documents the query requiring position
> information is still failing.
> For example, I reindexed certain documents that contain "coke studio at
> mtv" but the query is still not returning any documents when searching for
> *textForQuery:"coke studio at mtv"*
>
> Can anyone please help me understand why this is happening?
>
>
> --
> Thanks & Regards,
> Isan Fulia.
>


Re: basic solr cloud questions

2011-09-29 Thread Darren Govoni

That was kinda my point. The "new" cloud implementation
is not about replication, nor should it be. But rather about
horizontal scalability where "nodes" manage different parts
of a unified index. One of the design goals of the "new" cloud
implementation is for this to happen more or less automatically.

To me that means one does not have to manually distribute
documents or enforce replication as Yury suggests.
Replication is different to me than what was being asked.
And perhaps I misunderstood the original question.

Yury's response introduced the term "core" where the original
person was referring to "nodes". For all I know, those are two
different things in the new cloud design terminology (I believe they are).

I guess understanding "cores" vs. "nodes" vs "shards" is helpful. :)

cheers!
Darren


On 09/29/2011 12:00 AM, Pulkit Singhal wrote:

@Darren: I feel that the question itself is misleading. Creating
shards is meant to separate out the data ... not keep the exact same
copy of it.

I think the two-node setup that was attempted by Sam misled him and
us into thinking that configuring two nodes which are to be named
"shard1" ... somehow means that they are instantly replicated too ...
this is not the case! I can see how this misunderstanding can develop
as I too was confused until Yury cleared it up.

@Sam: If you are interested in performing a quick exercise to
understand the pieces involved for replication rather than sharding
... perhaps this link would be of help in taking you through it:
http://pulkitsinghal.blogspot.com/2011/09/setup-solr-master-slave-replication.html

- Pulkit

2011/9/27 Yury Kats:

On 9/27/2011 5:16 PM, Darren Govoni wrote:

On 09/27/2011 05:05 PM, Yury Kats wrote:

You need to either submit the docs to both nodes, or have a replication
setup between the two. Otherwise they are not in sync.

I hope that's not the case. :/ My understanding (or hope maybe) is that
the new Solr Cloud implementation will support auto-sharding and
distributed indexing. This means that shards will receive different
documents regardless of which node received the submitted document
(spread evenly based on a hash<->node assignment). Distributed queries
will thus merge all the solr shard/node responses.

All cores in the same shard must somehow have the same index.
Only then can you continue servicing searches when individual cores
fail. Auto-sharding and distributed indexing don't have anything to
do with this.

In the future, SolrCloud may be managing replication between cores
in the same shard automatically. But right now it does not.





Re: SOLR Index Speed

2011-09-29 Thread Lord Khan Han
Hi,

The no-op run completed in 20 minutes. The only commented-out line was
"solr.addBean(doc)". We've tried SUSS as a drop-in replacement for
CommonsHttpSolrServer but its behavior was weird. We have seen updates take
tens of thousands of seconds, and they continue for a very long time after
sending to Solr is complete. We thought that it was because we are indexing
POJOs as documents. BTW, SOLR-1565 and SOLR-2755 say that SUSS does not
support binary payload.


CommonsHttpSolrServer solr = new CommonsHttpSolrServer(url);

solr.setRequestWriter(new BinaryRequestWriter());

...

// doc is a solrj annotated POJO

solr.addBean(doc)


Any thoughts on what may be taking so long? Before MapReduce we were indexing
to localhost in 2-3 hours using the same code base.
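One thing that may be worth ruling out (an assumption, since the bottleneck
here is not confirmed): solr.addBean(doc) issues one HTTP round trip per
document, so 500K documents means 500K requests. Buffering POJOs and sending
them with solr.addBeans(batch) reduces the round trips. The batching itself is
just a partition of the document list (the SolrJ call is shown in a comment):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchIndexer {
    // Split docs into fixed-size batches. Each batch would then be sent in a
    // single round trip, e.g. SolrJ's solr.addBeans(batch), instead of one
    // solr.addBean(doc) call (and one HTTP request) per document.
    public static <T> List<List<T>> batches(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            result.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) docs.add(i);
        // 10 docs in batches of 4 -> 3 batches (4 + 4 + 2)
        System.out.println(batches(docs, 4).size()); // prints "3"
    }
}
```

With something like batches(allDocs, 1000), each reducer makes one request per
thousand documents; committing once at the end, as you already do, stays the same.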

On Tue, Sep 27, 2011 at 8:55 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Hello,
>
> By the way, should you need help with Hadoop+Solr, please feel free to get
> in touch with us at Sematext (see below) - we happen to work with Hadoop and
> Solr on a daily basis and have successfully implemented parallel indexing
> into Solr with/from Hadoop.
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
> Lucene ecosystem search :: http://search-lucene.com/
>
>
> --
> *From:* Otis Gospodnetic 
> *To:* "solr-user@lucene.apache.org" 
> *Sent:* Tuesday, September 27, 2011 1:37 PM
>
> *Subject:* Re: SOLR Index Speed
>
> Hi,
>
> No need to use reply-all and CC me directly, I'm on the list :)
>
> It sounds like Solr is not the problem, but the Hadoop side.  For example,
> what if you change your reducer not to call Solr but do some no-op.  Does it
> go beyond 500-700 docs/minute?
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> >
> >From: Lord Khan Han 
> >To: solr-user@lucene.apache.org; Otis Gospodnetic <
> otis_gospodne...@yahoo.com>
> >Sent: Tuesday, September 27, 2011 4:42 AM
> >Subject: Re: SOLR Index Speed
> >
> >Our producer (the hadoop mapper) prepares the docs for submitting and the
> >reducer directly submits via solrj http.. now 32 reducers but still the
> >indexing speed is 500 - 700 docs per minute. Submission is coming from a
> >hadoop cluster so submit speed is not a problem. I couldn't use the full
> >solr index machine's resources.
> >
> >I gave a 12 gig heap to Solr and the machine is not swapping.
> >
> >I couldn't figure out the problem, if there is one..
> >
> >PS: We are committing at the end of the submit.
> >
> >
> >On Tue, Sep 27, 2011 at 11:37 AM, Lord Khan Han  >wrote:
> >
> >> Sorry :)  It is not 500 docs per sec. (That is what I wish, I think.)
> >> It is 500 docs per MINUTE..
> >>
> >>
> >>
> >> On Tue, Sep 27, 2011 at 7:14 AM, Otis Gospodnetic <
> >> otis_gospodne...@yahoo.com> wrote:
> >>
> >>> Hello,
> >>>
> >>> > PS: solr streamindex  is not option because we need to submit
> javabin...
> >>>
> >>>
> >>> If you are referring to StreamingUpdateSolrServer, then the above
> >>> statement makes no sense and you should give SUSS a try.
> >>>
> >>> Are you sure your 16 reducers produce more than 500 docs/second?
> >>> I think somebody already suggested increasing the number of reducers to
> >>> ~32.
> >>> What happens to your CPU load and indexing speed then?
> >>>
> >>>
> >>> Otis
> >>> 
> >>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >>> Lucene ecosystem search :: http://search-lucene.com/
> >>>
> >>>
> >>> >
> >>> >From: Lord Khan Han 
> >>> >To: solr-user@lucene.apache.org
> >>> >Sent: Monday, September 26, 2011 7:09 AM
> >>> >Subject: SOLR Index Speed
> >>> >
> >>> >Hi,
> >>> >
> >>> >We have 500K web documents and are using solr (trunk) to index them. We
> >>> >have a special analyzer which is a little bit heavy on CPU.
> >>> >Our machine config:
> >>> >
> >>> >32 x cpu
> >>> >32 gig ram
> >>> >SAS HD
> >>> >
> >>> >We are sending documents with 16 reduce clients (from hadoop) to the
> >>> >standalone solr server. The problem is we couldn't get faster than 500
> >>> >docs per sec. 500K documents took 7-8 hours to index :(
> >>> >
> >>> >While indexing, the solr server CPU load is around 5-6 (32 max), which
> >>> >means 20% of the total CPU power. We have plenty of RAM ...
> >>> >
> >>> >I turned off auto commit and gave an 8198 ramBuffer .. there is no io
> >>> >wait ..
> >>> >
> >>> >How can I make it faster ?
> >>> >
> >>> >PS: solr streamindex  is not option because we need to submit
> javabin...
> >>> >
> >>> >thanks..
> >>> >
> >>> >
> >>> >
> >>>
> >>
> >>
> >
> >
> >
>
>


About solr distributed search

2011-09-29 Thread Pengkai Qin
Hi all,

Now I'm doing research on Solr distributed search, and it is said that
distributed search is reasonable for more than one million documents.
So I want to know: does anyone have test results (such as time cost) comparing a
single index and distributed search over more than one million documents? I need
the test results very urgently, thanks in advance!

Best Regards,
Pengkai




32-bit to 64-bit

2011-09-29 Thread - -

Hi,
I indexed my data on my 32-bit computer. Do I need to re-index if I upload my 
data to a 64-bit server, or would copying the data directories suffice?
Thank you.

Re: autosuggest combination of data from documents and popular queries

2011-09-29 Thread abhayd
hi Hoss,
This helps.

The only thing I am not sure about is the use of TermsComponent. As I
understand, TermsComponent allows sorting only on count|index. So I'm not
sure how popularity could be used for a sort or boost.

Any thoughts on using TermsComponent with popularity? If this is
possible then I don't think I would even need ngrams at all.

Any suggestions?

abhay

--
View this message in context: 
http://lucene.472066.n3.nabble.com/autosuggest-combination-of-data-from-documents-and-popular-queries-tp3360657p3378874.html
Sent from the Solr - User mailing list archive at Nabble.com.