in Problem

2013-10-01 Thread PAVAN
Hi,

When I type a query string without 'in' it gives proper results, but
when I try the same query string with 'in' it does not display the proper
results. May I know what the problem is?

And I mentioned 'in' as a stopword. If I remove 'in' from the stop words it
is not showing relevant results.

Ex :

used computers chennai -- showing good results

used computer in chennai -- Not showing proper results 


Can anybody tell me what is the problem?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/in-Problem-tp4092866.html
Sent from the Solr - User mailing list archive at Nabble.com.


Not able to run sample solr examples

2013-10-01 Thread mamta
Hi,

I am running Solr on Tomcat server and am able to go to the solr link from
my Tomcat manager.

I want to try running queries through the Solr admin page on the Solr
examples which come built in when I install Solr.

How can I run queries on those examples?

Thanks,
Mamta



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-run-sample-solr-examples-tp4092872.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Issue in parallel Indexing using multiple csv files

2013-10-01 Thread zaheer.java
Ran more tests. It works.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Issue-in-parallel-Indexing-using-multiple-csv-files-tp4092452p4092873.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: in Problem

2013-10-01 Thread Dmitry Kan
Hi,

See here, hope it helps.

http://stackoverflow.com/questions/2681393/solr-is-there-a-way-to-include-stopwords-when-searching-exact-phrases


On Tue, Oct 1, 2013 at 9:34 AM, PAVAN pavans2...@gmail.com wrote:

 Hi,

 When I type a query string without 'in' it gives proper results, but
 when I try the same query string with 'in' it does not display the proper
 results. May I know what the problem is?

 And I mentioned 'in' as a stopword. If I remove 'in' from the stop words it
 is not showing relevant results.

 Ex :

 used computers chennai -- showing good results

 used computer in chennai -- Not showing proper results


 Can anybody tell me what is the problem?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/in-Problem-tp4092866.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Problem regarding queries enclosed in double quotes in Solr 3.4

2013-10-01 Thread Dmitry Kan
Perhaps you can make a query parser to fix this?
It would parse the incoming query and substitute "some_terms" with
"some_terms"~0
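A client-side sketch of that substitution, assuming the application can rewrite queries before they reach Solr (the function name and regex are illustrative, not an existing Solr hook):

```python
import re

def add_slop(query: str, slop: int = 0) -> str:
    """Append a slop operator to every quoted phrase in a query string,
    turning "semantic web" into "semantic web"~0 while leaving
    already-slopped phrases and bare terms untouched."""
    # match a quoted phrase that is not already followed by ~<digit>
    return re.sub(r'("[^"]+")(?!~\d)', r'\g<1>~%d' % slop, query)

print(add_slop('title:"semantic web" AND body:"linked data"'))
# -> title:"semantic web"~0 AND body:"linked data"~0
```

Whether "phrase"~0 matches and scores exactly like the bare phrase depends on the field's analysis chain, so compare results before adopting this.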


On Tue, Oct 1, 2013 at 7:43 AM, Kunal Mittal kunalmitta...@gmail.comwrote:

 We have a Solr 3.4 setup. When we try to do queries with double quotes
 like "semantic web", the query takes a long time to execute.
 One solution we are thinking about is to make the same query without the
 quotes and set the phrase slop (ps) parameter to 0. That is much quicker
 than the query with the quotes and gives similar results to the query with
 quotes.
 Is there a way to fix this by modifying the schema.xml file? Any
 suggestions would be appreciated.




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Problem-regarding-queries-enclosed-in-double-quotes-in-Solr-3-4-tp4092856.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Newbie to Solr

2013-10-01 Thread mamta
Hi,

I want to know: if I have to fire some query through the Solr admin, do
I need to create a new schema.xml? Where do I place it in case I have to
create a new one?

In case I can edit the original schema.xml, can there be two fields named id
in my schema.xml?

I desperately need help in running queries on the Solr admin which is
configured on a Tomcat server.

What preparation will I need to do? A schema.xml? Any docs?

Any help will be highly appreciated.

Thanks,
Mamta



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OpenJDK or OracleJDK

2013-10-01 Thread Raheel Hasan
This sounds interesting... Thanks guys for the replies! :)


On Tue, Oct 1, 2013 at 8:07 AM, Otis Gospodnetic otis.gospodne...@gmail.com
 wrote:

 Hi,

 A while back I remember we noticed some SPM users were having issues
 with OpenJDK.  Since then we've been recommending Oracle's
 implementation to our Solr and to SPM users.  At the same time, we
 haven't seen any issues with OpenJDK in the last ~6 months.  Oracle
 JDK is not slow. :)

 Otis
 --
 Solr & ElasticSearch Support -- http://sematext.com/
 Performance Monitoring -- http://sematext.com/spm



 On Mon, Sep 30, 2013 at 11:02 PM, Shawn Heisey s...@elyograg.org wrote:
  On 9/30/2013 9:28 AM, Raheel Hasan wrote:
  Hmm, why is that so?
  Isn't Oracle's version a bit slow?
 
  For Java 6, the Sun JDK is the reference implementation.  For Java 7,
  OpenJDK is the reference implementation.
 
  http://en.wikipedia.org/wiki/Reference_implementation
 
  I don't think Oracle's version could really be called slow.  Sun
  invented Java.  Sun open sourced Java.  Oracle bought Sun.
 
  The Oracle implementation is likely more conservative than some of the
  other implementations, like the one by IBM.  The IBM implementation is
  pretty aggressive with optimization, so aggressive that Solr and Lucene
  have a history of revealing bugs that only exist in that implementation.
 
  Thanks,
  Shawn
 




-- 
Regards,
Raheel Hasan


{soft}Commit and cache flushing

2013-10-01 Thread Dmitry Kan
Hello!

This is a minor thing, perhaps, but thought to ask / share:

if there are no modifications to an index and a softCommit or hardCommit
is issued, then Solr flushes the cache.

Is this designed on purpose?

Regards,

Dmitry


Re: Problem regarding queries enclosed in double quotes in Solr 3.4

2013-10-01 Thread Upayavira
Which query parser are you using? It seems you are mixing them up.

As far as I know, edismax doesn't support quoted phrases; it uses the pf
param to invoke phrase queries. Likewise, the lucene query parser
doesn't support a phrase slop param; it uses a "phrase slop"~2 syntax.

Upayavira 

On Tue, Oct 1, 2013, at 05:43 AM, Kunal Mittal wrote:
 We have a Solr 3.4 setup. When we try to do queries with double quotes
 like "semantic web", the query takes a long time to execute.
 One solution we are thinking about is to make the same query without the
 quotes and set the phrase slop (ps) parameter to 0. That is much quicker
 than the query with the quotes and gives similar results to the query with
 quotes.
 Is there a way to fix this by modifying the schema.xml file? Any
 suggestions would be appreciated.
 
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Problem-regarding-queries-enclosed-in-double-quotes-in-Solr-3-4-tp4092856.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: in Problem

2013-10-01 Thread PAVAN
Hi Dmitry,

I have already defined it in the following way:

<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true"/>
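The Stack Overflow link earlier in the thread points at CommonGrams as one way to keep stopwords like 'in' meaningful inside phrases. A hedged sketch of such a field type (names and files are assumptions, not the actual schema in question):

```xml
<!-- Illustrative only: index-side CommonGrams emits stopword/neighbour
     pairs ("in_chennai") alongside plain tokens; the query-side variant
     keeps phrase queries like "used computer in chennai" matching. -->
<fieldType name="text_grams" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The pair tokens grow the index, which is the usual trade-off for keeping stopwords searchable in phrases.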





--
View this message in context: 
http://lucene.472066.n3.nabble.com/in-Problem-tp4092866p4092899.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr doesn't return TermVectors

2013-10-01 Thread Alessandro Benedetti
Nope, it's not the last-components problem, but it's definitely the
request handler problem; it was the same for me...
Switching to the /tvrh request handler solved my problem.
We should update the wiki!


2013/9/27 Shawn Heisey s...@elyograg.org

 On 9/27/2013 4:02 PM, Jack Krupansky wrote:

 You are using components instead of last-components, so you have to list
 all search components, including the QueryComponent. Better to use
 last-components.


 That did it.  Thank you!  I didn't know why this was a problem even with
 your note, until I read the last part of this page, which says that using
 components will entirely replace the default component list with what you
 specify:

 http://wiki.apache.org/solr/SearchComponent

 I copied and modified the handler from one I've already got that's using
 TermsComponent, which was using components instead of last-components. That
 handler works, so I figured it would for /tv as well. :)

 Thanks,
 Shawn




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Not able to run sample solr examples

2013-10-01 Thread Kishan Parmar
http://www.coretechnologies.com/products/AlwaysUp/Apps/RunApacheSolrAsAService.html

Regards,

Kishan Parmar
Software Developer
+91 95 100 77394
Jay Shree Krishnaa !!



On Tue, Oct 1, 2013 at 12:48 AM, mamta mamta.al...@gmail.com wrote:

 Hi,

 I am running Solr on Tomcat server and am able to go to the solr link from
 my Tomcat manager.

 I want to try running queries through the solr admin page on the solr
 examples which come built-in when i install solr.

 How can i run queries on those examples?

 Thanks,
 Mamta



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Not-able-to-run-sample-solr-examples-tp4092872.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Not able to run sample solr examples

2013-10-01 Thread Mamta Alshi
Hi,

My problem is that I am not able to run the sample examples given in Solr. I
cannot run them through the Solr admin console; it doesn't give me the
result. I have already indexed the documents.

Appreciate your help!

Thanks,
Mamta


On Tue, Oct 1, 2013 at 3:08 PM, Kishan Parmar kishan@gmail.com wrote:


 http://www.coretechnologies.com/products/AlwaysUp/Apps/RunApacheSolrAsAService.html

 Regards,

 Kishan Parmar
 Software Developer
 +91 95 100 77394
 Jay Shree Krishnaa !!



 On Tue, Oct 1, 2013 at 12:48 AM, mamta mamta.al...@gmail.com wrote:

  Hi,
 
  I am running Solr on Tomcat server and am able to go to the solr link
 from
  my Tomcat manager.
 
  I want to try running queries through the solr admin page on the solr
  examples which come built-in when i install solr.
 
  How can i run queries on those examples?
 
  Thanks,
  Mamta
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Not-able-to-run-sample-solr-examples-tp4092872.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: Newbie to Solr

2013-10-01 Thread Kishan Parmar
Yes, you have to create your own schema: in the schema file you have to add
the field names from your XML file, and likewise you can add your own field
names to it.

Or you can add your fields to the default schema file.

Without a schema you cannot add your XML file to Solr.

My schema is like this:
--
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<fields>
 <field name="No" type="string" indexed="true" stored="true"
required="true" multiValued="false" />
 <field name="Name" type="string" indexed="true" stored="true"
required="true" multiValued="false" />
 <field name="Address" type="string" indexed="true" stored="true"
required="true" multiValued="false" />
 <field name="Mobile" type="string" indexed="true" stored="true"
required="true" multiValued="false" />
</fields>
<uniqueKey>No</uniqueKey>

<types>

  <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
positionIncrementGap="0" />
</types>
</schema>
-

and my file is like this:

-
<add>
<doc>
<field name="No">100120107088</field>
<field name="Name">kishan</field>
<field name="Address">ghatlodia</field>
<field name="Mobile">9510077394</field>
</doc>
</add>

Regards,

Kishan Parmar
Software Developer
+91 95 100 77394
Jay Shree Krishnaa !!



On Tue, Oct 1, 2013 at 1:11 AM, mamta mamta.al...@gmail.com wrote:

 Hi,

 I want to know: if I have to fire some query through the Solr admin, do
 I need to create a new schema.xml? Where do I place it in case I have to
 create a new one?

 In case I can edit the original schema.xml, can there be two fields named id
 in my schema.xml?

 I desperately need help in running queries on the Solr admin which is
 configured on a Tomcat server.

 What all preparation will i need to do? Schema.xml any docs?

 Any help will be highly appreciated.

 Thanks,
 Mamta



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: Newbie to Solr

2013-10-01 Thread Mamta S Kanade
Can you tell me what all docs I need to create...there needs to be a schema.xml 
and what else? A document having my data?

Also, where these should be placed. There's already a schema.xml

Thanks for the prompt response.

Mamta.

-Original Message-
From: Kishan Parmar [mailto:kishan@gmail.com]
Sent: 01 October, 2013 03:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Newbie to Solr

Yes, you have to create your own schema: in the schema file you have to add
the field names from your XML file, and likewise you can add your own field
names to it.

Or you can add your fields to the default schema file.

Without a schema you cannot add your XML file to Solr.

My schema is like this:
--
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<fields>
 <field name="No" type="string" indexed="true" stored="true"
required="true" multiValued="false" />
 <field name="Name" type="string" indexed="true" stored="true"
required="true" multiValued="false" />
 <field name="Address" type="string" indexed="true" stored="true"
required="true" multiValued="false" />
 <field name="Mobile" type="string" indexed="true" stored="true"
required="true" multiValued="false" />
</fields>
<uniqueKey>No</uniqueKey>

<types>

  <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
positionIncrementGap="0" />
</types>
</schema>
-

and my file is like this:

-
<add>
<doc>
<field name="No">100120107088</field>
<field name="Name">kishan</field>
<field name="Address">ghatlodia</field>
<field name="Mobile">9510077394</field>
</doc>
</add>

Regards,

Kishan Parmar
Software Developer
+91 95 100 77394
Jay Shree Krishnaa !!



On Tue, Oct 1, 2013 at 1:11 AM, mamta mamta.al...@gmail.com wrote:

 Hi,

  I want to know: if I have to fire some query through the Solr
  admin, do I need to create a new schema.xml? Where do I place it
  in case I have to create a new one?

  In case I can edit the original schema.xml, can there be two fields
 named id in my schema.xml?

 I desperately need help in running queries on the Solr admin which is
 configured on a Tomcat server.

 What all preparation will i need to do? Schema.xml any docs?

 Any help will be highly appreciated.

 Thanks,
 Mamta



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876.html
 Sent from the Solr - User mailing list archive at Nabble.com.


The content of this email together with any attachments, statements and 
opinions expressed herein contains information that is private and confidential 
are intended for the named addressee(s) only. If you are not the addressee of 
this email you may not copy, forward, disclose or otherwise use it or any part 
of it in any form whatsoever. If you have received this message in error please 
notify postmas...@etisalat.ae by email immediately and delete the message 
without making any copies.


Re: in Problem

2013-10-01 Thread Dmitry Kan
Can you run both examples you provided through the query analysis screen of
the Solr admin and see if there is any difference in term positions?


On Tue, Oct 1, 2013 at 1:36 PM, PAVAN pavans2...@gmail.com wrote:

 Hi Dmitry,

 I have already defined it in the following way:

 <filter class="solr.StopFilterFactory" ignoreCase="true"
 words="stopwords.txt" enablePositionIncrements="true"/>





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/in-Problem-tp4092866p4092899.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Newbie to Solr

2013-10-01 Thread Kishan Parmar
You have to create only the schema file;
don't change anything in the solrconfig file.
And your XML file is what you want to index into Solr.

If you are new to Solr there is a core named collection1;
you have to add the schema file in that collection's conf folder:

C:\solr\example\solr\collection1\conf

Your XML data file should be in the example\exampledocs folder;
post.jar and post.sh are in that folder so that you can add your documents.



Regards,

Kishan Parmar
Software Developer
+91 95 100 77394
Jay Shree Krishnaa !!



On Tue, Oct 1, 2013 at 4:19 AM, Mamta S Kanade mkan...@etisalat.ae wrote:

 Can you tell me what all docs I need to create...there needs to be a
 schema.xml and what else? A document having my data?

 Also, where these should be placed. There's already a schema.xml

 Thanks for the prompt response.

 Mamta.

 -Original Message-
 From: Kishan Parmar [mailto:kishan@gmail.com]
 Sent: 01 October, 2013 03:16 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Newbie to Solr

 Yes, you have to create your own schema: in the schema file you have to add
 the field names from your XML file, and likewise you can add your own field
 names to it.

 Or you can add your fields to the default schema file.

 Without a schema you cannot add your XML file to Solr.

 My schema is like this:

 --
 <?xml version="1.0" encoding="UTF-8" ?>
 <schema name="example" version="1.5">
 <fields>
  <field name="No" type="string" indexed="true" stored="true"
 required="true" multiValued="false" />
  <field name="Name" type="string" indexed="true" stored="true"
 required="true" multiValued="false" />
  <field name="Address" type="string" indexed="true" stored="true"
 required="true" multiValued="false" />
  <field name="Mobile" type="string" indexed="true" stored="true"
 required="true" multiValued="false" />
 </fields>
 <uniqueKey>No</uniqueKey>

 <types>

   <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
   <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
 positionIncrementGap="0" />
 </types>
 </schema>

 -

 and my file is like this:

 -
 <add>
 <doc>
 <field name="No">100120107088</field>
 <field name="Name">kishan</field>
 <field name="Address">ghatlodia</field>
 <field name="Mobile">9510077394</field>
 </doc>
 </add>

 Regards,

 Kishan Parmar
 Software Developer
 +91 95 100 77394
 Jay Shree Krishnaa !!



 On Tue, Oct 1, 2013 at 1:11 AM, mamta mamta.al...@gmail.com wrote:

  Hi,
 
  I want to know that if i have to fire some query through the Solr
  admin, do i need to create a new schema.xml? Where do i place it
  incase iahve to create a new one.
 
  Incase i can edit the original schema.xml can there be two fields
  named id in my schema.xml?
 
  I desperately need help in running queries on the Solr admin which is
  configured on a Tomcat server.
 
  What all preparation will i need to do? Schema.xml any docs?
 
  Any help will be highly appreciated.
 
  Thanks,
  Mamta
 
 
 
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 




Re: Sorting dependent on user preferences with FunctionQuery

2013-10-01 Thread Snubbel
Hello,

thanks for your answers.
I checked your suggestions, but I'm not quite there yet.

With field collapsing I only get the top result per category, which is not
what I want; I want to have all the results!

And boosting is quite an interesting idea. With the following I get what I
need, all results but with Books at the top:
q=+*:* category:Book^2.2&q.op=OR

Unfortunately, our basic operator is AND. I'm not sure our customers would be
OK with the results if we change that.
How can I do it with AND?
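One approach that might keep AND as the default operator is to move the category preference into an additive edismax boost query (bq), which influences ranking without joining the Boolean logic of q; the values below are a sketch, not a tested configuration:

```
q=your search terms&defType=edismax&q.op=AND&bq=category:Book^2.2
```

bq only adds to the score of documents that already match q, so it reorders results without loosening the AND semantics.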

Best regards, Nikola



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-dependent-on-user-preferences-with-FunctionQuery-tp4092119p4092912.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Newbie to Solr

2013-10-01 Thread Mamta Alshi
I can have only one schema.xml file, right? Can I overwrite the one which
originally comes with the Solr set-up?

The original schema.xml is at C:\solr\solr\solr\conf along with post.sh et
al. Where should my other document be?

I need to run post.jar on my doc file (XML) to index it, right?

I could unfortunately not find any document which tells me how to run Solr
queries through my Tomcat. Do you know of any links/books?

Thank you, Kishan!

Thanks,
Mamta


On Tue, Oct 1, 2013 at 3:30 PM, Kishan Parmar kishan@gmail.com wrote:

 You have to create only the schema file;
 don't change anything in the solrconfig file.
 And your XML file is what you want to index into Solr.

 If you are new to Solr there is a core named collection1;
 you have to add the schema file in that collection's conf folder:

 C:\solr\example\solr\collection1\conf

 Your XML data file should be in the example\exampledocs folder;
 post.jar and post.sh are in that folder so that you can add your documents.



 Regards,

 Kishan Parmar
 Software Developer
 +91 95 100 77394
 Jay Shree Krishnaa !!



 On Tue, Oct 1, 2013 at 4:19 AM, Mamta S Kanade mkan...@etisalat.ae
 wrote:

  Can you tell me what all docs I need to create...there needs to be a
  schema.xml and what else? A document having my data?
 
  Also, where these should be placed. There's already a schema.xml
 
  Thanks for the prompt response.
 
  Mamta.
 
  -Original Message-
  From: Kishan Parmar [mailto:kishan@gmail.com]
  Sent: 01 October, 2013 03:16 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Newbie to Solr
 
  Yes, you have to create your own schema: in the schema file you have to
  add the field names from your XML file, and likewise you can add your own
  field names to it.

  Or you can add your fields to the default schema file.

  Without a schema you cannot add your XML file to Solr.

  My schema is like this:

  --
  <?xml version="1.0" encoding="UTF-8" ?>
  <schema name="example" version="1.5">
  <fields>
   <field name="No" type="string" indexed="true" stored="true"
  required="true" multiValued="false" />
   <field name="Name" type="string" indexed="true" stored="true"
  required="true" multiValued="false" />
   <field name="Address" type="string" indexed="true" stored="true"
  required="true" multiValued="false" />
   <field name="Mobile" type="string" indexed="true" stored="true"
  required="true" multiValued="false" />
  </fields>
  <uniqueKey>No</uniqueKey>

  <types>

    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
  positionIncrementGap="0" />
  </types>
  </schema>

  -

  and my file is like this:

  -
  <add>
  <doc>
  <field name="No">100120107088</field>
  <field name="Name">kishan</field>
  <field name="Address">ghatlodia</field>
  <field name="Mobile">9510077394</field>
  </doc>
  </add>
 
  Regards,
 
  Kishan Parmar
  Software Developer
  +91 95 100 77394
  Jay Shree Krishnaa !!
 
 
 
  On Tue, Oct 1, 2013 at 1:11 AM, mamta mamta.al...@gmail.com wrote:
 
   Hi,
  
    I want to know: if I have to fire some query through the Solr
    admin, do I need to create a new schema.xml? Where do I place it
    in case I have to create a new one?
  
    In case I can edit the original schema.xml, can there be two fields
   named id in my schema.xml?
  
   I desperately need help in running queries on the Solr admin which is
   configured on a Tomcat server.
  
   What all preparation will i need to do? Schema.xml any docs?
  
   Any help will be highly appreciated.
  
   Thanks,
   Mamta
  
  
  
   --
   View this message in context:
   http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
 
 



SolrCloud. Scale-test by duplicating same index to the shards and make it behave each index is different (uniqueId).

2013-10-01 Thread Thomas Egense
Hello everyone,
I have a small challenge performance-testing a SolrCloud setup. I have 10
shards, and each shard is supposed to have an index size of ~200GB. However,
I only have a single index of 200GB, because it would take too long to build
another index with different data, and I hope to somehow use this index on
all 10 shards and make it behave as if all documents are different on each
shard. So building more indexes from new data is not an option.

Making a query to SolrCloud is a two-phase operation. First, all shards
receive the query and return IDs and rankings. The merger will then remove
duplicate IDs, and then the full documents will be retrieved.

When I copy this index to all shards and make a request, the following will
happen. Phase one: all shards will receive the query and return IDs+rankings
(actually the same set from all shards). This part is realistic enough.
Phase two: IDs will be merged, and retrieving the documents is not
realistic as if they were spread out between shards (IO-wise).

Is there any way I can 'fake' this somehow and have shards return a
prefixed ID for phase one, which then also has to be undone when
retrieving the documents for phase two? I have tried making the hack in
org.apache.solr.handler.component.QueryComponent and a few other classes,
but with no success (the result sets are always empty). I do not need to
index any new documents, which would also be a challenge due to the ID
hash-interval for the shards with this hack.

Anyone has a good idea how to make this hack work?
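A toy model of the two-phase flow (pure illustration in Python, not Solr's actual classes), showing why identical IDs collapse during the merge and why a per-shard prefix keeps them distinct:

```python
# Simplified model: every shard holds the same documents, because the
# same index was copied everywhere.

def phase1(shard_ids, docs):
    # each shard returns (id, score) pairs; here all shards return the
    # identical set, since the indexes are copies
    return {s: list(docs) for s in shard_ids}

def merge(per_shard):
    # the coordinator keeps only one entry per unique document ID
    seen = {}
    for shard, hits in per_shard.items():
        for doc_id, score in hits:
            seen.setdefault(doc_id, (shard, score))
    return seen

shards = ["shard1", "shard2"]
docs = [("d1", 3.0), ("d2", 1.5)]

plain = merge(phase1(shards, docs))
print(len(plain))  # 2 -- duplicates collapsed, so phase-two load is unrealistic

# prefixing each ID with its shard name keeps every hit distinct
prefixed = {s: [("%s:%s" % (s, i), sc) for i, sc in hits]
            for s, hits in phase1(shards, docs).items()}
print(len(merge(prefixed)))  # 4 -- behaves like truly distinct documents
```

The real hack would also need to strip the prefix again before fetching stored documents in phase two, which is where the in-Solr attempt reportedly broke.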

From,
Thomas Egense


solr cpu usage

2013-10-01 Thread adfel70
Hi,
We're building a spec for a machine to purchase.
We're going to buy 10 machines.
We aren't sure yet how many processes we will run per machine.
The question is: should we buy a faster CPU with fewer cores, or a slower
CPU with more cores?
In any case we will have 2 CPUs in each machine.
Should we buy a 2.6GHz CPU with 8 cores or a 3.5GHz CPU with 4 cores?

What will we gain by having many cores?

What kinds of usage would make the CPU the bottleneck?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-cpu-usage-tp4092938.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: {soft}Commit and cache flushing

2013-10-01 Thread Shawn Heisey
On 10/1/2013 2:48 AM, Dmitry Kan wrote:
 This is a minor thing, perhaps, but thought to ask / share:
 
 if there are no modifications to an index and a softCommit or hardCommit
 issued, then solr flushes the cache.

Any time you do a commit that opens a new Searcher object
(openSearcher=true, which is required if you want index changes to be
visible to people making queries), the caches are invalidated.  This is
because the layout of the index (and therefore the Lucene internal IDs)
can completely change with *any* commit/merge, and there is no easy and
reliable way to determine when those numbers have NOT changed.

If you have warming queries configured, those happen on the new
searcher, populating the new cache.  If you have cache autoWarming
configured, then keys from the old caches are re-queried against the new
index and used to populate the new cache.
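For reference, the warming knobs live in solrconfig.xml; a sketch, with the sizes and the warming query as assumptions to tune:

```xml
<!-- autowarmCount re-queries that many old cache keys into the new cache -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512"
             autowarmCount="128"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512"
                  autowarmCount="32"/>

<!-- explicit warming queries run against every new searcher -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">id asc</str></lst>
  </arr>
</listener>
```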

I do not understand deep Lucene internals, but what I've seen come
through Jira activity and commits over the last year or two has been a
strong move towards per-segment thinking instead of whole-index
thinking.  If this idea becomes applicable to all aspects of Lucene,
then perhaps Solr caches can also become per-segment, and will not need
to be completely invalidated except in the case of a major merge or
forceMerge.

Thanks,
Shawn



how to manually update a field in the index without re-crawling?

2013-10-01 Thread eShard
Good morning,
I'm currently using Solr 4.0 FINAL.
I indexed a website and it took over 24 hours to crawl.
I just realized I need to rename one of the fields (or add a new one),
so I added the new field to the schema.
But how do I copy the data over from the old field to the new field without
recrawling everything?

Is this possible?

I was thinking about maybe putting an update processor chain on the /update
handler, but I'm not sure that will work.
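If all the relevant fields are stored, one possible route (an assumption about the setup; atomic updates arrived in Solr 4.0 and also require the updateLog) is to read each document's stored old value and post back a "set" on the new field. A sketch of building such a payload, with made-up IDs and field names:

```python
import json

def atomic_copy_payload(docs, old_field, new_field):
    """Build a Solr atomic-update JSON payload that copies old_field's
    stored value into new_field for each (id, stored_doc) pair."""
    return json.dumps([
        {"id": doc_id, new_field: {"set": stored[old_field]}}
        for doc_id, stored in docs
    ])

# hypothetical documents fetched from the index with their stored fields
docs = [("doc1", {"old_name": "hello"}), ("doc2", {"old_name": "world"})]
print(atomic_copy_payload(docs, "old_name", "new_name"))
```

The resulting JSON would be posted to the /update handler; whether this works depends on every field being stored, since Solr rebuilds the document from stored values during an atomic update.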

Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-manually-update-a-field-in-the-index-without-re-crawling-tp4092955.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Doing time sensitive search in solr

2013-10-01 Thread Erick Erickson
Try it and see :).

Dynamic fields are just like regular fields once you index a document
that uses one. After that, they should behave just like regular fields.

If you're asking if you can create a query like *_txt:text meaning
search all the fields that end with _txt for the word text, I don't
think so. An alternative is to copy all the fields into a catch-all
field...
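Such a catch-all can be wired up in schema.xml roughly like this (field and type names are assumptions):

```xml
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>

<!-- everything copied here is searchable through one field -->
<field name="all_txt" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="*_txt" dest="all_txt"/>
```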

Best,
Erick

On Mon, Sep 30, 2013 at 3:41 PM, Darniz rnizamud...@edmunds.com wrote:
 Hello
 I just wanted to make sure: can we query dynamic fields using a wildcard?
 If not, then I don't think this solution will work, since I don't know the
 exact concrete name of the field.





 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273p4092830.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.0 is stripping XML format from RSS content field

2013-10-01 Thread eShard
If anyone is interested, I managed to resolve this a long time ago.
I used a Data Import Handler instead, and it worked beautifully.
The DIH is very forgiving: it takes whatever XML data is there and injects
it into the Solr index.
It's a lot faster than crawling, too.
You use XPath to map the fields to your schema.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-is-stripping-XML-format-from-RSS-content-field-tp4039809p4092961.html
Sent from the Solr - User mailing list archive at Nabble.com.


Auto Suggest - Time decay

2013-10-01 Thread SolrLover
I am trying to implement auto-suggest based on a time-decay function. I have
a separate index just to store auto-suggest keywords.

I would be calculating the frequency over time rather than calculating
based on frequency alone.

I am thinking of using a database to perform the calculation and update the
Solr index with the boost calculated from the time-decay function. I am not
sure if there is a better way to do this...

I need to boost the terms based on their frequency over time.

Ex: someone searching for 'apple' 1 times during an iPhone launch
(one particular day) shouldn't really make apple come up in the auto
suggestion always when someone types in the keyword 'a'; rather it should
lose its popularity exponentially.

Anyone has any suggestions?
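One minimal way to model that decay outside Solr (a sketch; the half-life and the counts are invented parameters to tune, not something Solr provides):

```python
import math

def decayed_score(events, now, half_life_days=7.0):
    """Sum search counts weighted by exponential time decay.
    events: list of (timestamp_in_days, count) observations."""
    lam = math.log(2) / half_life_days
    return sum(count * math.exp(-lam * (now - t)) for t, count in events)

burst = [(0, 10000)]                    # 10k searches on launch day only
steady = [(d, 100) for d in range(30)]  # 100 searches/day for a month

# a month later, the steady term outscores the one-day burst
print(decayed_score(steady, now=30) > decayed_score(burst, now=30))  # True
```

The resulting score could be written into the suggestion index as a boost field during a periodic rebuild.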




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto Suggest - Time decay

2013-10-01 Thread Ing. Jorge Luis Betancourt Gonzalez
Are you using the suggester component, or a separate core? I've used a
separate core to store suggestions and order these suggestions (queries
performed on the frontend) using a time-decay function, and it works great
for me.

Regards,

- Original Message -
From: SolrLover bbar...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, October 1, 2013 12:12:13
Subject: Auto Suggest - Time decay

I am trying to implement auto suggest based on a time decay function. I have
a separate index just to store auto suggest keywords.

I would be calculating the frequency over time rather than ranking on raw
frequency alone.

I am thinking of using a database to perform the calculation and update the
SOLR index with the boost calculated from the time decay function. I am not
sure if there is a better way to do this...

I need to boost the terms based on their frequency over time.

Ex: someone searching for 'apple' heavily during an iPhone launch
(one particular day) shouldn't make apple come up in the auto
suggestion forever when someone types in the keyword 'a'; rather, it should
lose its popularity exponentially..

Does anyone have any suggestions?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965.html
Sent from the Solr - User mailing list archive at Nabble.com.

III International Winter School at UCI, February 17 to 28, 2014. See www.uci.cu



Re: Percolate feature?

2013-10-01 Thread Charlie Hull

On 01/10/2013 04:12, Otis Gospodnetic wrote:

Just came across this ancient thread.  Charlie, did this end up
happening?  I suspect Wolfgang may be interested, but that's just a
wild guess.


Hi Otis  all,

Yes we're actually planning to talk about it at Lucene Revolution in 
November and open source it around then - it's called 'Luwak' and we're 
working on a live customer implementation based on it currently.


I was curious about your feeling that what you were open-sourcing
might be a lot faster and more flexible than ES's percolator - can you
share more about why do you have that feeling and whether you've
confirmed this?


Difficult to say at present - we've not done a direct comparative test 
yet and obviously we like our own implementation! It works very well for 
our clients' use case.


Cheers

Charlie



Thanks,
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Mon, Aug 5, 2013 at 6:34 AM, Charlie Hull char...@flax.co.uk wrote:

On 03/08/2013 00:50, Mark wrote:


We have a set number of known terms we want to match against.

In Index:
term one
term two
term three

I know how to match all terms of a user query against the index but we
would like to know how/if we can match a user's query against all the terms
in the index?

Search Queries:
my search term = 0 matches
my term search one = 1 match  (term one)
some prefix term two = 1 match (term two)
one two three = 0 matches

I can only explain this as almost a reverse search???
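The "reverse search" above can be sketched naively in a few lines; real percolators (and the Luwak approach discussed later in this thread) index the stored queries instead of scanning them, but the contract is the same. A rough Python illustration:

```python
def percolate(doc_text, stored_queries):
    """Return the ids of stored queries whose terms ALL appear in the
    document -- the inverse of a normal search. Naive linear scan;
    production systems index the queries for speed."""
    doc_terms = set(doc_text.lower().split())
    return [qid for qid, terms in stored_queries.items()
            if set(terms) <= doc_terms]
```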

I came across the following from ElasticSearch
(http://www.elasticsearch.org/guide/reference/api/percolate/) and it sounds
like this may accomplish the above but haven't tested. I was wondering if
Solr had something similar or an alternative way of accomplishing this?

Thanks



Hi Mark,

We've built something that implements this kind of reverse search for our
clients in the media monitoring sector - we're working on releasing the core
of this as open source very soon, hopefully in a month or two. It's based on
Lucene.

Just for reference it's able to apply tens of thousands of stored queries to
a document per second (our clients often have very large and complex Boolean
strings representing their clients' interests and may monitor hundreds of
thousands of news stories every day). It also records the positions of every
match. We suspect it's a lot faster and more flexible than Elasticsearch's
Percolate feature.

Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk



--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-10-01 Thread Andreas Owen
i'm already using URLDataSource

On 30. Sep 2013, at 5:41 PM, P Williams wrote:

 Hi Andreas,
 
 When using XPathEntityProcessor
 (http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor) your
 DataSource must be of type DataSource<Reader>.  You shouldn't be using
 BinURLDataSource, it's giving you the cast exception.  Use URLDataSource
 (https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-dataimporthandler/org/apache/solr/handler/dataimport/URLDataSource.html)
 or FileDataSource
 (https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-dataimporthandler/org/apache/solr/handler/dataimport/FileDataSource.html)
 instead.
 
 I don't think you need to specify namespaces, at least you didn't used to.
 The other thing that I've noticed is that the anywhere xpath expression //
 doesn't always work in DIH.  You might have to be more specific.
 
 Cheers,
 Tricia
 
 
 
 
 
 On Sun, Sep 29, 2013 at 9:47 AM, Andreas Owen a...@conx.ch wrote:
 
 how dumb can you get. obviously quite dumb... i would have to analyze the
 html pages with a nested instance like this:
 
 <entity name="rec" processor="XPathEntityProcessor"
         url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportUrl.xml"
         forEach="/docs/doc" dataSource="main">
   <entity name="htm" processor="XPathEntityProcessor"
           url="${rec.urlParse}" forEach="/xhtml:html" dataSource="dataUrl">
     <field column="text" xpath="//content" />
     <field column="h_2" xpath="//body" />
     <field column="text_nohtml" xpath="//text" />
     <field column="h_1" xpath="//h:h1" />
   </entity>
 </entity>
 
 but i'm pretty sure the forEach and the xpath expressions are wrong. at the
 moment i'm getting the following error:

    Caused by: java.lang.RuntimeException:
 org.apache.solr.handler.dataimport.DataImportHandlerException:
 java.lang.ClassCastException:
 sun.net.www.protocol.http.HttpURLConnection$HttpInputStream cannot be cast
 to java.io.Reader
 
 
 
 
 
 On 28. Sep 2013, at 1:39 AM, Andreas Owen wrote:
 
 ok i see what you're getting at, but why doesn't the following work:

  <field xpath="//h:h1" column="h_1" />
  <field column="text" xpath="/xhtml:html/xhtml:body" />

 i removed the tika-processor. what am i missing? i haven't found
 anything in the wiki.
 
 
 On 28. Sep 2013, at 12:28 AM, P Williams wrote:
 
 I spent some more time thinking about this.  Do you really need to use
 the
 TikaEntityProcessor?  It doesn't offer anything new to the document you
 are
 building that couldn't be accomplished by the XPathEntityProcessor alone
 from what I can tell.
 
  I also tried to get the Advanced Parsing example
  (http://wiki.apache.org/solr/TikaEntityProcessor) to work without
  success.  There are some obvious typos (<document> instead of
  </document>) and an odd order to the pieces (dataSources is enclosed by
  document).  It also looks like FieldStreamDataSource
  (http://lucene.apache.org/solr/4_3_1/solr-dataimporthandler/org/apache/solr/handler/dataimport/FieldStreamDataSource.html)
  is the one that is meant to work in this context. If Koji is still around
  maybe he could offer some help?  Otherwise this bit of erroneous
  instruction should probably be removed from the wiki.
 
 Cheers,
 Tricia
 
 $ svn diff
 Index:
 
 solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
 ===
 ---
 
 solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
   (revision 1526990)
 +++
 
 solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
   (working copy)
 @@ -99,13 +99,13 @@
   runFullImport(getConfigHTML(identity));
   assertQ(req(*:*), testsHTMLIdentity);
 }
 -
 +
 private String getConfigHTML(String htmlMapper) {
   return
   dataConfig +
 dataSource type='BinFileDataSource'/ +
 document +
 -entity name='Tika' format='xml'
 processor='TikaEntityProcessor'  +
 +entity name='Tika' format='html'
 processor='TikaEntityProcessor'  +
  url=' +
 getFile(dihextras/structured.html).getAbsolutePath() + '  +
   ((htmlMapper == null) ?  : ( htmlMapper=' + htmlMapper +
 ')) +  +
 field column='text'/ +
 @@ -114,4 +114,36 @@
   /dataConfig;
 
 }
 +  private String[] testsHTMLH1 = {
 +  //*[@numFound='1']
 +  , //str[@name='h1'][contains(.,'H1 Header')]
 +  };
 +
 +  @Test
 +  public void testTikaHTMLMapperSubEntity() throws Exception {
 +runFullImport(getConfigSubEntity(identity));
 +assertQ(req(*:*), testsHTMLH1);
 +  }
 +
 +  private String getConfigSubEntity(String htmlMapper) {
 +return
 +dataConfig +
 +dataSource type='BinFileDataSource' name='bin'/ +
 +dataSource type='FieldStreamDataSource' name='fld'/ +
 +document +
 +entity 

Re: Auto Suggest - Time decay

2013-10-01 Thread SolrLover
I am using a totally separate core for storing the auto suggest keywords.

Would you be able to send me some more details on your implementation? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965p4092969.html
Sent from the Solr - User mailing list archive at Nabble.com.


Autosuggest - Custom sorting

2013-10-01 Thread SolrLover
Is there a way to sort the returned Autosuggest list based on a particular
value (ex: score)?

I am trying to sort the returned suggestions based on a field that has been
calculated manually but not sure how to use that field for sorting
suggestions.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Autosuggest-Custom-sorting-tp4092980.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to manually update a field in the index without re-crawling?

2013-10-01 Thread Shawn Heisey

On 10/1/2013 9:03 AM, eShard wrote:

I'm currently using Solr 4.0 FINAL.
I indexed a website and it took over 24 hours to crawl.
I just realized I need to rename one of the fields (or add a new one).
so I added the new field to the schema,
But how do I copy the data over from the old field to the new field without
recrawling everything?

Is this possible?

I was thinking about maybe putting an update request processor chain on the
/update handler but I'm not sure that will work.


If you meet all the caveats and limitations, then you can use the atomic 
update functionality to add the new field and delete the old field.  For 
each document, you'll need the value of the uniqueKey and the value of 
the field that you want to essentially rename.


http://wiki.apache.org/solr/Atomic_Updates

If you have not configured your fields in the way described by the 
caveats and limitations section of that wiki page, then you will have 
to reindex.  There is no way around that requirement.
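For illustration, an atomic-update request that does the field rename could be built like this (field names are hypothetical; this assumes the wiki caveats are met, i.e. all fields stored and updateLog enabled):

```python
import json

def rename_field_update(doc_id, old_field, new_field, value):
    """Build one Solr JSON atomic-update document: copy the value into
    the new field and null out (i.e. delete) the old one."""
    return {
        "id": doc_id,
        new_field: {"set": value},
        old_field: {"set": None},  # serialized as null, which removes the field
    }

payload = json.dumps([rename_field_update("doc1", "old_name", "new_name", "hello")])
```

The payload would be POSTed to /update with Content-Type application/json, one such doc per document to migrate.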


Final comment, unrelated to your question: 4.0 is ancient and buggy. 
You're going to need to upgrade before too long.


Thanks,
Shawn



Re: Auto Suggest - Time decay

2013-10-01 Thread Ing. Jorge Luis Betancourt Gonzalez
For that core just use a boost factor as explained on [1]:

You could use a query like this to see (before making any changes) how your 
suggestions will be retrieved; in this case a query for goog has been made, 
and recent documents will be boosted (an extra bonus is given to the newer 
documents).

http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}goog

If this is enough for you, you could put the boost parameter in your request 
handler to make it even simpler, so any query against this particular request 
handler will be automatically boosted by date.

PS: You could tweak the formula used in the boost parameter to make it more 
suitable to your needs.
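As a side note on the constants in that formula: Solr's recip(x,m,a,b) computes a/(m*x+b), and 3.16e-11 is roughly one over the number of milliseconds in a year, so a year-old document gets about half the boost of a brand-new one. A quick sketch of the arithmetic (plain Python, not a Solr API):

```python
def recip(x, m, a, b):
    # Solr's recip(x, m, a, b) = a / (m*x + b)
    return a / (m * x + b)

MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # ~3.156e10 ms

def date_boost(age_ms):
    """Boost from the query above: 1.0 for a brand-new document,
    ~0.5 at one year old, decaying smoothly after that."""
    return recip(age_ms, 3.16e-11, 1, 1)
```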

- Original Message -
From: SolrLover bbar...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, 1 October 2013 12:19:51
Subject: Re: Auto Suggest - Time decay

I am using a totally separate core for storing the auto suggest keywords.

Would you be able to send me some more details on your implementation? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965p4092969.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Auto Suggest - Time decay

2013-10-01 Thread Ing. Jorge Luis Betancourt Gonzalez
Sorry, I forgot the link:

[1] - http://wiki.apache.org/solr/SolrRelevancyFAQ

- Mensaje original -
De: Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu
Para: solr-user@lucene.apache.org
Enviados: Martes, 1 de Octubre 2013 13:34:03
Asunto: Re: Auto Suggest - Time decay

For that core just use a boost factor as explained on [1]:

You could use a query like this to see (before making any changes) how your 
suggestions will be retrieved; in this case a query for goog has been made, 
and recent documents will be boosted (an extra bonus is given to the newer 
documents).

http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}goog

If this is enough for you, you could put the boost parameter in your request 
handler to make it even simpler, so any query against this particular request 
handler will be automatically boosted by date.

PS: You could tweak the formula used in the boost parameter to make it more 
suitable to your needs.

- Mensaje original -
De: SolrLover bbar...@gmail.com
Para: solr-user@lucene.apache.org
Enviados: Martes, 1 de Octubre 2013 12:19:51
Asunto: Re: Auto Suggest - Time decay

I am using a totally separate core for storing the auto suggest keywords.

Would you be able to send me some more details on your implementation? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965p4092969.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: {soft}Commit and cache flusing

2013-10-01 Thread Dmitry Kan
Thanks a lot Shawn for an exhaustive reply!

Regards,
Dmitry


On Tue, Oct 1, 2013 at 5:37 PM, Shawn Heisey s...@elyograg.org wrote:

 On 10/1/2013 2:48 AM, Dmitry Kan wrote:
  This is a minor thing, perhaps, but thought to ask / share:
 
  if there are no modifications to an index and a softCommit or hardCommit
  is issued, then solr flushes the cache.

 Any time you do a commit that opens a new Searcher object
 (openSearcher=true, which is required if you want index changes to be
 visible to people making queries), the caches are invalidated.  This is
 because the layout of the index (and therefore the Lucene internal IDs)
 can completely change with *any* commit/merge, and there is no easy and
 reliable way to determine when the those numbers have NOT changed.

 If you have warming queries configured, those happen on the new
 searcher, populating the new cache.  If you have cache autoWarming
 configured, then keys from the old caches are re-queried against the new
 index and used to populate the new cache.
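For reference, both mechanisms Shawn mentions live in solrconfig.xml; something along these lines (the sizes and the warming query are illustrative, not from anyone's actual config):

```xml
<!-- autowarming: keys from the old filterCache are re-run on the new searcher -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512"
             autowarmCount="128"/>

<!-- static warming queries, executed whenever a new searcher opens -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">id asc</str></lst>
  </arr>
</listener>
```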

 I do not understand deep Lucene internals, but what I've seen come
 through Jira activity and commits over the last year or two has been a
 strong move towards per-segment thinking instead of whole-index
 thinking.  If this idea becomes applicable to all aspects of Lucene,
 then perhaps Solr caches can also become per-segment, and will not need
 to be completely invalidated except in the case of a major merge or
 forceMerge.

 Thanks,
 Shawn




Re: Doing time sensitive search in solr

2013-10-01 Thread Darniz
Thanks Eric 
When i used solr in 2010 i thought by now it might have evolved to allow
querying by providing a wildcard in the field name, but it looks like i have
to provide a concrete dynamic field name to query.

Anyway, will look into the catch-all fields.

Do you have any examples of how a catch-all field will help with this, how
my doc will look, and how i can query it?

darniz



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273p4092989.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting dependent on user preferences with FunctionQuery

2013-10-01 Thread Chris Hostetter
: select?q=*%3A*&sort=query(qf=category v='Book')desc
: 
: but Solr returns Can't determine a Sort Order (asc or desc) in sort.

the root cause of that error is that you don't have any whitespace between 
your query function and desc
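With the space added (and parameter dereferencing to keep the local-params query readable), the sort would look something like this; the field name is taken from the question, the rest is illustrative:

```
select?q=*:*&sort=query($qq) desc&qq={!dismax qf=category v='Book'}
```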

as for your broader goal: doing a straight sort on the user's pref is 
probably not the best idea -- it's better to incorporate user preferences 
into boosting functions so you still retain the benefits of the relevancy 
score based on what the user searched for -- even if you know someone 
generally buys a lot of books, if they search for the beatles white 
album you probably don't want all the books that mention the white album, 
even just tangentially, to appear before the album itself.

I did a talk last year on boosting & biasing that introduces a lot of 
the concepts to think about and the basics of how to approach problems like 
this in solr...

https://people.apache.org/~hossman/ac2012eu/
http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55822630

-Hoss


Advice for using Solr 4.5 custom sharding to handle rolling time-oriented event data

2013-10-01 Thread Brett Hoerner
I'm interested in using the new custom sharding features in the
collections API to search a rolling window of event data. I'd appreciate a
spot/sanity check of my plan/understanding.

Say I only care about the last 7 days of events and I have thousands per
second (billions per week).

Am I correct that I could create a new shard for each hour, and send events
that happen in that hour with the ID (uniqueKey) of
`new_event_hour!event_id`, so that each hour's block of events goes into one
shard?
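The prefix!id scheme described above can be generated like this (the hour-bucket format below is an arbitrary choice, not a Solr requirement):

```python
from datetime import datetime, timezone

def route_id(event_time, event_id):
    """Build a compositeId-style key so every event from the same hour
    carries the same routing prefix and lands in the same shard."""
    bucket = event_time.strftime("%Y%m%d%H")
    return "%s!%s" % (bucket, event_id)
```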

I *always* query these events by the time in which they occurred, which is
another TrieInt field that I index with every document. So at query time I
would need to calculate the range the user cared about and send something
like _route_=hour1&_route_=hour2 if I wanted to only query those two
shards. (I *can* set multiple _route_ arguments in one query, right? And
Solr will handle merging results like it would with any other cores?)

Some scheduled task would drop and delete shards after they were more than
7 days old.

Does all of that make sense? Do you see a smarter way to do large
time-oriented search in SolrCloud?

Thanks!


Re: Profiling Solr Lucene for query

2013-10-01 Thread Isaac Hebsh
Hi Dmitry,

I'm trying to examine your suggestion to create a frontend node. It sounds
pretty useful.
I saw that every node in a solr cluster can serve requests for any collection,
even if it does not hold a core of that collection. Because of that, I
thought that adding a new node to the cluster (aka, the frontend/gateway
server) and creating a dummy collection (with 1 dummy core) would solve
the problem.

But I see that a request sent to the gateway node is not then sent
to the shards. Instead, the request is proxied to a (random) core of the
requested collection, and from there it is sent to the shards. (This is
reasonable, because the SolrCore on the gateway might run with a different
configuration, etc.) This means that my new node isn't functioning as a
frontend (responsible for sorting, etc.), but as a poor load
balancer. No performance improvement will come from this implementation.

So, how do you suggest implementing a frontend? On the one hand, it has to
run a core of the target collection, but on the other hand, we don't want
it to hold any shard contents.

On Fri, Sep 13, 2013 at 1:08 PM, Dmitry Kan solrexp...@gmail.com wrote:

 Manuel,

 Whether to have the front end solr as aggregator of shard results depends
 on your requirements. To repeat, we found merging from many shards very
 inefficient fo our use case. It can be the opposite for you (i.e. requires
 testing). There are some limitations with distributed search, see here:

 http://docs.lucidworks.com/display/solr/Distributed+Search+with+Index+Sharding


 On Wed, Sep 11, 2013 at 3:35 PM, Manuel Le Normand 
 manuel.lenorm...@gmail.com wrote:

  Dmitry - currently we don't have such a front end, this sounds like a
 good
  idea creating it. And yes, we do query all 36 shards every query.
 
  Mikhail - I do think 1 minute is enough data, as during this exact
 minute I
  had a single query running (that took a qtime of 1 minute). I wanted to
   isolate these hard queries. I repeated this profiling a few times.
 
  I think I will take the termInterval from 128 to 32 and check the
 results.
  I'm currently using NRTCachingDirectoryFactory
 
 
 
 
  On Mon, Sep 9, 2013 at 11:29 PM, Dmitry Kan solrexp...@gmail.com
 wrote:
 
   Hi Manuel,
  
   The frontend solr instance is the one that does not have its own index
  and
   is doing merging of the results. Is this the case? If yes, are all 36
   shards always queried?
  
   Dmitry
  
  
   On Mon, Sep 9, 2013 at 10:11 PM, Manuel Le Normand 
   manuel.lenorm...@gmail.com wrote:
  
Hi Dmitry,
   
I have solr 4.3 and every query is distributed and merged back for
   ranking
purpose.
   
What do you mean by frontend solr?
   
   
On Mon, Sep 9, 2013 at 2:12 PM, Dmitry Kan solrexp...@gmail.com
  wrote:
   
 are you querying your shards via a frontend solr? We have noticed,
  that
 querying becomes much faster if results merging can be avoided.

 Dmitry


 On Sun, Sep 8, 2013 at 6:56 PM, Manuel Le Normand 
 manuel.lenorm...@gmail.com wrote:

  Hello all
   Looking at the 10% slowest queries, I get very bad performance (~60 sec
   per query).
  These queries have lots of conditions on my main field (more
 than a
  hundred), including phrase queries and rows=1000. I do return
 only
   id's
  though.
   I can quite firmly say that this bad performance is due to a slow storage
   issue (which is beyond my control for now). Despite this I want to improve
   my performance.
 
   As taught in school, I started profiling these queries, and the data of a
   ~1 minute profile is located here:
 
  http://picpaste.com/pics/IMG_20130908_132441-ZyrfXeTY.1378637843.jpg
 
   Main observation: most of the time I am waiting in readVInt, whose
   stacktrace (2 out of 2 thread dumps) is:

   catalina-exec-3870 - Thread t@6615
     java.lang.Thread.State: RUNNABLE
       at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
       at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnumFrame.loadBlock(BlockTreeTermsReader.java:2357)
       at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BlockTreeTermsReader.java:1745)
       at org.apache.lucene.index.TermContext.build(TermContext.java:95)
       at org.apache.lucene.search.PhraseQuery$PhraseWeight.<init>(PhraseQuery.java:221)
       at org.apache.lucene.search.PhraseQuery.createWeight(PhraseQuery.java:326)
       at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:183)
       at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
       at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:183)
       at ...
Re: Profiling Solr Lucene for query

2013-10-01 Thread Shawn Heisey

On 10/1/2013 2:35 PM, Isaac Hebsh wrote:

Hi Dmitry,

I'm trying to examine your suggestion to create a frontend node. It sounds
pretty usefull.
I saw that every node in solr cluster can serve request for any collection,
even if it does not hold a core of that collection. because of that, I
thought that adding a new node to the cluster (aka, the frontend/gateway
server), and creating a dummy collection (with 1 dummy core), will solve
the problem.

But, I see that a request which sent to the gateway node, is not then sent
to the shards. Instead, the request is proxyed to a (random) core of the
requested collection, and from there it is sent to the shards. (It is
reasonable, because the SolrCore on the gateway might run with different
configuration, etc). This means that my new node isn't functioning as a
frontend (which responsible for sorting, etc.), but as a poor load
balancer. No performance improvement will come from this implementation.

So, how do you suggest to implement a frontend? On the one hand, it has to
run a core of the target collection, but on the other hand, we don't want
it to hold any shard contents.


With SolrCloud, every node is a frontend node.  If you're running 
SolrCloud, then it doesn't make sense to try and use that concept.


It only makes sense to create a frontend node (or core) if you are using 
traditional distributed search, where you need to include a shards 
parameter.


http://wiki.apache.org/solr/DistributedSearch
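In that setup the frontend core holds no data itself and fans the query out explicitly; a request would look roughly like this (host and core names hypothetical):

```
http://frontend:8983/solr/frontcore/select?q=foo&shards=idx1:8983/solr/core1,idx2:8983/solr/core1
```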

Thanks,
Shawn



Accent insensitive multi-words suggester

2013-10-01 Thread Dominique Bejean

Hi,

Up to now, the best solution I found in order to implement a multi-word 
suggester was to use the ShingleFilterFactory filter at index time and the 
TermsComponent. At index time the analyzer was:


  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.ElisionFilterFactory" ignoreCase="true"
            articles="lang/contractions_fr.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
            outputUnigrams="true"/>
  </analyzer>


With the ASCIIFoldingFilter, it works fine if the user does not use accents 
in the query terms, and all suggestions are returned without accents.
Without the ASCIIFoldingFilter, it works fine if the user does not forget 
accents in the query terms, and all suggestions are returned with accents.


Note: I use the StopFilter to avoid suggestions that include stop words, 
particularly ones starting or ending with stop words.



What I need is a suggester where the user can type the query terms with or 
without accents, and the suggestions are returned with accents.


For example, if the user type éco or eco, the suggester should return :

école
école primaire
école publique
école privée
école primaire privée


I think it is impossible to achieve this with the TermsComponent and I 
should use the SpellCheckComponent instead. However, I don't see how to 
make the suggester accent insensitive while returning the suggestions with 
accents.
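The usual trick is to match on a folded copy of the term while returning the stored, accented original (in Solr terms, roughly: a copyField analyzed with ASCIIFoldingFilter for matching, plus the unfolded stored value for display). The idea in miniature, outside Solr:

```python
import unicodedata

def fold(s):
    """Strip accents roughly the way ASCIIFoldingFilter does:
    NFD-decompose, then drop combining marks."""
    return "".join(ch for ch in unicodedata.normalize("NFD", s)
                   if unicodedata.category(ch) != "Mn")

def suggest(prefix, indexed_terms):
    # Match on the folded form, but return the original accented term.
    p = fold(prefix.lower())
    return [t for t in indexed_terms if fold(t.lower()).startswith(p)]
```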


Has somebody already achieved that?

Thank you.

Dominique


Re: Profiling Solr Lucene for query

2013-10-01 Thread Isaac Hebsh
Hi Shawn,
I know that every node operates as a frontend. This is the way our cluster
currently runs.

If I separate the frontend from the nodes which hold the shards, I can give
it a different amount of CPU and RAM (e.g. a large amount of RAM for the JVM,
because this server won't need the OS cache for reading the index, or more
CPUs because the merging process might be more CPU intensive).

Isn't that possible?


On Wed, Oct 2, 2013 at 12:42 AM, Shawn Heisey s...@elyograg.org wrote:

 On 10/1/2013 2:35 PM, Isaac Hebsh wrote:

 Hi Dmitry,

 I'm trying to examine your suggestion to create a frontend node. It sounds
 pretty usefull.
 I saw that every node in solr cluster can serve request for any
 collection,
 even if it does not hold a core of that collection. because of that, I
 thought that adding a new node to the cluster (aka, the frontend/gateway
 server), and creating a dummy collection (with 1 dummy core), will solve
 the problem.

 But, I see that a request which sent to the gateway node, is not then sent
 to the shards. Instead, the request is proxyed to a (random) core of the
 requested collection, and from there it is sent to the shards. (It is
 reasonable, because the SolrCore on the gateway might run with different
 configuration, etc). This means that my new node isn't functioning as a
 frontend (which responsible for sorting, etc.), but as a poor load
 balancer. No performance improvement will come from this implementation.

 So, how do you suggest to implement a frontend? On the one hand, it has to
 run a core of the target collection, but on the other hand, we don't want
 it to hold any shard contents.


 With SolrCloud, every node is a frontend node.  If you're running
 SolrCloud, then it doesn't make sense to try and use that concept.

 It only makes sense to create a frontend node (or core) if you are using
 traditional distributed search, where you need to include a shards
 parameter.

 http://wiki.apache.org/solr/**DistributedSearchhttp://wiki.apache.org/solr/DistributedSearch

 Thanks,
 Shawn




Re: Profiling Solr Lucene for query

2013-10-01 Thread Shawn Heisey

On 10/1/2013 4:04 PM, Isaac Hebsh wrote:

Hi Shawn,
I know that every node operates as a frontend. This is the way our cluster
currently run.

If I seperate the frontend from the nodes which hold the shards, I can let
him different amount of CPUs as RAM. (e.g. large amount of RAM to JVM,
because this server won't need the OS cache for reading the index, or more
CPUs because the merging process might be more CPU intensive).

Isn't it possible?


Not with SolrCloud.  If you manage all your shards and replicas yourself 
and use manual distributed search, then you can do what you're trying 
to do.  You lose a *LOT* of automation that SolrCloud handles for you if 
you follow this route, though.


I can't find an existing feature request issue for doing this with 
SolrCloud.  It's a good idea, just not possible currently.


Thanks,
Shawn



Re: Newbie to Solr

2013-10-01 Thread Alexandre Rafalovitch
Mamta,

You are trying to do multiple things at once. Slow down before you drown.

Use the default Solr distribution. That runs an embedded server. Do not switch
to Tomcat. Do it on your personal machine if you need to (it's just unzip
and run).

Then, go through Solr tutorial. That will answer some of the questions you
are trying to ask here.

Then, if you are still confused, maybe read one of many books on Solr. I
wrote one specifically for people with problems that sound exactly like
yours (starting from basics and doing a learning journey).
http://www.packtpub.com/apache-solr-for-indexing-data/book . But there are
many others.

Then, once you understand what those files and handlers and things are, use
Tomcat if you have to. There is an extra issue with the latest Solr and Tomcat
due to logging jar requirements, so make sure to consult the wiki on that and
not just a random old internet page.

This will not take long. You just need to stop randomly poking in all
possible directions and do it systematically.

Good luck,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Oct 1, 2013 at 6:50 PM, Mamta Alshi mamta.al...@gmail.com wrote:

 I can have only one schema.xml file, right? Can i over-write the one which
 originally comes with the solr set-up?

 the original schema.xml is @ C:\solr\solr\solr\conf along with post.sh et
 al. ... where should my other document be?

 i need to run post.jar on my doc file (xml) to index it, right?

 I could unfortunately not find any document which tells me how to run solr
 queries through my tomcat.. do you know of any links/books?

 Thank you! Kishan.

 Thanks,
 Mamta


 On Tue, Oct 1, 2013 at 3:30 PM, Kishan Parmar kishan@gmail.com
 wrote:

  you have to create only schema file
  dont change anything in solr config file,,
 
  and your xml file which you want to index from solr
 
  if you are new in solr then there is core named collection1
  you have to add thee schema file in that collection conf folder
 
  C:\solr\example\solr\collection1\conf
 
  Your data XML file should be in the C:\solr\example\exampledocs folder;
  post.jar and post.sh are in that folder, so you can use them to add your file.
 
 
 
  Regards,
 
  Kishan Parmar
  Software Developer
  +91 95 100 77394
  Jay Shree Krishnaa !!
 
 
 
  On Tue, Oct 1, 2013 at 4:19 AM, Mamta S Kanade mkan...@etisalat.ae
  wrote:
 
   Can you tell me what all docs I need to create...there needs to be a
   schema.xml and what else? A document having my data?
  
   Also, where these should be placed. There's already a schema.xml
  
   Thanks for the prompt response.
  
   Mamta.
  
   -Original Message-
   From: Kishan Parmar [mailto:kishan@gmail.com]
   Sent: 01 October, 2013 03:16 PM
   To: solr-user@lucene.apache.org
   Subject: Re: Newbie to Solr
  
   Yes, you have to create your own schema.
   In the schema file you have to add your XML file's field names; likewise
   you can add your own field names to it.
  
   Or you can add your fields to the default schema file.
  
   Without a schema you cannot add your XML file to Solr.
  
   my schema is like this
  
  
 
 --
   <?xml version="1.0" encoding="UTF-8" ?>
   <schema name="example" version="1.5">
   <fields>
      <field name="No" type="string" indexed="true" stored="true"
   required="true" multiValued="false" />
      <field name="Name" type="string" indexed="true" stored="true"
   required="true" multiValued="false" />
      <field name="Address" type="string" indexed="true" stored="true"
   required="true" multiValued="false" />
      <field name="Mobile" type="string" indexed="true" stored="true"
   required="true" multiValued="false" />
   </fields>
   <uniqueKey>No</uniqueKey>
  
   <types>
 
     <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
     <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
   positionIncrementGap="0" />
   </types>
   </schema>
  
  
 
 -
  
   and my file is like this:
  
  
  
 
 -
   <add>
   <doc>
   <field name="No">100120107088</field>
   <field name="Name">kishan</field>
   <field name="Address">ghatlodia</field>
   <field name="Mobile">9510077394</field>
   </doc>
   </add>
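
   For what it's worth, the add/doc message above can also be generated
   programmatically. A minimal sketch (the field names follow the example
   schema in this thread; nothing else here is Solr-specific):

```python
import xml.etree.ElementTree as ET

def build_add_doc(values):
    """Build a Solr <add><doc>...</doc></add> update message from a
    dict of field name -> value (names match the example schema)."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in values.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

payload = build_add_doc({
    "No": "100120107088",
    "Name": "kishan",
    "Address": "ghatlodia",
    "Mobile": "9510077394",
})
print(payload)
```

   The resulting string is the same kind of message that post.jar sends to
   Solr's /update handler when you post the file.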
  
   Regards,
  
   Kishan Parmar
   Software Developer
   +91 95 100 77394
   Jay Shree Krishnaa !!
  
  
  
   On Tue, Oct 1, 2013 at 1:11 AM, mamta mamta.al...@gmail.com wrote:
  
Hi,
   
I want to know that if i 

Problems with maxShardsPerNode in 4.5

2013-10-01 Thread Brett Hoerner
It seems that changes in 4.5 collection configuration now require users to
set a maxShardsPerNode (or it defaults to 1).

Maybe this was the case before, but with the new CREATESHARD API it seems
very restrictive. I've just created a very simple test collection on 3
machines where I set maxShardsPerNode at collection creation time to 1, and
I made 3 shards. Everything is good.

Now I want a 4th shard; it seems impossible to create because the cluster
knows I should only have 1 shard per node. Yet my problem doesn't require
more hardware; I just want my new shard to exist on one of the existing servers.

So I try again -- I create a collection with 3 shards and set
maxShardsPerNode to 1000 (just as a silly test). Everything is good.

Now I add shard4 and it immediately tries to add 1000 replicas of shard4...

You can see my earlier email today about time-oriented data in 4.5 to see
what I'm trying to do. I was hoping to have 1 shard per hour/day with the
ability to easily add/drop them as I move the time window (say, a week of
data, 1 per day).

Am I missing something?

Thanks!


Re: Problems with maxShardsPerNode in 4.5

2013-10-01 Thread Brett Hoerner
Related, 1 more try:

Created collection starting with 4 shards on 1 box. Had to set
maxShardsPerNode to 4 to do this.

Now I want to roll over my time window, so to attempt to deal with the
problems noted above I delete the oldest shard first. That works fine.

Now I try to add my new shard, which works, but again it defaults to
maxShardsPerNode # of replicas, so I'm left with:

* [deleted by me] hour0
* hour1 - 1 replica
* hour2 - 1 replica
* hour3 - 1 replica
* hour4 - 4 replicas [the one I created after deleting hour0]

Still at a loss as to how I would create 1 new shard with 1 replica on any
server in 4.5?

Thanks!
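
The bookkeeping for the rolling window described above is simple enough to
sketch. The hourN shard names and the window size are assumptions taken from
this thread, not anything Solr mandates:

```python
def roll_window(current_hour, window_size):
    """For an hourly rolling window, return the shard that falls out of
    the window (to DELETESHARD) and the new shard (to CREATESHARD)."""
    to_delete = f"hour{current_hour - window_size}"
    to_create = f"hour{current_hour}"
    return to_delete, to_create

# Rolling a 4-hour window forward to hour 4: drop hour0, add hour4,
# the same delete-then-create sequence as in the message above.
old, new = roll_window(4, 4)
print(old, new)  # hour0 hour4
```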


On Tue, Oct 1, 2013 at 8:14 PM, Brett Hoerner br...@bretthoerner.comwrote:

 It seems that changes in 4.5 collection configuration now require users to
 set a maxShardsPerNode (or it defaults to 1).

 Maybe this was the case before, but with the new CREATESHARD API it seems
 very restrictive. I've just created a very simple test collection on 3
 machines where I set maxShardsPerNode at collection creation time to 1, and
 I made 3 shards. Everything is good.

 Now I want a 4th shard; it seems impossible to create because the cluster
 knows I should only have 1 shard per node. Yet my problem doesn't require
 more hardware; I just want my new shard to exist on one of the existing servers.

 So I try again -- I create a collection with 3 shards and set
 maxShardsPerNode to 1000 (just as a silly test). Everything is good.

 Now I add shard4 and it immediately tries to add 1000 replicas of shard4...

 You can see my earlier email today about time-oriented data in 4.5 to see
 what I'm trying to do. I was hoping to have 1 shard per hour/day with the
 ability to easily add/drop them as I move the time window (say, a week of
 data, 1 per day).

 Am I missing something?

 Thanks!



Re: Problems with maxShardsPerNode in 4.5

2013-10-01 Thread Shalin Shekhar Mangar
Thanks for reporting this, Brett. This is indeed a bug. A workaround is to
specify replicationFactor=1 with the createShard command, which will create
only one replica even if maxShardsPerNode=1000 at the collection level.

I'll open an issue.
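
A sketch of what the workaround call looks like (the collection and shard
names are placeholders, and a node on localhost:8983 is assumed):

```python
from urllib.parse import urlencode

def createshard_url(base, collection, shard, replication_factor=1):
    """Build a Collections API CREATESHARD URL. Passing replicationFactor=1
    avoids getting one replica per allowed shard slot by default."""
    params = urlencode({
        "action": "CREATESHARD",
        "collection": collection,
        "shard": shard,
        "replicationFactor": replication_factor,
    })
    return f"{base}/admin/collections?{params}"

url = createshard_url("http://localhost:8983/solr", "events", "hour4")
print(url)
```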


On Wed, Oct 2, 2013 at 7:25 AM, Brett Hoerner br...@bretthoerner.comwrote:

 Related, 1 more try:

 Created collection starting with 4 shards on 1 box. Had to set
 maxShardsPerNode to 4 to do this.

 Now I want to roll over my time window, so to attempt to deal with the
 problems noted above I delete the oldest shard first. That works fine.

 Now I try to add my new shard, which works, but again it defaults to
 maxShardsPerNode # of replicas, so I'm left with:

 * [deleted by me] hour0
 * hour1 - 1 replica
 * hour2 - 1 replica
 * hour3 - 1 replica
 * hour4 - 4 replicas [the one I created after deleting hour0]

 Still at a loss as to how I would create 1 new shard with 1 replica on any
 server in 4.5?

 Thanks!


 On Tue, Oct 1, 2013 at 8:14 PM, Brett Hoerner br...@bretthoerner.com
 wrote:

  It seems that changes in 4.5 collection configuration now require users
 to
  set a maxShardsPerNode (or it defaults to 1).
 
  Maybe this was the case before, but with the new CREATESHARD API it seems
  very restrictive. I've just created a very simple test collection on 3
  machines where I set maxShardsPerNode at collection creation time to 1,
 and
  I made 3 shards. Everything is good.
 
  Now I want a 4th shard; it seems impossible to create because the cluster
  knows I should only have 1 shard per node. Yet my problem doesn't require
  more hardware; I just want my new shard to exist on one of the existing
 servers.
 
  So I try again -- I create a collection with 3 shards and set
  maxShardsPerNode to 1000 (just as a silly test). Everything is good.
 
  Now I add shard4 and it immediately tries to add 1000 replicas of
 shard4...
 
  You can see my earlier email today about time-oriented data in 4.5 to see
  what I'm trying to do. I was hoping to have 1 shard per hour/day with the
  ability to easily add/drop them as I move the time window (say, a week of
  data, 1 per day).
 
  Am I missing something?
 
  Thanks!
 




-- 
Regards,
Shalin Shekhar Mangar.