Re: Does sorting skip everything having to do with relevancy?

2014-04-04 Thread Alvaro Cabrerizo
Hi,

By default Solr sorts on the score field. If you override that with another
sort field, then yes, Solr will use the parameter you've provided. Remember
that you can use multiple fields for sorting
(http://wiki.apache.org/solr/CommonQueryParameters#sort), so
you can do something like: sort=score desc, your_field1 asc, your_field2
desc

The score of each document is calculated on every query (it does not depend on
the sort parameter or on the debugQuery parameter); debugQuery is only a
mechanism for showing (or hiding) how the score was calculated. If you want to
see a document's score for a particular query (apart from debugQuery) you
can ask for it in the Solr response by adding the parameter fl=*,score to
your request.
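
For example, a minimal SolrJ 4.x sketch along those lines (the core URL and
field names below are only placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SortWithScore {
    public static void main(String[] args) throws Exception {
        // Placeholder core URL; adjust to your installation.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery query = new SolrQuery("notebook");
        // Sort on score first, then on your own field.
        query.addSort("score", SolrQuery.ORDER.desc);
        query.addSort("your_field1", SolrQuery.ORDER.asc);
        // Equivalent of fl=*,score: return the score alongside the stored fields.
        query.setFields("*", "score");
        QueryResponse rsp = server.query(query);
        for (SolrDocument doc : rsp.getResults()) {
            System.out.println(doc.getFieldValue("id") + " -> " + doc.getFieldValue("score"));
        }
        server.shutdown();
    }
}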

Regards.




On Fri, Apr 4, 2014 at 4:42 AM, Shawn Heisey s...@elyograg.org wrote:

 If I provide a sort parameter, will Solr (4.6.1) skip score/boost
 processing?

 In particular I would like to know what happens if I have a boost
 parameter (with a complex function) for edismax search, but I include a
 sort parameter on one of my fields.  I am using distributed search.

 I do know that if debugQuery is used, the score IS calculated, but I'm
 talking about when debugQuery is not used.

 Thanks,
 Shawn




Re: Boosting Basic

2014-04-04 Thread Alvaro Cabrerizo
Hi,

If I were you, I would start by reading the edismax documentation:
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser.
Apart from the wiki, every distribution ships a full example of the edismax
query parser configuration (check the <requestHandler name="/browse"> node in the file
$YOUR_SOLR_DISTRIBUTION_DIRECTORY/solr/example/solr/collection1/conf/solrconfig.xml).
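
As a rough illustration only (the field names are taken from your mail, the
boost value of 10 is arbitrary), an edismax request built with SolrJ could
look like this:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class BoostNameField {
    public static void main(String[] args) throws Exception {
        // Placeholder core URL; adjust to your installation.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery query = new SolrQuery("your search words");
        query.set("defType", "edismax");
        // qf lists the fields to search; ^10 ranks matches in Name above the others.
        query.set("qf", "Name^10 Description ProductType");
        System.out.println(server.query(query).getResults().getNumFound());
        server.shutdown();
    }
}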

Regards.


On Thu, Apr 3, 2014 at 6:55 PM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:

 Hello,

 I am trying to implement boosting but I am not able to find a good
 example. Some places say to add ^10 to boost the score, and other places
 say to use bf. I have a query with the condition (Name OR Description OR
 ProductType), but I would like to show Name matches first, so I need to
 boost that condition.

 Thanks

 Ravi



Re: Solr join and lucene scoring

2014-04-04 Thread Alvaro Cabrerizo
Hi,

The issue you are referencing is closed with a resolution of Invalid, so
it seems scoring works fine with the join.  I've run the following two
tests on my own data and it seems to work:

TestA

   - fl=id,score
   - q=notebook
   - fq={!join from=product_list to=id fromIndex=product}id:*
   - rows=2

Gives me the following result, with the score calculated:

<doc>
  <str name="id">4ADCBA5F-B532-4154-8E12-47311DC0FD50</str>
  <float name="score">2.6598556</float>
</doc>
<doc>
  <str name="id">C861CC4A-6481-4754-946F-EA3903371C80</str>
  <float name="score">2.6598551</float>
</doc>
</result>

TestB

   - fl=id,score
   - q=notebook AND _query_:{!join from=product_list to=id fromIndex=product}id:*
   - rows=2

Gives me the following result, with the score calculated:

<doc>
  <str name="id">5C449525-8A69-409B-829C-671E147BF6BB</str>
  <float name="score">0.1573925</float>
</doc>
<doc>
  <str name="id">D1A719E8-F843-4E8D-AD82-64AA88D78BBB</str>
  <float name="score">0.1571764</float>
</doc>
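
The TestA request can also be reproduced from SolrJ; a rough sketch (URL and
core name are placeholders, field names as above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class JoinScoreTest {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("notebook");                              // q=notebook
        q.addFilterQuery("{!join from=product_list to=id fromIndex=product}id:*");
        q.setFields("id", "score");                                           // fl=id,score
        q.setRows(2);                                                         // rows=2
        QueryResponse rsp = server.query(q);
        for (SolrDocument doc : rsp.getResults()) {
            System.out.println(doc.getFieldValue("id") + " score=" + doc.getFieldValue("score"));
        }
        server.shutdown();
    }
}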

 Regards.


On Thu, Apr 3, 2014 at 11:42 AM, m...@preselect-media.com wrote:

 Hello,

 referencing to this issue:
 https://issues.apache.org/jira/browse/SOLR-4307

 Is it still not possible with the solr query time join to use scoring?
 Do I still have to write my own plugin or is there a plugin somewhere I
 could use?

 I never wrote a plugin for solr before, so I would prefer if I don't have
 to start from scratch.

 THX,
 Moritz




How to reduce the search speed of solrcloud

2014-04-04 Thread Sathya
Hi All,

Hi All, I am new to Solr and I don't know how to increase the search speed
of SolrCloud. I have indexed nearly 4 GB of data. When I search for a
document using Java with SolrJ, Solr takes more than 6 seconds to return a
query result. Can anyone please help me reduce the search query time to less
than 500 ms? I have allocated 4 GB of RAM for Solr. Please ask if you need
further details about my SolrCloud config.





Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Alexandre Rafalovitch
Show a sample query string that does that (takes 6 seconds to return).
Including all defaults you may have put in solrconfig.xml (if any).
That might give us a hint which features you are using and what
possible direction you could go in next. For bonus points, enable the
debug flag and set rows=1 to see how big your documents
themselves are.

You may have issues with a particular non-cloud-friendly feature, with
caches, with not reusing parts of your queries as 'fq', returning too
many fields or a bunch of other things.
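
For example (a sketch only, with a made-up URL and query; substitute the one
that takes 6 seconds), from SolrJ that would be roughly:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SlowQueryProbe {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("subject:something");   // your slow query here
        q.setRows(1);                    // one document is enough to see how big they are
        q.set("debugQuery", "true");     // adds parsing and timing details to the response
        QueryResponse rsp = server.query(q);
        System.out.println("QTime (Solr-side, ms): " + rsp.getQTime());
        System.out.println("Debug: " + rsp.getDebugMap());
        server.shutdown();
    }
}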

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Apr 4, 2014 at 2:31 PM, Sathya sathia.blacks...@gmail.com wrote:
 Hi All,

 Hi All, I am new to Solr. And i dont know how to increase the search speed
 of solrcloud. I have indexed nearly 4 GB of data. When i am searching a
 document using java with solrj, solr takes more 6 seconds to return a query
 result. Any one please help me to reduce the search query time to less than
 500 ms. i have allocate the 4 GB ram for solr. Please let me know for
 further details about solrcloud config.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Sathya
Hi Alex,

<str name="id">33026985</str>
<str name="subject">Component Audio\:A Shopping List</str>
<str name="download_date">2012-01-11 09:02:42.96</str>

This is what I have indexed in Solr. I have only 3 fields in the index:
I am just indexing the id, subject and date of news articles. Nearly 5
crore (50 million) documents. I have also attached my solrconfig and
solr.xml files. If you need more information, please let me know.

On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene]
ml-node+s472066n4129068...@n3.nabble.com wrote:

 Show a sample query string that does that (takes 6 seconds to return).
 Including all defaults you may have put in solrconfig.xml (if any).
 That might give us a hint which features you are using and what
 possible direction you could go in next. For the bonus points, enable
 debug flag and rows=1 parameter to see how big your documents
 themselves are.

 You may have issues with a particular non-cloud-friendly feature, with
 caches, with not reusing parts of your queries as 'fq', returning too
 many fields or a bunch of other things.

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr 
 proficiency


 On Fri, Apr 4, 2014 at 2:31 PM, Sathya [hidden email] wrote:

  Hi All,
 
  Hi All, I am new to Solr. And i dont know how to increase the search speed
  of solrcloud. I have indexed nearly 4 GB of data. When i am searching a
  document using java with solrj, solr takes more 6 seconds to return a query
  result. Any one please help me to reduce the search query time to less than
  500 ms. i have allocate the 4 GB ram for solr. Please let me know for
  further details about solrcloud config.
 
 
 
  --
  View this message in context: 
  http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html
  Sent from the Solr - User mailing list archive at Nabble.com.


 
 If you reply to this email, your message will be added to the discussion 
 below:
 http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129068.html
 To unsubscribe from How to reduce the search speed of solrcloud, click here.
 NAML


solrconfig.xml (101K) 
http://lucene.472066.n3.nabble.com/attachment/4129073/0/solrconfig.xml
solr.xml (1K) http://lucene.472066.n3.nabble.com/attachment/4129073/1/solr.xml





Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Alexandre Rafalovitch
What does your Solr query look like (check the Solr backend log if
you don't know)?

And how many documents is that? 50 million? That does not sound like much
for 3 fields. And what are the field definitions (schema.xml rather than
solr.xml)?

And what happens if you issue the query directly to Solr rather than
through the client? Is the speed much different?

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Apr 4, 2014 at 3:12 PM, Sathya sathia.blacks...@gmail.com wrote:
 Hi Alex,

 str name=id33026985/str str name=subjectComponent Audio\:A
 Shopping List/str str name=download_date2012-01-11
 09:02:42.96/str

 This is what i am  indexed in solr. I have only 3 fields in index. And
 i am just indexing id, subject and date of the news articles. Nearly 5
 crore documents. Also i have attached my solrconfig and solr.xml file.
 If u need more information, pls let me know.

 On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene]
 ml-node+s472066n4129068...@n3.nabble.com wrote:

 Show a sample query string that does that (takes 6 seconds to return).
 Including all defaults you may have put in solrconfig.xml (if any).
 That might give us a hint which features you are using and what
 possible direction you could go in next. For the bonus points, enable
 debug flag and rows=1 parameter to see how big your documents
 themselves are.

 You may have issues with a particular non-cloud-friendly feature, with
 caches, with not reusing parts of your queries as 'fq', returning too
 many fields or a bunch of other things.

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr 
 proficiency


 On Fri, Apr 4, 2014 at 2:31 PM, Sathya [hidden email] wrote:

  Hi All,
 
  Hi All, I am new to Solr. And i dont know how to increase the search speed
  of solrcloud. I have indexed nearly 4 GB of data. When i am searching a
  document using java with solrj, solr takes more 6 seconds to return a query
  result. Any one please help me to reduce the search query time to less than
  500 ms. i have allocate the 4 GB ram for solr. Please let me know for
  further details about solrcloud config.
 
 
 
  --
  View this message in context: 
  http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html
  Sent from the Solr - User mailing list archive at Nabble.com.


 
 If you reply to this email, your message will be added to the discussion 
 below:
 http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129068.html
 To unsubscribe from How to reduce the search speed of solrcloud, click here.
 NAML


 solrconfig.xml (101K) 
 http://lucene.472066.n3.nabble.com/attachment/4129073/0/solrconfig.xml
 solr.xml (1K) 
 http://lucene.472066.n3.nabble.com/attachment/4129073/1/solr.xml




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129073.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Sathya
Hi,

I have attached my schema.xml file too.

And you are right. I have 50 million documents. When I use the browser
to search for a document, it returns within 1000 to 2000 ms.

My query looks like this:
http://10.10.1.14:5050/solr/set_recent_shard1_replica5/select?q=subject&indent=true

On 4/4/14, Alexandre Rafalovitch [via Lucene]
ml-node+s472066n4129074...@n3.nabble.com wrote:


 What does your Solr query looks like (check the Solr backend log if
 you don't know)?

 And how many document is that? 50 million? Does not sound like much
 for 3 fields. And what's the definitions (schema.xml rather than
 solr.xml).

 And what happens if you issue the query directly to Solr rather than
 through the client? Is the speed much different?

 Regards,
Alex.

 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency


 On Fri, Apr 4, 2014 at 3:12 PM, Sathya sathia.blacks...@gmail.com wrote:
 Hi Alex,

 str name=id33026985/str str name=subjectComponent Audio\:A
 Shopping List/str str name=download_date2012-01-11
 09:02:42.96/str

 This is what i am  indexed in solr. I have only 3 fields in index. And
 i am just indexing id, subject and date of the news articles. Nearly 5
 crore documents. Also i have attached my solrconfig and solr.xml file.
 If u need more information, pls let me know.

 On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene]
 ml-node+s472066n4129068...@n3.nabble.com wrote:

 Show a sample query string that does that (takes 6 seconds to return).
 Including all defaults you may have put in solrconfig.xml (if any).
 That might give us a hint which features you are using and what
 possible direction you could go in next. For the bonus points, enable
 debug flag and rows=1 parameter to see how big your documents
 themselves are.

 You may have issues with a particular non-cloud-friendly feature, with
 caches, with not reusing parts of your queries as 'fq', returning too
 many fields or a bunch of other things.

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency


 On Fri, Apr 4, 2014 at 2:31 PM, Sathya [hidden email] wrote:

  Hi All,
 
  Hi All, I am new to Solr. And i dont know how to increase the search
  speed
  of solrcloud. I have indexed nearly 4 GB of data. When i am searching
  a
  document using java with solrj, solr takes more 6 seconds to return a
  query
  result. Any one please help me to reduce the search query time to less
  than
  500 ms. i have allocate the 4 GB ram for solr. Please let me know for
  further details about solrcloud config.
 
 
 
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html
  Sent from the Solr - User mailing list archive at Nabble.com.


 
 If you reply to this email, your message will be added to the discussion
 below:
 http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129068.html
 To unsubscribe from How to reduce the search speed of solrcloud, click
 here.
 NAML


 solrconfig.xml (101K)
 http://lucene.472066.n3.nabble.com/attachment/4129073/0/solrconfig.xml
 solr.xml (1K)
 http://lucene.472066.n3.nabble.com/attachment/4129073/1/solr.xml




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129073.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 ___
 If you reply to this email, your message will be added to the discussion
 below:
 http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129074.html

 To unsubscribe from How to reduce the search speed of solrcloud, visit
 http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=4129067code=c2F0aGlhLmJsYWNrc3RhckBnbWFpbC5jb218NDEyOTA2N3wtMjEyNDcwMTI5OA==


schema.xml (81K) 
http://lucene.472066.n3.nabble.com/attachment/4129075/0/schema.xml





Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Alexandre Rafalovitch
Well, if the direct browser query is 1000 ms and your client query is
6 seconds, then it is not Solr itself you need to worry about first.
Something must be wrong at the client. Try timing that bit. Maybe
writing from the client to your ultimate consumer is the actual
problem.

Regards,
   Alex.
P.s. You should probably trim your schema to get rid of all the
example fields. Keep _version_ and _root_ but delete all the rest you
don't actually use. Same with dynamic fields and all fieldType
definitions you do not actually use. You can always reintroduce them
later from the example schemas if something is missing.
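
A rough way to time it from SolrJ (sketch only, made-up URL and query):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ClientTiming {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("subject:something");
        long start = System.currentTimeMillis();
        QueryResponse rsp = server.query(q);
        long total = System.currentTimeMillis() - start;
        // QTime is the time Solr itself spent; the rest is network, response
        // parsing and whatever your own code does with the results.
        System.out.println("Solr QTime: " + rsp.getQTime() + " ms");
        System.out.println("Seen by the client: " + total + " ms");
        server.shutdown();
    }
}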

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Apr 4, 2014 at 3:41 PM, Sathya sathia.blacks...@gmail.com wrote:
 Hi,

 I have attached my schema.xml file too.

 And you are right. I have 50 million documents. When i use solr
 browser to search a document, it will return within 1000 to 2000 ms.

 My query looks like this:
 http://10.10.1.14:5050/solr/set_recent_shard1_replica5/select?q=subjectindent=true

 On 4/4/14, Alexandre Rafalovitch [via Lucene]
 ml-node+s472066n4129074...@n3.nabble.com wrote:


 What does your Solr query looks like (check the Solr backend log if
 you don't know)?

 And how many document is that? 50 million? Does not sound like much
 for 3 fields. And what's the definitions (schema.xml rather than
 solr.xml).

 And what happens if you issue the query directly to Solr rather than
 through the client? Is the speed much different?

 Regards,
Alex.

 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency


 On Fri, Apr 4, 2014 at 3:12 PM, Sathya sathia.blacks...@gmail.com wrote:
 Hi Alex,

 str name=id33026985/str str name=subjectComponent Audio\:A
 Shopping List/str str name=download_date2012-01-11
 09:02:42.96/str

 This is what i am  indexed in solr. I have only 3 fields in index. And
 i am just indexing id, subject and date of the news articles. Nearly 5
 crore documents. Also i have attached my solrconfig and solr.xml file.
 If u need more information, pls let me know.

 On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene]
 ml-node+s472066n4129068...@n3.nabble.com wrote:

 Show a sample query string that does that (takes 6 seconds to return).
 Including all defaults you may have put in solrconfig.xml (if any).
 That might give us a hint which features you are using and what
 possible direction you could go in next. For the bonus points, enable
 debug flag and rows=1 parameter to see how big your documents
 themselves are.

 You may have issues with a particular non-cloud-friendly feature, with
 caches, with not reusing parts of your queries as 'fq', returning too
 many fields or a bunch of other things.

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency


 On Fri, Apr 4, 2014 at 2:31 PM, Sathya [hidden email] wrote:

  Hi All,
 
  Hi All, I am new to Solr. And i dont know how to increase the search
  speed
  of solrcloud. I have indexed nearly 4 GB of data. When i am searching
  a
  document using java with solrj, solr takes more 6 seconds to return a
  query
  result. Any one please help me to reduce the search query time to less
  than
  500 ms. i have allocate the 4 GB ram for solr. Please let me know for
  further details about solrcloud config.
 
 
 
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html
  Sent from the Solr - User mailing list archive at Nabble.com.


 
 If you reply to this email, your message will be added to the discussion
 below:
 http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129068.html
 To unsubscribe from How to reduce the search speed of solrcloud, click
 here.
 NAML


 solrconfig.xml (101K)
 http://lucene.472066.n3.nabble.com/attachment/4129073/0/solrconfig.xml
 solr.xml (1K)
 http://lucene.472066.n3.nabble.com/attachment/4129073/1/solr.xml




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129073.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 ___
 If you reply to this email, your message will be added to the discussion
 below:
 http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129074.html

 To unsubscribe from How to reduce the search speed of solrcloud, visit
 http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=4129067code=c2F0aGlhLmJsYWNrc3RhckBnbWFpbC5jb218NDEyOTA2N3wtMjEyNDcwMTI5OA==


 schema.xml (81K) 
 http://lucene.472066.n3.nabble.com/attachment/4129075/0/schema.xml




 --

Query and field name with wildcard

2014-04-04 Thread Croci Francesco Luigi (ID SWS)
In my index I have some fields which share the same prefix (rmDocumentTitle,
rmDocumentClass, rmDocumentSubclass, rmDocumentArt). Apparently it is not
possible to specify a query like this:

q = rm* : some_word

Is there a way to do this without having to write a long list of ORs?

Another question is if it is really not possible to search a word over the 
entire index. Something like this: q = * : some_word

Thank you
Francesco


Re: Query and field name with wildcard

2014-04-04 Thread Alexandre Rafalovitch
Are you using eDisMax? That gives a lot of options, including field
aliasing, i.e. mapping a single name to multiple fields:
http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2F_renaming
(with an example on p77 of my book
http://www.packtpub.com/apache-solr-for-indexing-data/book :-)
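
For example, a sketch of such an aliased request from SolrJ (the alias name
rm_all is made up; the parameter style follows the wiki page above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class AliasedFieldQuery {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("rm_all:some_word");   // query the alias instead of rm*
        q.set("defType", "edismax");
        // The alias expands to the concrete rm* fields, listed explicitly once here.
        q.set("f.rm_all.qf", "rmDocumentTitle rmDocumentClass rmDocumentSubclass rmDocumentArt");
        System.out.println(server.query(q).getResults().getNumFound());
        server.shutdown();
    }
}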

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Apr 4, 2014 at 3:52 PM, Croci  Francesco Luigi (ID SWS)
fcr...@id.ethz.ch wrote:
 In my index I have some fields which have the same prefix(rmDocumentTitle, 
 rmDocumentClass, rmDocumentSubclass, rmDocumentArt). Apparently it is not 
 possible to specify a query like this:

 q = rm* : some_word

 Is there a way to do this without having to write a long list of ORs?

 Another question is if it is really not possible to search a word over the 
 entire index. Something like this: q = * : some_word

 Thank you
 Francesco


Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Sathya
Hi,

Sorry, I can't follow you, Alex. Can you please explain (if you can)? I have
only just started with Solr.


On Fri, Apr 4, 2014 at 2:20 PM, Alexandre Rafalovitch [via Lucene] 
ml-node+s472066n4129077...@n3.nabble.com wrote:

 Well, if the direct browser query is 1000ms and your client query is
 6seconds, then it is not Solr itself you need to worry about first.
 Something must be wrong at the client. Trying timing that bit. Maybe
 it is writing from the client to your ultimate consumer that's the
 problem.

 Regards,
Alex.
 P.s. You should probably trim your schema to get rid of all the
 example fields. Keep _version_ and _root_ but delete all the rest you
 don't actually use. Same with dynamic fields and all fieldType
 definitions you do not actually use. You can always reintroduce them
 later from the example schemas if something is missing.

 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency


 On Fri, Apr 4, 2014 at 3:41 PM, Sathya [hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4129077i=0
 wrote:

  Hi,
 
  I have attached my schema.xml file too.
 
  And you are right. I have 50 million documents. When i use solr
  browser to search a document, it will return within 1000 to 2000 ms.
 
  My query looks like this:
 
 http://10.10.1.14:5050/solr/set_recent_shard1_replica5/select?q=subjectindent=true
 
  On 4/4/14, Alexandre Rafalovitch [via Lucene]
  [hidden email] http://user/SendEmail.jtp?type=nodenode=4129077i=1
 wrote:
 
 
  What does your Solr query looks like (check the Solr backend log if
  you don't know)?
 
  And how many document is that? 50 million? Does not sound like much
  for 3 fields. And what's the definitions (schema.xml rather than
  solr.xml).
 
  And what happens if you issue the query directly to Solr rather than
  through the client? Is the speed much different?
 
  Regards,
 Alex.
 
  Personal website: http://www.outerthoughts.com/
  Current project: http://www.solr-start.com/ - Accelerating your Solr
  proficiency
 
 
  On Fri, Apr 4, 2014 at 3:12 PM, Sathya [hidden 
  email]http://user/SendEmail.jtp?type=nodenode=4129077i=2
 wrote:
  Hi Alex,
 
  str name=id33026985/str str name=subjectComponent Audio\:A
  Shopping List/str str name=download_date2012-01-11
  09:02:42.96/str
 
  This is what i am  indexed in solr. I have only 3 fields in index. And
  i am just indexing id, subject and date of the news articles. Nearly 5
  crore documents. Also i have attached my solrconfig and solr.xml file.
  If u need more information, pls let me know.
 
  On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene]
  [hidden email] http://user/SendEmail.jtp?type=nodenode=4129077i=3
 wrote:
 
  Show a sample query string that does that (takes 6 seconds to
 return).
  Including all defaults you may have put in solrconfig.xml (if any).
  That might give us a hint which features you are using and what
  possible direction you could go in next. For the bonus points, enable
  debug flag and rows=1 parameter to see how big your documents
  themselves are.
 
  You may have issues with a particular non-cloud-friendly feature,
 with
  caches, with not reusing parts of your queries as 'fq', returning too
  many fields or a bunch of other things.
 
  Regards,
 Alex.
  Personal website: http://www.outerthoughts.com/
  Current project: http://www.solr-start.com/ - Accelerating your Solr
  proficiency
 
 
  On Fri, Apr 4, 2014 at 2:31 PM, Sathya [hidden email] wrote:
 
   Hi All,
  
   Hi All, I am new to Solr. And i dont know how to increase the
 search
   speed
   of solrcloud. I have indexed nearly 4 GB of data. When i am
 searching
   a
   document using java with solrj, solr takes more 6 seconds to return
 a
   query
   result. Any one please help me to reduce the search query time to
 less
   than
   500 ms. i have allocate the 4 GB ram for solr. Please let me know
 for
   further details about solrcloud config.
  
  
  
   --
   View this message in context:
  
 http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html
   Sent from the Solr - User mailing list archive at Nabble.com.
 
 
  
  If you reply to this email, your message will be added to the
 discussion
  below:
 
 http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129068.html
  To unsubscribe from How to reduce the search speed of solrcloud,
 click
  here.
  NAML
 
 
  solrconfig.xml (101K)
  
 http://lucene.472066.n3.nabble.com/attachment/4129073/0/solrconfig.xml
  solr.xml (1K)
  http://lucene.472066.n3.nabble.com/attachment/4129073/1/solr.xml
 
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129073.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
  ___
  If you 

Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Alexandre Rafalovitch
You said your request is 6 seconds when going through the SolrJ
client. But it is 1 second (1000 ms) when going directly to Solr
bypassing the SolrJ. So, the other 5 seconds must be added outside of
Solr. Concentrate on that.

Regarding your schema, you used the example schema, which has a lot of stuff
you do not need. Here is what a very small schema looks like, so you can compare:
https://github.com/arafalov/solr-indexing-book/blob/master/published/collection1/conf/schema.xml
That's an example from my book. You may find the
book a fast way to get from your current state to early intermediate
(no cloud examples, though).

Contact me directly if you need a discount.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Apr 4, 2014 at 4:11 PM, Sathya sathia.blacks...@gmail.com wrote:
 Hi,

 Sorry, i cant get u alex. Can you please explain me(if you can).  Because
 now only i entered into solr.


 On Fri, Apr 4, 2014 at 2:20 PM, Alexandre Rafalovitch [via Lucene] 
 ml-node+s472066n4129077...@n3.nabble.com wrote:

 Well, if the direct browser query is 1000ms and your client query is
 6seconds, then it is not Solr itself you need to worry about first.
 Something must be wrong at the client. Trying timing that bit. Maybe
 it is writing from the client to your ultimate consumer that's the
 problem.

 Regards,
Alex.
 P.s. You should probably trim your schema to get rid of all the
 example fields. Keep _version_ and _root_ but delete all the rest you
 don't actually use. Same with dynamic fields and all fieldType
 definitions you do not actually use. You can always reintroduce them
 later from the example schemas if something is missing.

 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency


 On Fri, Apr 4, 2014 at 3:41 PM, Sathya [hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4129077i=0
 wrote:

  Hi,
 
  I have attached my schema.xml file too.
 
  And you are right. I have 50 million documents. When i use solr
  browser to search a document, it will return within 1000 to 2000 ms.
 
  My query looks like this:
 
 http://10.10.1.14:5050/solr/set_recent_shard1_replica5/select?q=subjectindent=true
 
  On 4/4/14, Alexandre Rafalovitch [via Lucene]
  [hidden email] http://user/SendEmail.jtp?type=nodenode=4129077i=1
 wrote:
 
 
  What does your Solr query looks like (check the Solr backend log if
  you don't know)?
 
  And how many document is that? 50 million? Does not sound like much
  for 3 fields. And what's the definitions (schema.xml rather than
  solr.xml).
 
  And what happens if you issue the query directly to Solr rather than
  through the client? Is the speed much different?
 
  Regards,
 Alex.
 
  Personal website: http://www.outerthoughts.com/
  Current project: http://www.solr-start.com/ - Accelerating your Solr
  proficiency
 
 
  On Fri, Apr 4, 2014 at 3:12 PM, Sathya [hidden 
  email]http://user/SendEmail.jtp?type=nodenode=4129077i=2
 wrote:
  Hi Alex,
 
  str name=id33026985/str str name=subjectComponent Audio\:A
  Shopping List/str str name=download_date2012-01-11
  09:02:42.96/str
 
  This is what i am  indexed in solr. I have only 3 fields in index. And
  i am just indexing id, subject and date of the news articles. Nearly 5
  crore documents. Also i have attached my solrconfig and solr.xml file.
  If u need more information, pls let me know.
 
  On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene]
  [hidden email] http://user/SendEmail.jtp?type=nodenode=4129077i=3
 wrote:
 
  Show a sample query string that does that (takes 6 seconds to
 return).
  Including all defaults you may have put in solrconfig.xml (if any).
  That might give us a hint which features you are using and what
  possible direction you could go in next. For the bonus points, enable
  debug flag and rows=1 parameter to see how big your documents
  themselves are.
 
  You may have issues with a particular non-cloud-friendly feature,
 with
  caches, with not reusing parts of your queries as 'fq', returning too
  many fields or a bunch of other things.
 
  Regards,
 Alex.
  Personal website: http://www.outerthoughts.com/
  Current project: http://www.solr-start.com/ - Accelerating your Solr
  proficiency
 
 
  On Fri, Apr 4, 2014 at 2:31 PM, Sathya [hidden email] wrote:
 
   Hi All,
  
   Hi All, I am new to Solr. And i dont know how to increase the
 search
   speed
   of solrcloud. I have indexed nearly 4 GB of data. When i am
 searching
   a
   document using java with solrj, solr takes more 6 seconds to return
 a
   query
   result. Any one please help me to reduce the search query time to
 less
   than
   500 ms. i have allocate the 4 GB ram for solr. Please let me know
 for
   further details about solrcloud config.
  
  
  
   --
   View this message in context:
  
 

RE: tf and very short text fields

2014-04-04 Thread Markus Jelsma
Hi - In this case Walter, iirc, was looking for two things: no normalization
and flat TF (1f for tf(float freq) > 0). We know that k1 controls TF
saturation, but in BM25Similarity you can see that k1 is multiplied by the
encoded norm value, taking b also into account. So setting k1 to zero
effectively disables length normalization and results in a flat, or binary, TF.

Here's an example output for k1 = 0 and k1 = 0.2. Norms are enabled on the field,
and the term occurs three times in the field:

28.203003 = score(doc=0,freq=1.5 = phraseFreq=1.5
), product of:
  6.4 = boost
  4.406719 = idf(docFreq=1, docCount=122)
  1.0 = tfNorm, computed from:
1.5 = phraseFreq=1.5
0.0 = parameter k1
0.75 = parameter b
8.721312 = avgFieldLength
16.0 = fieldLength




27.813797 = score(doc=0,freq=1.5 = phraseFreq=1.5
), product of:
  6.4 = boost
  4.406719 = idf(docFreq=1, docCount=122)
  0.98619986 = tfNorm, computed from:
1.5 = phraseFreq=1.5
0.2 = parameter k1
0.75 = parameter b
8.721312 = avgFieldLength
16.0 = fieldLength


You can clearly see the final TF norm being 1, despite the term frequency and 
length. Please correct my wrongs :)
Markus

 
 
-Original message-
 From:Tom Burton-West tburt...@umich.edu
 Sent: Thursday 3rd April 2014 20:18
 To: solr-user@lucene.apache.org
 Subject: Re: tf and very short text fields
 
 Hi Markus and Wunder,
 
 I'm  missing the original context, but I don't think BM25 will solve this
 particular problem.
 
 The k1 parameter sets how quickly the contribution of tf to the score falls
 off with increasing tf.   It would be helpful for making sure really long
 documents don't get too high a score, but I don't think it would help for
 very short documents without messing up its original design purpose.
 
 For BM25, if you want to turn off length normalization, you set b to 0.
  However, I don't think that will do what you want, since turning off
 normalization will mean that the score for new york, new york  will be
 twice that of the score for new york since without normalization the tf
 in new york new york is twice that of new york.
 
 I think the earlier suggestion to override tfidfsimilarity and emit 1f in
 tf() is probably the best way to switch to eliminate using tf counts,
 assumming that is really what you want.
 
 Tom
 
 
 
 
 
 
 
 
 On Tue, Apr 1, 2014 at 4:17 PM, Walter Underwood wun...@wunderwood.orgwrote:
 
  Thanks! We'll try that out and report back. I keep forgetting that I want
  to try BM25, so this is a good excuse.
 
  wunder
 
  On Apr 1, 2014, at 12:30 PM, Markus Jelsma markus.jel...@openindex.io
  wrote:
 
   Also, if i remember correctly, k1 set to zero for bm25 automatically
  omits norms in the calculation. So thats easy to play with without
  reindexing.
  
  
   Markus Jelsma markus.jel...@openindex.io schreef:Yes, override
  tfidfsimilarity and emit 1f in tf(). You can also use bm25 with k1 set to
  zero in your schema.
  
  
   Walter Underwood wun...@wunderwood.org schreef:And here is another
  peculiarity of short text fields.
  
   The movie New York, New York should not be twice as relevant for the
  query new york. Is there a way to use a binary term frequency rather than
  a count?
  
   wunder
   --
   Walter Underwood
   wun...@wunderwood.org
  
  
  
 
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 
 


Cannot run program svnversion when building lucene 4.7.1

2014-04-04 Thread Puneet Pawaia
Hi all.

I am trying to build lucene 4.7.1 from the sources. I can compile without
any issues but when I try to build the dist, lucene gives me
Cannot run program svnversion ... The system cannot find the specified
file.

I am compiling on Windows 7 64-bit using java version 1.7.0.45 64-bit.

Where can I get this svnversion ?

Thanks
Puneet


Solr Search on Fields name

2014-04-04 Thread anuragwalia
Hi,

Thanks for giving your valuable time.

Problem:
I cannot find a way to search on these keys with an OR operator, e.g. to find
items having RuleA OR RuleE.

Format of indexed data:

<result name="response" numFound="27" start="0" maxScore="1.0">
<doc>
<float name="score">1.0</float>
.
<int name="RuleA">4</int>
<int name="RuleD">2</int>
<int name="RuleE">2</int>
<int name="RuleF">2</int>

</doc>
Can anyone help me prepare a search query for this kind of key search?


Regards
Anurag 





Re: Cannot run program svnversion when building lucene 4.7.1

2014-04-04 Thread Ahmet Arslan
Hi,

When you install Subversion, the svnversion executable comes with it.
Did you install an svn client for Windows?



On Friday, April 4, 2014 3:38 PM, Puneet Pawaia puneet.paw...@gmail.com wrote:
Hi all.

I am trying to build lucene 4.7.1 from the sources. I can compile without
any issues but when I try to build the dist, lucene gives me
Cannot run program svnversion ... The system cannot find the specified
file.

I am compiling on Windows 7 64-bit using java version 1.7.0.45 64-bit.

Where can I get this svnversion ?

Thanks
Puneet



Re: Solr Search on Fields name

2014-04-04 Thread Ahmet Arslan


Hi Anurag,

It seems that RuleA and RuleB are field names?

in that case try this query

q=RuleA:[* TO *] OR RuleB:[* TO *]

Ahmet


On Friday, April 4, 2014 4:15 PM, anuragwalia anuwaliaha...@gmail.com wrote:
Hi,

Thank for giving your important time.

Problem :
I am unable to find a way how can I search Key with OR operator like if I
search Items having  RuleA OR RuleE.

Format of Indexed Data:

result name=response numFound=27 start=0 maxScore=1.0
doc
float name=score1.0/float
.
int name=RuleA4/int
int name=RuleD2/int
int name=RuleE2/int
int name=RuleF2/int

/doc
Can any one help me out how can prepare SearchQuery for key search.


Regards
Anurag 






Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Candygram For Mongo
The ramBufferSizeMB was set to 6MB only on the test system to make the
system crash sooner.  In production that tag is commented out which
I believe forces the default value to be used.


On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 out of curiosity, why did you set ramBufferSizeMB to 6?

 Ahmet




 On Friday, April 4, 2014 3:27 AM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:
 *Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception

 *SOLR/Lucene version: *4.2.1*

 *JVM version:

 Java(TM) SE Runtime Environment (build 1.7.0_07-b11)

 Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)



 *Indexer startup command:

 set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m



 java  %JVMARGS% ^

 -Dcom.sun.management.jmxremote.port=1092 ^

 -Dcom.sun.management.jmxremote.ssl=false ^

 -Dcom.sun.management.jmxremote.authenticate=false ^

 -jar start.jar



 *SOLR indexing HTTP parameters request:

 webapp=/solr path=/dataimport
 params={clean=false&command=full-import&wt=javabin&version=2}



 We are getting a Java heap OOM exception when indexing (updating) 27
 million records.  If we increase the Java heap memory settings the problem
 goes away but we believe the problem has not been fixed and that we will
 eventually get the same OOM exception.  We have other processes on the
 server that also require resources so we cannot continually increase the
 memory settings to resolve the OOM issue.  We are trying to find a way to
 configure the SOLR instance to reduce or preferably eliminate the
 possibility of an OOM exception.



 We can reproduce the problem on a test machine.  We set the Java heap
 memory size to 64MB to accelerate the exception.  If we increase this
 setting the same problems occurs, just hours later.  In the test
 environment, we are using the following parameters:



 JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m



 Normally we use the default solrconfig.xml file with only the following jar
 file references added:



 <lib path="../../../../default/lib/common.jar" />

 <lib path="../../../../default/lib/webapp.jar" />

 <lib path="../../../../default/lib/commons-pool-1.4.jar" />



 Using these values and trying to index 6 million records from the database,
 the Java Heap Out of Memory exception is thrown very quickly.



 We were able to complete a successful indexing by further modifying the
 solrconfig.xml and removing all or all but one copyfield tags from the
 schema.xml file.



 The following solrconfig.xml values were modified:



 <ramBufferSizeMB>6</ramBufferSizeMB>



 <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">

 <int name="maxMergeAtOnce">2</int>

 <int name="maxMergeAtOnceExplicit">2</int>

 <int name="segmentsPerTier">10</int>

 <int name="maxMergedSegmentMB">150</int>

 </mergePolicy>



 <autoCommit>

 <maxDocs>15000</maxDocs>  <!-- This tag was maxTime, before this -->

 <openSearcher>false</openSearcher>

 </autoCommit>



 Using our customized schema.xml file with two or more copyfield tags, the
 OOM exception is always thrown.  Based on the errors, the problem occurs
 when the process was trying to do the merge.  The error is provided below:



 Exception in thread Lucene Merge Thread #156
 org.apache.lucene.index.MergePolicy$MergeException:
 java.lang.OutOfMemoryError: Java heap space

 at

 org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)

 at

 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)

 Caused by: java.lang.OutOfMemoryError: Java heap space

 at

 org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180)

 at

 org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146)

 at

 org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)

 at
 org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:259)

 at
 org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:233)

 at
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)

 at
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3693)

 at
 org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3296)

 at

 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)

 at

 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)

 Mar 12, 2014 12:17:40 AM org.apache.solr.common.SolrException log

 SEVERE: auto commit error...:java.lang.IllegalStateException: this writer
 hit an OutOfMemoryError; cannot commit

 at
 org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3971)

 

Re: Does sorting skip everything having to do with relevancy?

2014-04-04 Thread Shawn Heisey
On 4/4/2014 12:48 AM, Alvaro Cabrerizo wrote:
 By default solr is using the sort parameter over the score field. So if
 you overwrite it using other sort field, yes solr will use the parameter
 you've provided. Remember, you can use multiple fields for
 sortinghttp://wiki.apache.org/solr/CommonQueryParameters#sort so
 you can make something like: sort score desc, your_field1 asc, your_field2
 desc
 
 The score of documents is calculated on every query (it does not depend on
 the sort parameter or the debugQueryParameter) and the debubQuery is only a
 mechanism for showing (or hidding) how score was calculated. If you want to
 see a document score for a particular query (apart from the debugQuery) you
 can ask for it in the solr response adding the parameter *fl=*,score* to
 your request.

These are things that I already know.

What I want to know is whether Solr has code in place that will avoid
wasting CPU cycles calculating the score that will never be displayed or
used, *especially* the complex boost parameter that's in the request
handler definition (solrconfig.xml).

<str
name="boost">min(recip(abs(ms(NOW/HOUR,registered_date)),1.92901e-10,1.5,1.5),0.85)</str>

Do I need to send 'boost=' as a parameter (along with my sort) to get it
to avoid that calculation?

Thanks,
Shawn



Re: Query and field name with wildcard

2014-04-04 Thread Ahmet Arslan
Hi,

bq. possible to search a word over the entire index.

You can get a list of all searchable fields (indexed=true) programmatically via
https://wiki.apache.org/solr/LukeRequestHandler
and then feed this list to the qf parameter of (e)dismax.

This could be implemented as a custom query parser plugin that searches a word 
over the entire index.
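
For illustration, a rough SolrJ sketch of that idea (URL is a placeholder; a
real version should keep only the indexed text fields rather than every field
the Luke handler reports):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;

public class SearchEverywhere {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // Ask the Luke handler which fields exist in the index.
        LukeResponse luke = new LukeRequest().process(server);
        StringBuilder qf = new StringBuilder();
        for (String field : luke.getFieldInfo().keySet()) {
            qf.append(field).append(' ');   // in practice, filter to indexed text fields
        }
        SolrQuery q = new SolrQuery("some_word");
        q.set("defType", "edismax");
        q.set("qf", qf.toString().trim());
        System.out.println(server.query(q).getResults().getNumFound());
        server.shutdown();
    }
}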


Ahmet


On Friday, April 4, 2014 12:08 PM, Alexandre Rafalovitch arafa...@gmail.com 
wrote:
Are you using eDisMax. That gives a lot of options, including field
aliasing, including a single name to multiple fields:
http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2F_renaming
(with example on p77 of my book
http://www.packtpub.com/apache-solr-for-indexing-data/book :-)

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency



On Fri, Apr 4, 2014 at 3:52 PM, Croci  Francesco Luigi (ID SWS)
fcr...@id.ethz.ch wrote:
 In my index I have some fields which have the same prefix(rmDocumentTitle, 
 rmDocumentClass, rmDocumentSubclass, rmDocumentArt). Apparently it is not 
 possible to specify a query like this:

 q = rm* : some_word

 Is there a way to do this without having to write a long list of ORs?

 Another question is if it is really not possible to search a word over the 
 entire index. Something like this: q = * : some_word

 Thank you
 Francesco



Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Ahmet Arslan
Hi,

Which database are you using? Can you send us data-config.xml? 

What happens when you use default merge policy settings?

What happens when you dump your table to a comma-separated file and feed that file
to Solr?

Ahmet

On Friday, April 4, 2014 5:10 PM, Candygram For Mongo 
candygram.for.mo...@gmail.com wrote:

The ramBufferSizeMB was set to 6MB only on the test system to make the system 
crash sooner.  In production that tag is commented out which I believe forces 
the default value to be used.




On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

out of curiosity, why did you set ramBufferSizeMB to 6? 

Ahmet





On Friday, April 4, 2014 3:27 AM, Candygram For Mongo 
candygram.for.mo...@gmail.com wrote:
*Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception

*SOLR/Lucene version: *4.2.1*


*JVM version:

Java(TM) SE Runtime Environment (build 1.7.0_07-b11)

Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)



*Indexer startup command:

set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m



java  %JVMARGS% ^

-Dcom.sun.management.jmxremote.port=1092 ^

-Dcom.sun.management.jmxremote.ssl=false ^

-Dcom.sun.management.jmxremote.authenticate=false ^

-jar start.jar



*SOLR indexing HTTP parameters request:

webapp=/solr path=/dataimport
params={clean=falsecommand=full-importwt=javabinversion=2}



We are getting a Java heap OOM exception when indexing (updating) 27
million records.  If we increase the Java heap memory settings the problem
goes away but we believe the problem has not been fixed and that we will
eventually get the same OOM exception.  We have other processes on the
server that also require resources so we cannot continually increase the
memory settings to resolve the OOM issue.  We are trying to find a way to
configure the SOLR instance to reduce or preferably eliminate the
possibility of an OOM exception.



We can reproduce the problem on a test machine.  We set the Java heap
memory size to 64MB to accelerate the exception.  If we increase this
setting the same problems occurs, just hours later.  In the test
environment, we are using the following parameters:



JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m



Normally we use the default solrconfig.xml file with only the following jar
file references added:



lib path=../../../../default/lib/common.jar /

lib path=../../../../default/lib/webapp.jar /

lib path=../../../../default/lib/commons-pool-1.4.jar /



Using these values and trying to index 6 million records from the database,
the Java Heap Out of Memory exception is thrown very quickly.



We were able to complete a successful indexing by further modifying the
solrconfig.xml and removing all or all but one copyfield tags from the
schema.xml file.



The following solrconfig.xml values were modified:



ramBufferSizeMB6/ramBufferSizeMB



mergePolicy class=org.apache.lucene.index.TieredMergePolicy

int name=maxMergeAtOnce2/int

int name=maxMergeAtOnceExplicit2/int

int name=segmentsPerTier10/int

int name=maxMergedSegmentMB150/int

/mergePolicy



autoCommit

maxDocs15000/maxDocs  !--     This tag was maxTime, before this -- 

openSearcherfalse/openSearcher

/autoCommit



Using our customized schema.xml file with two or more copyfield tags, the
OOM exception is always thrown.  Based on the errors, the problem occurs
when the process was trying to do the merge.  The error is provided below:



Exception in thread Lucene Merge Thread #156
org.apache.lucene.index.MergePolicy$MergeException:
java.lang.OutOfMemoryError: Java heap space

                at
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)

                at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)

Caused by: java.lang.OutOfMemoryError: Java heap space

                at
org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180)

                at
org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146)

                at
org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)

                at
org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:259)

                at
org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:233)

                at
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)

                at
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3693)

                at
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3296)

                at
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)

                at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)

Mar 12, 2014 12:17:40 AM 

Re: tf and very short text fields

2014-04-04 Thread Ahmet Arslan
Hi,

Another simple approach is:
If you don't use phrase queries or phrase boosting, you can set
omitTermFreqAndPositions=true

Ahmet


On Friday, April 4, 2014 2:38 PM, Markus Jelsma markus.jel...@openindex.io 
wrote:
Hi - In this case Walter, iirc, was looking for two things: no normalization 
and no flat TF (1f for tf(float freq)  0). We know that k1 controls TF 
saturation but in BM25Similarity you can see that k1 is multiplied by the 
encoded norm value, taking b also into account. So setting k1 to zero 
effectively disabled length normalization and results in flat or binary TF. 

Here's an example output of k1 = 0 and k1 = 0.2. Norms or enabled on the field, 
term occurs three times in the field:

        28.203003 = score(doc=0,freq=1.5 = phraseFreq=1.5
), product of:
          6.4 = boost
          4.406719 = idf(docFreq=1, docCount=122)
          1.0 = tfNorm, computed from:
            1.5 = phraseFreq=1.5
            0.0 = parameter k1
            0.75 = parameter b
            8.721312 = avgFieldLength
            16.0 = fieldLength




        27.813797 = score(doc=0,freq=1.5 = phraseFreq=1.5
), product of:
          6.4 = boost
          4.406719 = idf(docFreq=1, docCount=122)
          0.98619986 = tfNorm, computed from:
            1.5 = phraseFreq=1.5
            0.2 = parameter k1
            0.75 = parameter b
            8.721312 = avgFieldLength
            16.0 = fieldLength


You can clearly see the final TF norm being 1, despite the term frequency and 
length. Please correct my wrongs :)
Markus




-Original message-
 From:Tom Burton-West tburt...@umich.edu
 Sent: Thursday 3rd April 2014 20:18
 To: solr-user@lucene.apache.org
 Subject: Re: tf and very short text fields
 
 Hi Markus and Wunder,
 
 I'm  missing the original context, but I don't think BM25 will solve this
 particular problem.
 
 The k1 parameter sets how quickly the contribution of tf to the score falls
 off with increasing tf.   It would be helpful for making sure really long
 documents don't get too high a score, but I don't think it would help for
 very short documents without messing up its original design purpose.
 
 For BM25, if you want to turn off length normalization, you set b to 0.
  However, I don't think that will do what you want, since turning off
 normalization will mean that the score for new york, new york  will be
 twice that of the score for new york since without normalization the tf
 in new york new york is twice that of new york.
 
 I think the earlier suggestion to override tfidfsimilarity and emit 1f in
 tf() is probably the best way to switch to eliminate using tf counts,
 assumming that is really what you want.
 
 Tom
 
 
 
 
 
 
 
 
 On Tue, Apr 1, 2014 at 4:17 PM, Walter Underwood wun...@wunderwood.orgwrote:
 
  Thanks! We'll try that out and report back. I keep forgetting that I want
  to try BM25, so this is a good excuse.
 
  wunder
 
  On Apr 1, 2014, at 12:30 PM, Markus Jelsma markus.jel...@openindex.io
  wrote:
 
   Also, if i remember correctly, k1 set to zero for bm25 automatically
  omits norms in the calculation. So thats easy to play with without
  reindexing.
  
  
   Markus Jelsma markus.jel...@openindex.io schreef:Yes, override
  tfidfsimilarity and emit 1f in tf(). You can also use bm25 with k1 set to
  zero in your schema.
  
  
   Walter Underwood wun...@wunderwood.org schreef:And here is another
  peculiarity of short text fields.
  
   The movie New York, New York should not be twice as relevant for the
  query new york. Is there a way to use a binary term frequency rather than
  a count?
  
   wunder
   --
   Walter Underwood
   wun...@wunderwood.org
  
  
  
 
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 



Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Shawn Heisey
On 4/4/2014 1:31 AM, Sathya wrote:
 Hi All,
 
 Hi All, I am new to Solr. And i dont know how to increase the search speed
 of solrcloud. I have indexed nearly 4 GB of data. When i am searching a
 document using java with solrj, solr takes more 6 seconds to return a query
 result. Any one please help me to reduce the search query time to less than
 500 ms. i have allocate the 4 GB ram for solr. Please let me know for
 further details about solrcloud config.

How much total RAM do you have on the system, and how much total index
data is on that system (adding up all the Solr cores)?  You've already
said that you have allocated 4GB of RAM for Solr.

Later you said you had 50 million documents, and then you showed us a
URL that looks like SolrCloud.

I suspect that you don't have enough RAM left over to cache your index
effectively -- the OS Disk Cache is too small.

http://wiki.apache.org/solr/SolrPerformanceProblems

Another possible problem, also discussed on that page, is that your Java
heap is too small.

Thanks,
Shawn



Difference between [ TO *] and [* TO *] at Solr?

2014-04-04 Thread Furkan KAMACI
Hi;

What is the difference between [ TO *] and [* TO *] in Solr? (I tested it
on 4.5.1 and the numFound values are different.)

Thanks;
Furkan KAMACI


Solr Search For Documents That Has Empty Content For a Given Particular Field

2014-04-04 Thread Furkan KAMACI
Hi;

How can I find the documents that has empty content for a given field. I
don't mean something like:

-field:[* TO *]

because it returns the documents that do not have the given field at all. I
have documents something like:

field1:"some text",
field2:"some text",
field : ""  // this is the field for which I want to find which documents
have it empty.

Thanks;
Furkan KAMACI


Re: tf and very short text fields

2014-04-04 Thread Tom Burton-West
Thanks Markus,

I was thinking about normalization and was absolutely wrong about setting
k1 to zero.  I should have taken a look at the algorithm and walked
through setting k1=0.  (This is easier to do looking at the formula in
Wikipedia http://en.wikipedia.org/wiki/Okapi_BM25 than walking through the
code.)
When you set k1 to 0 it does just what you said, i.e. it provides binary tf.
That part of the formula returns 1 if the term is present and 0 if not,
which is, I think, what Wunder was trying to accomplish.
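For reference, the tf/length part of that formula is, per term, roughly:

  tfNorm(f) = f * (k1 + 1) / (f + k1 * (1 - b + b * |d| / avgdl))

so with k1 = 0 it reduces to f / f = 1 for any term with f > 0, independent of b,
|d| and avgdl, which is exactly the flat/binary tf behavior and matches the
tfNorm = 1.0 value in the debug output quoted below.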

Sorry about jumping in without double checking things first.

Tom


On Fri, Apr 4, 2014 at 7:38 AM, Markus Jelsma markus.jel...@openindex.iowrote:

 Hi - In this case Walter, iirc, was looking for two things: no length
 normalization and a flat TF (1f for tf(float freq) > 0). We know that k1
 controls TF saturation but in BM25Similarity you can see that k1 is
 multiplied by the encoded norm value, taking b also into account. So
 setting k1 to zero effectively disables length normalization and results in
 flat or binary TF.

 Here's an example output of k1 = 0 and k1 = 0.2. Norms are enabled on the
 field, term occurs three times in the field:

 28.203003 = score(doc=0,freq=1.5 = phraseFreq=1.5
 ), product of:
   6.4 = boost
   4.406719 = idf(docFreq=1, docCount=122)
   1.0 = tfNorm, computed from:
 1.5 = phraseFreq=1.5
 0.0 = parameter k1
 0.75 = parameter b
 8.721312 = avgFieldLength
 16.0 = fieldLength




 27.813797 = score(doc=0,freq=1.5 = phraseFreq=1.5
 ), product of:
   6.4 = boost
   4.406719 = idf(docFreq=1, docCount=122)
   0.98619986 = tfNorm, computed from:
 1.5 = phraseFreq=1.5
 0.2 = parameter k1
 0.75 = parameter b
 8.721312 = avgFieldLength
 16.0 = fieldLength


 You can clearly see the final TF norm being 1, despite the term frequency
 and length. Please correct my wrongs :)
 Markus



 -Original message-
  From:Tom Burton-West tburt...@umich.edu
  Sent: Thursday 3rd April 2014 20:18
  To: solr-user@lucene.apache.org
  Subject: Re: tf and very short text fields
 
  Hi Markus and Wunder,
 
  I'm  missing the original context, but I don't think BM25 will solve this
  particular problem.
 
  The k1 parameter sets how quickly the contribution of tf to the score
 falls
  off with increasing tf.   It would be helpful for making sure really long
  documents don't get too high a score, but I don't think it would help for
  very short documents without messing up its original design purpose.
 
  For BM25, if you want to turn off length normalization, you set b to 0.
   However, I don't think that will do what you want, since turning off
  normalization will mean that the score for new york, new york  will be
  twice that of the score for new york since without normalization the tf
  in new york new york is twice that of new york.
 
  I think the earlier suggestion to override tfidfsimilarity and emit 1f
 in
  tf() is probably the best way to switch to eliminate using tf counts,
  assumming that is really what you want.
 
  Tom
 
 
 
 
 
 
 
 
  On Tue, Apr 1, 2014 at 4:17 PM, Walter Underwood wun...@wunderwood.org
 wrote:
 
   Thanks! We'll try that out and report back. I keep forgetting that I
 want
   to try BM25, so this is a good excuse.
  
   wunder
  
   On Apr 1, 2014, at 12:30 PM, Markus Jelsma markus.jel...@openindex.io
 
   wrote:
  
Also, if i remember correctly, k1 set to zero for bm25 automatically
   omits norms in the calculation. So thats easy to play with without
   reindexing.
   
   
Markus Jelsma markus.jel...@openindex.io schreef:Yes, override
   tfidfsimilarity and emit 1f in tf(). You can also use bm25 with k1 set
 to
   zero in your schema.
   
   
Walter Underwood wun...@wunderwood.org schreef:And here is another
   peculiarity of short text fields.
   
The movie New York, New York should not be twice as relevant for
 the
   query new york. Is there a way to use a binary term frequency rather
 than
   a count?
   
wunder
--
Walter Underwood
wun...@wunderwood.org
   
   
   
  
   --
   Walter Underwood
   wun...@wunderwood.org
  
  
  
  
 



Re: Solr Search For Documents That Has Empty Content For a Given Particular Field

2014-04-04 Thread Ahmet Arslan
Hi Furkan,

q=field:""&fl=field works for me (4.7.0). 

Ahmet


On Friday, April 4, 2014 5:50 PM, Furkan KAMACI furkankam...@gmail.com wrote:
Hi;

How can I find the documents that has empty content for a given field. I
don't mean something like:

-field:[* TO *]

because it returns the documents that has not given particular field. I
have documents something like:

field1:some text,
field2:some text,
field :  // this is the field that I want to learn which document has
it.

Thanks;
Furkan KAMACI



Re: Solr Search For Documents That Has Empty Content For a Given Particular Field

2014-04-04 Thread Furkan KAMACI
Hi;

I tried it before but it does not work.


2014-04-04 18:08 GMT+03:00 Ahmet Arslan iori...@yahoo.com:

 Hi Furkan,

 q=fiel:fl=field works for me (4.7.0).

 Ahmet


 On Friday, April 4, 2014 5:50 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
 Hi;

 How can I find the documents that has empty content for a given field. I
 don't mean something like:

 -field:[* TO *]

 because it returns the documents that has not given particular field. I
 have documents something like:

 field1:some text,
 field2:some text,
 field :  // this is the field that I want to learn which document has
 it.

 Thanks;
 Furkan KAMACI




Re: Solr Search For Documents That Has Empty Content For a Given Particular Field

2014-04-04 Thread Ahmet Arslan
Hi,

Weird, for type=string it works for me. What is the field type you are using? 


On Friday, April 4, 2014 6:25 PM, Furkan KAMACI furkankam...@gmail.com wrote:

Hi;
II tried it before but does not work



2014-04-04 18:08 GMT+03:00 Ahmet Arslan iori...@yahoo.com:

Hi Furkan,

q=fiel:fl=field works for me (4.7.0). 

Ahmet



On Friday, April 4, 2014 5:50 PM, Furkan KAMACI furkankam...@gmail.com wrote:
Hi;

How can I find the documents that has empty content for a given field. I
don't mean something like:

-field:[* TO *]

because it returns the documents that has not given particular field. I
have documents something like:

field1:some text,
field2:some text,
field :  // this is the field that I want to learn which document has
it.

Thanks;
Furkan KAMACI




Strange behavior of edismax and mm=0 with long queries (bug?)

2014-04-04 Thread Nils Kaiser
Hey,

I am currently using solr to recognize songs and people from a list of user
comments. My index stores the titles of the songs. At the moment my
application builds word ngrams and fires a search with that query, which
works well but is quite inefficient.

So my thought was to simply use the collated comments as query. So it is a
case where the query is much longer. I need to use mm=0 or mm=1.

My plan was to use edismax as the pf2 and pf3 parameters should work well
for my usecase.

However when using longer queries, I get a strange behavior which can be
seen in debugQuery.

Here is an example:

Collated Comments (used as query)

I love Henry so much. It is hard to tear your eyes away from Maria, but
watch just his feet. You'll be amazed.
sometimes pure skill can will a comp, sometimes pure joy can win... put
them both together and there is no competition
This video clip makes me smile.
Pure joy!
so good!
Who's the person that gave this a thumbs down?!? This is one of the best
routines I've ever seen. Period. And it's a competitionl! How is that
possible? They're so good it boggles my mind.
It's gorgeous. Flawless victory.
Great number! Does anybody know the name of the piece?
I believe it's called Sunny side of the street
Maria is like, the best 'follow' I've ever seen. She's so amazing.
Thanks so much Johnathan!

Song name in Index
Louis Armstrong - Sunny Side of The Street

parsedquery_toString:
+(((text:I) (text:love) (text:Henry) (text:so) (text:much.) (text:It)
(text:is) (text:hard) (text:to) (text:tear) (text:your) (text:eyes)
(text:away) (text:from) (text:Maria,) (text:but) (text:watch) (text:just)
(text:his) (text:feet.) (text:You'll) (text:be) (text:amazed.)
(text:sometimes) (text:pure) (text:skill) (text:can) (text:will) (text:a)
(text:comp,) (text:sometimes) (text:pure) (text:joy) (text:can)
(text:win...) (text:put) (text:them) (text:both) +(text:together)
+(text:there) (text:is) (text:no) (text:competition) (text:This)
(text:video) (text:clip) (text:makes) (text:me) (text:smile.) (text:Pure)
(text:joy!) (text:so) (text:good!) (text:Who's) (text:the) (text:person)
(text:that) (text:gave) (text:this) (text:a) (text:thumbs) (text:down?!?)
(text:This) (text:is) (text:one) (text:of) (text:the) (text:best)
(text:routines) (text:I've) (text:ever) (text:seen.) +(text:Period.)
+(text:it's) (text:a) (text:competitionl!) (text:How) (text:is) (text:that)
(text:possible?) (text:They're) (text:so) (text:good) (text:it)
(text:boggles) (text:my) (text:mind.) (text:It's) (text:gorgeous.)
(text:Flawless) (text:victory.) (text:Great) (text:number!) (text:Does)
(text:anybody) (text:know) (text:the) (text:name) (text:of) (text:the)
(text:piece?) (text:I) (text:believe) (text:it's) (text:called)
(text:Sunny) (text:side) (text:of) (text:the) (text:street) (text:Maria)
(text:is) (text:like,) (text:the) (text:best) (text:'follow') (text:I've)
(text:ever) (text:seen.) (text:She's) (text:so) (text:amazing.)
(text:Thanks) (text:so) (text:much) (text:Johnathan!))~1)/str

This query generates 0 results. The reason is it expects the terms "together",
"there", "Period." and "it's" to be part of the document (see parsedquery above;
all other terms are optional, those terms are a must).

Is there any reason for this behavior? If I use shorter queries it works
flawlessly and returns the document.

I've appended the whole query.

Best,

Nils
?xml version=1.0 encoding=UTF-8?
response

lst name=responseHeader
  int name=status0/int
  int name=QTime11/int
/lst
result name=response numFound=0 start=0
/result
lst name=debug
  str name=rawquerystringI love Henry so much. It is hard to tear your eyes away from Maria, but watch just his feet. You'll be amazed.
sometimes pure skill can will a comp, sometimes pure joy can win... put them both together and there is no competition
This video clip makes me smile.
Pure joy!
so good!
Who's the person that gave this a thumbs down?!? This is one of the best routines I've ever seen. Period. And it's a competitionl! How is that possible? They're so good it boggles my mind.
It's gorgeous. Flawless victory.
Great number! Does anybody know the name of the piece?
I believe it's called Sunny side of the street
Maria is like, the best 'follow' I've ever seen. She's so amazing.
Thanks so much Johnathan!
/str
  str name=querystringI love Henry so much. It is hard to tear your eyes away from Maria, but watch just his feet. You'll be amazed.
sometimes pure skill can will a comp, sometimes pure joy can win... put them both together and there is no competition
This video clip makes me smile.
Pure joy!
so good!
Who's the person that gave this a thumbs down?!? This is one of the best routines I've ever seen. Period. And it's a competitionl! How is that possible? They're so good it boggles my mind.
It's gorgeous. Flawless victory.
Great number! Does anybody know the name of the piece?
I believe it's called Sunny side of the street
Maria is like, the best 'follow' I've ever seen. She's so amazing.
Thanks so much Johnathan!

Re: Solr Search For Documents That Has Empty Content For a Given Particular Field

2014-04-04 Thread Chris Hostetter

: field :  // this is the field that I want to learn which document has
: it.

How you (can) query for a field value like that is going to depend 
entirely on the FieldType/Analyzer ... if it's a string field, or uses 
KeywordTokenizer, then q=field:"" should find it -- if you use a more 
traditional analyzer then it probably didn't produce any terms for the 
input "", and from Solr's perspective a document that was indexed using 
an empty string value is exactly the same as a document that had no value 
when indexed.
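For completeness, a small SolrJ sketch of the string-field case (the URL and the
field name "category" are made up, untested):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class EmptyStringFieldQuery {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    // Only finds docs that were indexed with an explicit empty-string value,
    // and only when the field is a string / KeywordTokenizer type.
    SolrQuery q = new SolrQuery("category:\"\"");
    q.setFields("id", "category");
    System.out.println(server.query(q).getResults().getNumFound());
  }
}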

In essence, your question is equivalent to asking "How can I search for 
doc1, but not doc2, even though I'm using LowerCaseAnalyzer which produces 
exactly the same indexed terms for both..."

   doc1: "Quick Fox"
   doc2: "quick fox"



-Hoss
http://www.lucidworks.com/


Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Sathya
Hi shawn,

I have indexed 50 million data in 5 servers. 3 servers have 8gb ram. One
have 24gb and another one have 64gb ram. I was allocate 4 gb ram to solr in
each machine. I am using solrcloud. My total index size is 50gb including 5
servers. Each server have 3 zookeepers. Still I didnt check about OS disk
cache and heap memory. I will check and let u know shawn. If anything, pls
let me know.

Thank u shawn.

On Friday, April 4, 2014, Shawn Heisey-4 [via Lucene] 
ml-node+s472066n4129150...@n3.nabble.com wrote:
 On 4/4/2014 1:31 AM, Sathya wrote:
 Hi All,

 Hi All, I am new to Solr. And i dont know how to increase the search
speed
 of solrcloud. I have indexed nearly 4 GB of data. When i am searching a
 document using java with solrj, solr takes more 6 seconds to return a
query
 result. Any one please help me to reduce the search query time to less
than
 500 ms. i have allocate the 4 GB ram for solr. Please let me know for
 further details about solrcloud config.

 How much total RAM do you have on the system, and how much total index
 data is on that system (adding up all the Solr cores)?  You've already
 said that you have allocated 4GB of RAM for Solr.

 Later you said you had 50 million documents, and then you showed us a
 URL that looks like SolrCloud.

 I suspect that you don't have enough RAM left over to cache your index
 effectively -- the OS Disk Cache is too small.

 http://wiki.apache.org/solr/SolrPerformanceProblems

 Another possible problem, also discussed on that page, is that your Java
 heap is too small.

 Thanks,
 Shawn



 





AUTO: Saravanan Chinnadurai is out of the office (returning 08/04/2014)

2014-04-04 Thread Saravanan . Chinnadurai
I am out of the office until 08/04/2014.

 Please email itsta...@actionimages.com for any urgent queries.


Note: This is an automated response to your message  Cannot run program
svnversion when building lucene 4.7.1 sent on 4/4/2014 13:38:22.

This is the only notification you will receive while this person is away.




Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Anshum Gupta
I am not sure if you set up your SolrCloud right. Can you also provide me
with the version of Solr that you're running?
Also, could you tell me how you set up your SolrCloud cluster?
Are the times consistent? Is this the only collection on the cluster?

Also, if I am getting it right, you have 15 ZKs running. Correct me if I'm
wrong, but if I'm not, you don't need that kind of a zk setup.


On Fri, Apr 4, 2014 at 9:39 AM, Sathya sathia.blacks...@gmail.com wrote:

 Hi shawn,

 I have indexed 50 million data in 5 servers. 3 servers have 8gb ram. One
 have 24gb and another one have 64gb ram. I was allocate 4 gb ram to solr in
 each machine. I am using solrcloud. My total index size is 50gb including 5
 servers. Each server have 3 zookeepers. Still I didnt check about OS disk
 cache and heap memory. I will check and let u know shawn. If anything, pls
 let me know.

 Thank u shawn.

 On Friday, April 4, 2014, Shawn Heisey-4 [via Lucene] 
 ml-node+s472066n4129150...@n3.nabble.com wrote:
  On 4/4/2014 1:31 AM, Sathya wrote:
  Hi All,
 
  Hi All, I am new to Solr. And i dont know how to increase the search
 speed
  of solrcloud. I have indexed nearly 4 GB of data. When i am searching a
  document using java with solrj, solr takes more 6 seconds to return a
 query
  result. Any one please help me to reduce the search query time to less
 than
  500 ms. i have allocate the 4 GB ram for solr. Please let me know for
  further details about solrcloud config.
 
  How much total RAM do you have on the system, and how much total index
  data is on that system (adding up all the Solr cores)?  You've already
  said that you have allocated 4GB of RAM for Solr.
 
  Later you said you had 50 million documents, and then you showed us a
  URL that looks like SolrCloud.
 
  I suspect that you don't have enough RAM left over to cache your index
  effectively -- the OS Disk Cache is too small.
 
  http://wiki.apache.org/solr/SolrPerformanceProblems
 
  Another possible problem, also discussed on that page, is that your Java
  heap is too small.
 
  Thanks,
  Shawn
 
 
 
  








-- 

Anshum Gupta
http://www.anshumgupta.net


SOLR Jetty Server on Windows 2003

2014-04-04 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi, I am trying to install Solr on Windows 2003 with the Jetty server. From the
browser everything works, but when I try to access it from JavaScript code on
another machine I am not getting a response. I am using XMLHttpRequest to
get the response from the server using JavaScript.

Any Help...?


--Ravi


RE: SOLR Jetty Server on Windows 2003

2014-04-04 Thread Doug Turnbull
Are the requests cross domain? Is your browser giving errors about
cross domain scripting restrictions in the browser? If you're doing
cross domain browser stuff, Solr gives you the ability to do requests
over JSONP which is a sneaky hack that gets around these issues. Check
out my blog post for an example that uses angular:

http://www.opensourceconnections.com/2013/08/25/instant-search-with-solr-and-angular/



Sent from my Windows Phone From: EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions)
Sent: 4/4/2014 1:51 PM
To: solr-user@lucene.apache.org
Subject: SOLR Jetty Server on Windows 2003
Hi , I am trying to install solr on the Windows 2003 with Jetty
server. Form browser everything works , but when I try to acesss from
another javascript Code in other machine I am not getting reponse. I
am using Xmlhttprequest to get the response from server using
javascript.

Any Help...?


--Ravi


Re: Cannot run program svnversion when building lucene 4.7.1

2014-04-04 Thread Puneet Pawaia
Hi. Yes I installed Tortoise svn.
Regards
Puneet
On 4 Apr 2014 19:35, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 When you install subversion, svnversion executable comes with that too.
 Did you install any svn client for Windows?



 On Friday, April 4, 2014 3:38 PM, Puneet Pawaia puneet.paw...@gmail.com
 wrote:
 Hi all.

 I am trying to build lucene 4.7.1 from the sources. I can compile without
 any issues but when I try to build the dist, lucene gives me
 Cannot run program svnversion ... The system cannot find the specified
 file.

 I am compiling on Windows 7 64-bit using java version 1.7.0.45 64-bit.

 Where can I get this svnversion ?

 Thanks
 Puneet




Re: Cannot run program svnversion when building lucene 4.7.1

2014-04-04 Thread Ahmet Arslan
Hi,

I am not a Windows user, but if you installed it, svnversion should be
somewhere on disk.
Probably right next to svn. Find/locate it by file search, and add its folder
to your PATH.
Once you do that you can invoke svnversion on the command line.

For example, here are the executables on my computer under /opt/subversion/bin:
svn  svnadmin  svnlook  svnserve  svnversion
svn-tools  svndumpfilter  svnrdump  svnsync



On Friday, April 4, 2014 9:18 PM, Puneet Pawaia puneet.paw...@gmail.com wrote:
Hi. Yes I installed Tortoise svn.
Regards
Puneet
On 4 Apr 2014 19:35, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 When you install subversion, svnversion executable comes with that too.
 Did you install any svn client for Windows?



 On Friday, April 4, 2014 3:38 PM, Puneet Pawaia puneet.paw...@gmail.com
 wrote:
 Hi all.

 I am trying to build lucene 4.7.1 from the sources. I can compile without
 any issues but when I try to build the dist, lucene gives me
 Cannot run program svnversion ... The system cannot find the specified
 file.

 I am compiling on Windows 7 64-bit using java version 1.7.0.45 64-bit.

 Where can I get this svnversion ?

 Thanks
 Puneet





How to see the value of long type (solr) ?

2014-04-04 Thread Lisheng Zhang
Hi,

We use solr 3.6 to index a field of long type:

<fieldType name="long" class="solr.TrieLongField" ...

Now for debugging purposes we need to see the original value (the field is
not stored), but in Luke we cannot see it.

1/ is there a way to see original long type value (using luke or not) ?
2/ if we need to use lucene to search this field, what analyzer should we
use ?

Thanks very much for helps, Lisheng


Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Candygram For Mongo
In this case we are indexing an Oracle database.

We do not include the data-config.xml in our distribution.  We store the
database information in the database.xml file.  I have attached the
database.xml file.

When we use the default merge policy settings, we get the same results.



We have not tried to dump the table to a comma separated file.  We think
that dumping this size table to disk will introduce other memory problems
with big file management. We have not tested that case.


On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Which database are you using? Can you send us data-config.xml?

 What happens when you use default merge policy settings?

 What happens when you dump your table to Comma Separated File and fed that
 file to solr?

 Ahmet

 On Friday, April 4, 2014 5:10 PM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:

 The ramBufferSizeMB was set to 6MB only on the test system to make the
 system crash sooner.  In production that tag is commented out which
 I believe forces the default value to be used.




 On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,
 
 out of curiosity, why did you set ramBufferSizeMB to 6?
 
 Ahmet
 
 
 
 
 
 On Friday, April 4, 2014 3:27 AM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:
 *Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception
 
 *SOLR/Lucene version: *4.2.1*
 
 
 *JVM version:
 
 Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
 
 Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
 
 
 
 *Indexer startup command:
 
 set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
 
 
 
 java  %JVMARGS% ^
 
 -Dcom.sun.management.jmxremote.port=1092 ^
 
 -Dcom.sun.management.jmxremote.ssl=false ^
 
 -Dcom.sun.management.jmxremote.authenticate=false ^
 
 -jar start.jar
 
 
 
 *SOLR indexing HTTP parameters request:
 
 webapp=/solr path=/dataimport
 params={clean=false&command=full-import&wt=javabin&version=2}
 
 
 
 We are getting a Java heap OOM exception when indexing (updating) 27
 million records.  If we increase the Java heap memory settings the problem
 goes away but we believe the problem has not been fixed and that we will
 eventually get the same OOM exception.  We have other processes on the
 server that also require resources so we cannot continually increase the
 memory settings to resolve the OOM issue.  We are trying to find a way to
 configure the SOLR instance to reduce or preferably eliminate the
 possibility of an OOM exception.
 
 
 
 We can reproduce the problem on a test machine.  We set the Java heap
 memory size to 64MB to accelerate the exception.  If we increase this
 setting the same problems occurs, just hours later.  In the test
 environment, we are using the following parameters:
 
 
 
 JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m
 
 
 
 Normally we use the default solrconfig.xml file with only the following
 jar
 file references added:
 
 
 
 <lib path="../../../../default/lib/common.jar" />
 
 <lib path="../../../../default/lib/webapp.jar" />
 
 <lib path="../../../../default/lib/commons-pool-1.4.jar" />
 
 
 
 Using these values and trying to index 6 million records from the
 database,
 the Java Heap Out of Memory exception is thrown very quickly.
 
 
 
 We were able to complete a successful indexing by further modifying the
 solrconfig.xml and removing all or all but one copyfield tags from the
 schema.xml file.
 
 
 
 The following solrconfig.xml values were modified:
 
 
 
 <ramBufferSizeMB>6</ramBufferSizeMB>
 
 
 
 <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
 
 <int name="maxMergeAtOnce">2</int>
 
 <int name="maxMergeAtOnceExplicit">2</int>
 
 <int name="segmentsPerTier">10</int>
 
 <int name="maxMergedSegmentMB">150</int>
 
 </mergePolicy>
 
 
 
 <autoCommit>
 
 <maxDocs>15000</maxDocs>  <!-- This tag was maxTime, before this -->
 
 <openSearcher>false</openSearcher>
 
 </autoCommit>
 
 
 
 Using our customized schema.xml file with two or more copyfield tags,
 the
 OOM exception is always thrown.  Based on the errors, the problem occurs
 when the process was trying to do the merge.  The error is provided below:
 
 
 
 Exception in thread Lucene Merge Thread #156
 org.apache.lucene.index.MergePolicy$MergeException:
 java.lang.OutOfMemoryError: Java heap space
 
 at

 org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
 
 at

 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
 
 Caused by: java.lang.OutOfMemoryError: Java heap space
 
 at

 org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180)
 
 at

 org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146)
 
 at

 org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
 
  

[ JOB ] - Search Specialist, Bloomberg LP [ NY and London ]

2014-04-04 Thread Anirudha Jadhav
http://jobs.bloomberg.com/job/New-York-Search-Technology-Specialist-Job-NY/45497500/

http://jobs.bloomberg.com/job/London-RD-News-Search-Backend-Developer-Job/50463600/

keeping it short here , feel free to talk to me with more questions

-- 
Anirudha P. Jadhav


Re: Does sorting skip everything having to do with relevancy?

2014-04-04 Thread Alvaro Cabrerizo
Hi,

If you don't want to waste your CPU time, then comment out the boost parameter
in the query parser defined in your solrconfig.xml. If you can't do that,
then you can override it by sending the boost parameter, for example using the
constant function (e.g. http://...&boost=1&sort=your_sort). The
boost parameter will be overridden if it is not defined as an invariant.

Regards.


On Fri, Apr 4, 2014 at 4:12 PM, Shawn Heisey s...@elyograg.org wrote:

 On 4/4/2014 12:48 AM, Alvaro Cabrerizo wrote:
  By default solr is using the sort parameter over the score field. So if
  you overwrite it using other sort field, yes solr will use the parameter
  you've provided. Remember, you can use multiple fields for
  sortinghttp://wiki.apache.org/solr/CommonQueryParameters#sort so
  you can make something like: sort score desc, your_field1 asc,
 your_field2
  desc
 
  The score of documents is calculated on every query (it does not depend
 on
  the sort parameter or the debugQueryParameter) and the debubQuery is
 only a
  mechanism for showing (or hidding) how score was calculated. If you want
 to
  see a document score for a particular query (apart from the debugQuery)
 you
  can ask for it in the solr response adding the parameter *fl=*,score* to
  your request.

 These are things that I already know.

 What I want to know is whether Solr has code in place that will avoid
 wasting CPU cycles calculating the score that will never be displayed or
 used, *especially* the complex boost parameter that's in the request
 handler definition (solrconfig.xml).

 str

 name=boostmin(recip(abs(ms(NOW/HOUR,registered_date)),1.92901e-10,1.5,1.5),0.85)/str

 Do I need to send 'boost=' as a parameter (along with my sort) to get it
 to avoid that calculation?

 Thanks,
 Shawn




Re: Does sorting skip everything having to do with relevancy?

2014-04-04 Thread Shawn Heisey

On 4/4/2014 1:48 PM, Alvaro Cabrerizo wrote:

If you dont want to waste your cpu time, then comment the boost parameter
in the query parser defined in your solrconfig.xml. If you cant do that,
then you can overwrite it sending the boost parameter for example using the
constant function  (e.g.  http:///...boost=1sort=your_sort). The
parameter boost will be overwritten if it is not defined as an invariant.


Thank you for responding.  I know how I can override the behavior, what 
I want to find out is whether or not it's necessary to do so -- if it's 
not necessary because Solr skips it, then everything is good.  If it is 
necessary, I can open an issue in Jira asking for Solr to get smarter.  
That way everyone benefits and they don't have to do anything except 
upgrade Solr.


Thanks,
Shawn



Distributed tracing for Solr via adding HTTP headers?

2014-04-04 Thread Gregg Donovan
We have some metadata -- e.g. a request UUID -- that we log to every log
line using Log4J's MDC [1]. The UUID logging allows us to connect any log
lines we have for a given request across servers. Sort of like Zipkin [2].

Currently we're using EmbeddedSolrServer without sharding, so adding the
UUID is fairly simple, since everything is in one process and one thread.
But, we're testing a sharded HTTP implementation and running into some
difficulties getting this data passed around in a way that lets us trace
all log lines generated by a request to its UUID.

The first thing I tried was to add the UUID by adding it to the SolrParams.
This achieves the goal of getting those values logged on the shards if a
request is successful, but we miss having those values in the MDC if there
are other log lines before the final log line. E.g. an Exception in a
custom component.

My current thought is that sending HTTP headers with diagnostic information
would be very useful. Those could be placed in the MDC even before handing
off the work to SolrDispatchFilter, so that any Solr problem will have the
proper logging.

I.e. every additional header added to a Solr request gets a Solr- prefix.
On the server, we look for those headers and add them to the SLF4J MDC[3].
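As a rough illustration of that idea (class name and wiring are made up, not
taken from the actual patch):

import java.io.IOException;
import java.util.Enumeration;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import org.slf4j.MDC;

// Copies every "Solr-" prefixed request header into the SLF4J MDC before the
// request reaches SolrDispatchFilter, and clears the MDC afterwards.
public class HeaderToMdcFilter implements Filter {
  @Override public void init(FilterConfig cfg) {}
  @Override public void destroy() {}

  @Override
  public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest http = (HttpServletRequest) req;
    Enumeration<String> names = http.getHeaderNames();
    while (names.hasMoreElements()) {
      String name = names.nextElement();
      if (name.regionMatches(true, 0, "Solr-", 0, 5)) {
        MDC.put(name.substring(5), http.getHeader(name));
      }
    }
    try {
      chain.doFilter(req, resp);
    } finally {
      MDC.clear(); // don't leak request-scoped values to the next request on this thread
    }
  }
}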

Here's a patch [4] that does this that we're testing out. Is this a good
idea? Would anyone else find this useful? If so, I'll open a ticket.

--Gregg

[1] http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/MDC.html
[2] http://twitter.github.io/zipkin/
[3] http://www.slf4j.org/api/org/slf4j/MDC.html
[4] https://gist.github.com/greggdonovan/9982327


Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Candygram For Mongo
In case the attached database.xml file didn't show up, I have pasted in the
contents below:

<dataConfig>
<dataSource
name="org_only"
type="JdbcDataSource"
driver="oracle.jdbc.OracleDriver"
url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
user="admin"
password="admin"
readOnly="false"
batchSize="100"
/>
<document>


<entity name="full-index" query="
select

NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null')
as SOLR_ID,

'ORCL.ADDRESS_ACCT_ALL'
as SOLR_CATEGORY,

NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as
ADDRESSALLROWID,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as
ADDRESSALLADDRTYPECD,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as
ADDRESSALLLONGITUDE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as
ADDRESSALLLATITUDE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as
ADDRESSALLADDRNAME,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as
ADDRESSALLCITY,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as
ADDRESSALLSTATE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as
ADDRESSALLEMAILADDR

from ORCL.ADDRESS_ACCT_ALL
">

<field column="SOLR_ID" name="id" />
<field column="SOLR_CATEGORY" name="category" />
<field column="ADDRESSALLROWID" name="ADDRESS_ACCT_ALL.RECORD_ID_abc" />
<field column="ADDRESSALLADDRTYPECD" name="ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc" />
<field column="ADDRESSALLLONGITUDE" name="ADDRESS_ACCT_ALL.LONGITUDE_abc" />
<field column="ADDRESSALLLATITUDE" name="ADDRESS_ACCT_ALL.LATITUDE_abc" />
<field column="ADDRESSALLADDRNAME" name="ADDRESS_ACCT_ALL.ADDR_NAME_abc" />
<field column="ADDRESSALLCITY" name="ADDRESS_ACCT_ALL.CITY_abc" />
<field column="ADDRESSALLSTATE" name="ADDRESS_ACCT_ALL.STATE_abc" />
<field column="ADDRESSALLEMAILADDR" name="ADDRESS_ACCT_ALL.EMAIL_ADDR_abc" />

</entity>



<!-- Varaibles -->
<!-- '${dataimporter.last_index_time}' -->
</document>
</dataConfig>





On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo 
candygram.for.mo...@gmail.com wrote:

 In this case we are indexing an Oracle database.

 We do not include the data-config.xml in our distribution.  We store the
 database information in the database.xml file.  I have attached the
 database.xml file.

 When we use the default merge policy settings, we get the same results.



 We have not tried to dump the table to a comma separated file.  We think
 that dumping this size table to disk will introduce other memory problems
 with big file management. We have not tested that case.


 On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Which database are you using? Can you send us data-config.xml?

 What happens when you use default merge policy settings?

 What happens when you dump your table to Comma Separated File and fed
 that file to solr?

 Ahmet

 On Friday, April 4, 2014 5:10 PM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:

 The ramBufferSizeMB was set to 6MB only on the test system to make the
 system crash sooner.  In production that tag is commented out which
 I believe forces the default value to be used.




 On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,
 
 out of curiosity, why did you set ramBufferSizeMB to 6?
 
 Ahmet
 
 
 
 
 
 On Friday, April 4, 2014 3:27 AM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:
 *Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception
 
 *SOLR/Lucene version: *4.2.1*
 
 
 *JVM version:
 
 Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
 
 Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
 
 
 
 *Indexer startup command:
 
 set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
 
 
 
 java  %JVMARGS% ^
 
 -Dcom.sun.management.jmxremote.port=1092 ^
 
 -Dcom.sun.management.jmxremote.ssl=false ^
 
 -Dcom.sun.management.jmxremote.authenticate=false ^
 
 -jar start.jar
 
 
 
 *SOLR indexing HTTP parameters request:
 
 webapp=/solr path=/dataimport
 params={clean=falsecommand=full-importwt=javabinversion=2}
 
 
 
 We are getting a Java heap OOM exception when indexing (updating) 27
 million records.  If we increase the Java heap memory settings the
 problem
 goes away but we believe the problem has not been fixed and that we will
 eventually get the same OOM exception.  We have other processes on the
 server that also require resources so we cannot continually increase the
 memory settings to resolve the OOM issue.  We are trying to find a way to
 configure the SOLR instance to reduce or preferably eliminate the
 possibility of an OOM exception.
 
 
 
 We can reproduce the problem on a test machine.  We set the Java heap
 memory size to 64MB to accelerate the exception.  If we increase this
 setting the same problems occurs, just hours later.  In the test
 environment, we are using the following parameters:
 
 
 
 JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m
 
 
 
 Normally we use the default solrconfig.xml file with only the following
 jar
 file references added:
 
 
 
 

Re: Does sorting skip everything having to do with relevancy?

2014-04-04 Thread Mikhail Khludnev
Hello Shawn,

I suppose SolrIndexSearcher.buildTopDocsCollector() doesn't create a
Collector which calls score() in this case. Hence, it shouldn't waste CPU.
Just my impression.
Have you tried checking it by supplying some weird formula which throws an
exception?


On Sat, Apr 5, 2014 at 12:02 AM, Shawn Heisey s...@elyograg.org wrote:

 On 4/4/2014 1:48 PM, Alvaro Cabrerizo wrote:

 If you dont want to waste your cpu time, then comment the boost parameter
 in the query parser defined in your solrconfig.xml. If you cant do that,
 then you can overwrite it sending the boost parameter for example using
 the
 constant function  (e.g.  http:///...boost=1sort=your_sort). The
 parameter boost will be overwritten if it is not defined as an invariant.


 Thank you for responding.  I know how I can override the behavior, what I
 want to find out is whether or not it's necessary to do so -- if it's not
 necessary because Solr skips it, then everything is good.  If it is
 necessary, I can open an issue in Jira asking for Solr to get smarter.
  That way everyone benefits and they don't have to do anything except
 upgrade Solr.

 Thanks,
 Shawn




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Solr join and lucene scoring

2014-04-04 Thread Mikhail Khludnev
On Thu, Apr 3, 2014 at 1:42 PM, m...@preselect-media.com wrote:

 Hello,

 referencing to this issue:
 https://issues.apache.org/jira/browse/SOLR-4307

 Is it still not possible with the solr query time join to use scoring?

It's still not implemented.
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java#L549


 Do I still have to write my own plugin or is there a plugin somewhere I
 could use?

 I never wrote a plugin for solr before, so I would prefer if I don't have
 to start from scratch.

The right approach from my POV is to use Lucene's join
(https://github.com/apache/lucene-solr/blob/trunk/lucene/join/src/java/org/apache/lucene/search/join/JoinUtil.java)
in a new QParser, but solving the impedance mismatch between Lucene and Solr
might be tricky.
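A very rough sketch of what such a QParser could look like (parser name,
parameter names and error handling are placeholders; a cross-core join would
need extra work to obtain a searcher for the "from" core):

import java.io.IOException;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.join.JoinUtil;
import org.apache.lucene.search.join.ScoreMode;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

// Hypothetical "scorejoin" parser: unlike {!join}, it keeps the scores of the
// "from" side by delegating to Lucene's JoinUtil.
public class ScoreJoinQParserPlugin extends QParserPlugin {
  @Override
  public void init(NamedList args) {}

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() throws SyntaxError {
        String from = localParams.get("from");  // e.g. product_list
        String to = localParams.get("to");      // e.g. id
        ScoreMode mode = ScoreMode.valueOf(localParams.get("score", "Max"));
        Query fromQuery = subQuery(qstr, null).getQuery();
        try {
          // Same-core join only in this sketch.
          return JoinUtil.createJoinQuery(from, true, to, fromQuery,
                                          req.getSearcher(), mode);
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }
    };
  }
}

It would be registered with a <queryParser> entry in solrconfig.xml and used
like {!scorejoin from=product_list to=id score=Max}your_from_query.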




 THX,
 Moritz




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Filter query with multiple raw/literal ORs

2014-04-04 Thread Mikhail Khludnev
On Fri, Apr 4, 2014 at 4:08 AM, Yonik Seeley yo...@heliosearch.com wrote:

 Try adding a space before the first term, so the
 default lucene query parser will be used:


Yonik, I'm curious: is this a feature?


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Ahmet Arslan
Hi,

Can you remove auto commit for bulk import. Commit at the very end?

Ahmet



On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo 
candygram.for.mo...@gmail.com wrote:
In case the attached database.xml file didn't show up, I have pasted in the
contents below:

dataConfig
dataSource
name=org_only
type=JdbcDataSource
driver=oracle.jdbc.OracleDriver
url=jdbc:oracle:thin:@test2.abc.com:1521:ORCL
user=admin
password=admin
readOnly=false
batchSize=100
/
document


entity name=full-index query=
select

NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null')
as SOLR_ID,

'ORCL.ADDRESS_ACCT_ALL'
as SOLR_CATEGORY,

NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as
ADDRESSALLROWID,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as
ADDRESSALLADDRTYPECD,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as
ADDRESSALLLONGITUDE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as
ADDRESSALLLATITUDE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as
ADDRESSALLADDRNAME,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as
ADDRESSALLCITY,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as
ADDRESSALLSTATE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as
ADDRESSALLEMAILADDR

from ORCL.ADDRESS_ACCT_ALL
 

field column=SOLR_ID name=id /
field column=SOLR_CATEGORY name=category /
field column=ADDRESSALLROWID name=ADDRESS_ACCT_ALL.RECORD_ID_abc /
field column=ADDRESSALLADDRTYPECD
name=ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc /
field column=ADDRESSALLLONGITUDE name=ADDRESS_ACCT_ALL.LONGITUDE_abc /
field column=ADDRESSALLLATITUDE name=ADDRESS_ACCT_ALL.LATITUDE_abc /
field column=ADDRESSALLADDRNAME name=ADDRESS_ACCT_ALL.ADDR_NAME_abc /
field column=ADDRESSALLCITY name=ADDRESS_ACCT_ALL.CITY_abc /
field column=ADDRESSALLSTATE name=ADDRESS_ACCT_ALL.STATE_abc /
field column=ADDRESSALLEMAILADDR name=ADDRESS_ACCT_ALL.EMAIL_ADDR_abc
/

/entity



!-- Varaibles --
!-- '${dataimporter.last_index_time}' --
/document
/dataConfig






On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo 
candygram.for.mo...@gmail.com wrote:

 In this case we are indexing an Oracle database.

 We do not include the data-config.xml in our distribution.  We store the
 database information in the database.xml file.  I have attached the
 database.xml file.

 When we use the default merge policy settings, we get the same results.



 We have not tried to dump the table to a comma separated file.  We think
 that dumping this size table to disk will introduce other memory problems
 with big file management. We have not tested that case.


 On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Which database are you using? Can you send us data-config.xml?

 What happens when you use default merge policy settings?

 What happens when you dump your table to Comma Separated File and fed
 that file to solr?

 Ahmet

 On Friday, April 4, 2014 5:10 PM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:

 The ramBufferSizeMB was set to 6MB only on the test system to make the
 system crash sooner.  In production that tag is commented out which
 I believe forces the default value to be used.




 On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,
 
 out of curiosity, why did you set ramBufferSizeMB to 6?
 
 Ahmet
 
 
 
 
 
 On Friday, April 4, 2014 3:27 AM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:
 *Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception
 
 *SOLR/Lucene version: *4.2.1*
 
 
 *JVM version:
 
 Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
 
 Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
 
 
 
 *Indexer startup command:
 
 set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
 
 
 
 java  %JVMARGS% ^
 
 -Dcom.sun.management.jmxremote.port=1092 ^
 
 -Dcom.sun.management.jmxremote.ssl=false ^
 
 -Dcom.sun.management.jmxremote.authenticate=false ^
 
 -jar start.jar
 
 
 
 *SOLR indexing HTTP parameters request:
 
 webapp=/solr path=/dataimport
 params={clean=falsecommand=full-importwt=javabinversion=2}
 
 
 
 We are getting a Java heap OOM exception when indexing (updating) 27
 million records.  If we increase the Java heap memory settings the
 problem
 goes away but we believe the problem has not been fixed and that we will
 eventually get the same OOM exception.  We have other processes on the
 server that also require resources so we cannot continually increase the
 memory settings to resolve the OOM issue.  We are trying to find a way to
 configure the SOLR instance to reduce or preferably eliminate the
 possibility of an OOM exception.
 
 
 
 We can reproduce the problem on a test machine.  We set the Java heap
 memory size to 64MB to accelerate the exception.  If we increase this
 setting the same problems occurs, just hours later.  In the test
 environment, we are using the following 

Re: Filter query with multiple raw/literal ORs

2014-04-04 Thread Yonik Seeley
On Fri, Apr 4, 2014 at 5:28 PM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
 On Fri, Apr 4, 2014 at 4:08 AM, Yonik Seeley yo...@heliosearch.com wrote:

 Try adding a space before the first term, so the
 default lucene query parser will be used:


 Yonik, I'm curious, whether it a feature?

Yep, it was completely on purpose that I required local parameters to
be left-justified.  It left an easy way to escape the normal local
params processing when looking for the query type.

For example, if you want to ensure that your custom parser is used,
and you have defType=myCustomQParser
then all you have to do is add a space before the query parameter
(which shouldn't mess up any sort of natural language query parser).
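A tiny SolrJ illustration (the parser name is made up):

import org.apache.solr.client.solrj.SolrQuery;

public class LeadingSpaceExample {
  public static void main(String[] args) {
    SolrQuery q = new SolrQuery();
    q.set("defType", "myCustomQParser");  // hypothetical custom parser
    // The leading space keeps the {!...}-looking text from being treated as
    // local params, so the defType parser receives the whole string.
    q.setQuery(" {!this is just text} quick brown fox");
    System.out.println(q);
  }
}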

-Yonik
http://heliosearch.org - solve Solr GC pauses with off-heap filters
and fieldcache


Re: Does sorting skip everything having to do with relevancy?

2014-04-04 Thread Shawn Heisey

On 4/4/2014 3:13 PM, Mikhail Khludnev wrote:

I suppose SolrIndexSearcher.buildTopDocsCollector() doesn't create a
Collector which calls score() in this case. Hence, it shouldn't waste CPU.
Just my impression.
Haven't you tried to check it supplying some weird formula, which throws
exception?


I didn't think of that.  That's a good idea -- as long as there's not 
independent code that checks the function in addition to the code that 
actually runs it.


With the following parameters added to an edismax query that otherwise 
works, I get an exception.  It works if I change the e to 5.


sort=registered_date asc&boost=sum(5,e)

I will take Alvaro's suggestion and add boost=1 to queries that use a 
sort parameter.  It's probably a good idea to file that Jira.


Thanks,
Shawn



Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Candygram For Mongo
We would be happy to try that.  That sounds counter intuitive for the high
volume of records we have.  Can you help me understand how that might solve
our problem?



On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Can you remove auto commit for bulk import. Commit at the very end?

 Ahmet



 On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:
 In case the attached database.xml file didn't show up, I have pasted in the
 contents below:

 dataConfig
 dataSource
 name=org_only
 type=JdbcDataSource
 driver=oracle.jdbc.OracleDriver
 url=jdbc:oracle:thin:@test2.abc.com:1521:ORCL
 user=admin
 password=admin
 readOnly=false
 batchSize=100
 /
 document


 entity name=full-index query=
 select

 NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null')
 as SOLR_ID,

 'ORCL.ADDRESS_ACCT_ALL'
 as SOLR_CATEGORY,

 NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as
 ADDRESSALLROWID,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as
 ADDRESSALLADDRTYPECD,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as
 ADDRESSALLLONGITUDE,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as
 ADDRESSALLLATITUDE,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as
 ADDRESSALLADDRNAME,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as
 ADDRESSALLCITY,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as
 ADDRESSALLSTATE,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as
 ADDRESSALLEMAILADDR

 from ORCL.ADDRESS_ACCT_ALL
  

 field column=SOLR_ID name=id /
 field column=SOLR_CATEGORY name=category /
 field column=ADDRESSALLROWID name=ADDRESS_ACCT_ALL.RECORD_ID_abc /
 field column=ADDRESSALLADDRTYPECD
 name=ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc /
 field column=ADDRESSALLLONGITUDE name=ADDRESS_ACCT_ALL.LONGITUDE_abc
 /
 field column=ADDRESSALLLATITUDE name=ADDRESS_ACCT_ALL.LATITUDE_abc /
 field column=ADDRESSALLADDRNAME name=ADDRESS_ACCT_ALL.ADDR_NAME_abc /
 field column=ADDRESSALLCITY name=ADDRESS_ACCT_ALL.CITY_abc /
 field column=ADDRESSALLSTATE name=ADDRESS_ACCT_ALL.STATE_abc /
 field column=ADDRESSALLEMAILADDR name=ADDRESS_ACCT_ALL.EMAIL_ADDR_abc
 /

 /entity



 !-- Varaibles --
 !-- '${dataimporter.last_index_time}' --
 /document
 /dataConfig






 On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:

  In this case we are indexing an Oracle database.
 
  We do not include the data-config.xml in our distribution.  We store the
  database information in the database.xml file.  I have attached the
  database.xml file.
 
  When we use the default merge policy settings, we get the same results.
 
 
 
  We have not tried to dump the table to a comma separated file.  We think
  that dumping this size table to disk will introduce other memory problems
  with big file management. We have not tested that case.
 
 
  On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com wrote:
 
  Hi,
 
  Which database are you using? Can you send us data-config.xml?
 
  What happens when you use default merge policy settings?
 
  What happens when you dump your table to Comma Separated File and fed
  that file to solr?
 
  Ahmet
 
  On Friday, April 4, 2014 5:10 PM, Candygram For Mongo 
  candygram.for.mo...@gmail.com wrote:
 
  The ramBufferSizeMB was set to 6MB only on the test system to make the
  system crash sooner.  In production that tag is commented out which
  I believe forces the default value to be used.
 
 
 
 
  On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote:
 
  Hi,
  
  out of curiosity, why did you set ramBufferSizeMB to 6?
  
  Ahmet
  
  
  
  
  
  On Friday, April 4, 2014 3:27 AM, Candygram For Mongo 
  candygram.for.mo...@gmail.com wrote:
  *Main issue: Full Indexing is Causing a Java Heap Out of Memory
 Exception
  
  *SOLR/Lucene version: *4.2.1*
  
  
  *JVM version:
  
  Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
  
  Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
  
  
  
  *Indexer startup command:
  
  set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
  
  
  
  java  %JVMARGS% ^
  
  -Dcom.sun.management.jmxremote.port=1092 ^
  
  -Dcom.sun.management.jmxremote.ssl=false ^
  
  -Dcom.sun.management.jmxremote.authenticate=false ^
  
  -jar start.jar
  
  
  
  *SOLR indexing HTTP parameters request:
  
  webapp=/solr path=/dataimport
  params={clean=falsecommand=full-importwt=javabinversion=2}
  
  
  
  We are getting a Java heap OOM exception when indexing (updating) 27
  million records.  If we increase the Java heap memory settings the
  problem
  goes away but we believe the problem has not been fixed and that we
 will
  eventually get the same OOM exception.  We have other processes on the
  server that also require resources so we cannot continually increase
 the
  memory settings to resolve the OOM 

Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Candygram For Mongo
I might have forgotten to mention that we are using the DataImportHandler.  I
think we know how to remove auto commit.  How would we force a commit at
the end?


On Fri, Apr 4, 2014 at 3:18 PM, Candygram For Mongo 
candygram.for.mo...@gmail.com wrote:

 We would be happy to try that.  That sounds counter intuitive for the high
 volume of records we have.  Can you help me understand how that might solve
 our problem?



 On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Can you remove auto commit for bulk import. Commit at the very end?

 Ahmet



 On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:
 In case the attached database.xml file didn't show up, I have pasted in
 the
 contents below:

 dataConfig
 dataSource
 name=org_only
 type=JdbcDataSource
 driver=oracle.jdbc.OracleDriver
 url=jdbc:oracle:thin:@test2.abc.com:1521:ORCL
 user=admin
 password=admin
 readOnly=false
 batchSize=100
 /
 document


 entity name=full-index query=
 select

 NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null')
 as SOLR_ID,

 'ORCL.ADDRESS_ACCT_ALL'
 as SOLR_CATEGORY,

 NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as
 ADDRESSALLROWID,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as
 ADDRESSALLADDRTYPECD,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as
 ADDRESSALLLONGITUDE,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as
 ADDRESSALLLATITUDE,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as
 ADDRESSALLADDRNAME,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as
 ADDRESSALLCITY,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as
 ADDRESSALLSTATE,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as
 ADDRESSALLEMAILADDR

 from ORCL.ADDRESS_ACCT_ALL
  

 field column=SOLR_ID name=id /
 field column=SOLR_CATEGORY name=category /
 field column=ADDRESSALLROWID name=ADDRESS_ACCT_ALL.RECORD_ID_abc /
 field column=ADDRESSALLADDRTYPECD
 name=ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc /
 field column=ADDRESSALLLONGITUDE name=ADDRESS_ACCT_ALL.LONGITUDE_abc
 /
 field column=ADDRESSALLLATITUDE name=ADDRESS_ACCT_ALL.LATITUDE_abc /
 field column=ADDRESSALLADDRNAME name=ADDRESS_ACCT_ALL.ADDR_NAME_abc
 /
 field column=ADDRESSALLCITY name=ADDRESS_ACCT_ALL.CITY_abc /
 field column=ADDRESSALLSTATE name=ADDRESS_ACCT_ALL.STATE_abc /
 field column=ADDRESSALLEMAILADDR name=ADDRESS_ACCT_ALL.EMAIL_ADDR_abc
 /

 /entity



 !-- Varaibles --
 !-- '${dataimporter.last_index_time}' --
 /document
 /dataConfig






 On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:

  In this case we are indexing an Oracle database.
 
  We do not include the data-config.xml in our distribution.  We store the
  database information in the database.xml file.  I have attached the
  database.xml file.
 
  When we use the default merge policy settings, we get the same results.
 
 
 
  We have not tried to dump the table to a comma separated file.  We think
  that dumping this size table to disk will introduce other memory
 problems
  with big file management. We have not tested that case.
 
 
  On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com wrote:
 
  Hi,
 
  Which database are you using? Can you send us data-config.xml?
 
  What happens when you use default merge policy settings?
 
  What happens when you dump your table to Comma Separated File and fed
  that file to solr?
 
  Ahmet
 
  On Friday, April 4, 2014 5:10 PM, Candygram For Mongo 
  candygram.for.mo...@gmail.com wrote:
 
  The ramBufferSizeMB was set to 6MB only on the test system to make the
  system crash sooner.  In production that tag is commented out which
  I believe forces the default value to be used.
 
 
 
 
  On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com
 wrote:
 
  Hi,
  
  out of curiosity, why did you set ramBufferSizeMB to 6?
  
  Ahmet
  
  
  
  
  
  On Friday, April 4, 2014 3:27 AM, Candygram For Mongo 
  candygram.for.mo...@gmail.com wrote:
  *Main issue: Full Indexing is Causing a Java Heap Out of Memory
 Exception
  
  *SOLR/Lucene version: *4.2.1*
  
  
  *JVM version:
  
  Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
  
  Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
  
  
  
  *Indexer startup command:
  
  set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
  
  
  
  java  %JVMARGS% ^
  
  -Dcom.sun.management.jmxremote.port=1092 ^
  
  -Dcom.sun.management.jmxremote.ssl=false ^
  
  -Dcom.sun.management.jmxremote.authenticate=false ^
  
  -jar start.jar
  
  
  
  *SOLR indexing HTTP parameters request:
  
  webapp=/solr path=/dataimport
  params={clean=falsecommand=full-importwt=javabinversion=2}
  
  
  
  We are getting a Java heap OOM exception when indexing (updating) 27
  million records.  If we increase the Java heap memory settings the
  problem
  goes 

Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Ahmet Arslan
Hi,

This may not solve your problem, but generally it is recommended to disable auto
commit and transaction logs for bulk indexing, and to issue one commit at the very
end. Do you have tlogs enabled? I see "commit failed" in the error message; that's
why I am suggesting this.

And regarding comma separated values: with this approach you separate the data
acquisition phase and focus on just the Solr importing process. Loading even big
CSV files is very fast:  http://wiki.apache.org/solr/UpdateCSV
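For the CSV route, a rough SolrJ sketch of posting such a dump to /update/csv (the
core URL, file path and charset here are assumptions, not taken from your setup):

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class CsvLoad {
    public static void main(String[] args) throws Exception {
        // assumed core URL; adjust to your instance
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
        req.addFile(new File("/tmp/address_acct_all.csv"), "text/csv;charset=utf-8");
        req.setParam("commit", "true");   // single commit at the very end
        server.request(req);
        server.shutdown();
    }
}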
I have never experienced OOM during indexing; I suspect data acquisition has a
role in it.

Ahmet

On Saturday, April 5, 2014 1:18 AM, Candygram For Mongo 
candygram.for.mo...@gmail.com wrote:

We would be happy to try that.  That sounds counter intuitive for the high 
volume of records we have.  Can you help me understand how that might solve our 
problem?




On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

Can you remove auto commit for bulk import. Commit at the very end?

Ahmet




On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo 
candygram.for.mo...@gmail.com wrote:
In case the attached database.xml file didn't show up, I have pasted in the
contents below:

<dataConfig>
<dataSource
name="org_only"
type="JdbcDataSource"
driver="oracle.jdbc.OracleDriver"
url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
user="admin"
password="admin"
readOnly="false"
batchSize="100"
/>
<document>


<entity name="full-index" query="
select

NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null')
as SOLR_ID,

'ORCL.ADDRESS_ACCT_ALL'
as SOLR_CATEGORY,

NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as
ADDRESSALLROWID,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as
ADDRESSALLADDRTYPECD,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as
ADDRESSALLLONGITUDE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as
ADDRESSALLLATITUDE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as
ADDRESSALLADDRNAME,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as
ADDRESSALLCITY,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as
ADDRESSALLSTATE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as
ADDRESSALLEMAILADDR

from ORCL.ADDRESS_ACCT_ALL
">

<field column="SOLR_ID" name="id" />
<field column="SOLR_CATEGORY" name="category" />
<field column="ADDRESSALLROWID" name="ADDRESS_ACCT_ALL.RECORD_ID_abc" />
<field column="ADDRESSALLADDRTYPECD"
name="ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc" />
<field column="ADDRESSALLLONGITUDE" name="ADDRESS_ACCT_ALL.LONGITUDE_abc" />
<field column="ADDRESSALLLATITUDE" name="ADDRESS_ACCT_ALL.LATITUDE_abc" />
<field column="ADDRESSALLADDRNAME" name="ADDRESS_ACCT_ALL.ADDR_NAME_abc" />
<field column="ADDRESSALLCITY" name="ADDRESS_ACCT_ALL.CITY_abc" />
<field column="ADDRESSALLSTATE" name="ADDRESS_ACCT_ALL.STATE_abc" />
<field column="ADDRESSALLEMAILADDR" name="ADDRESS_ACCT_ALL.EMAIL_ADDR_abc"
/>

</entity>



<!-- Varaibles -->
<!-- '${dataimporter.last_index_time}' -->
</document>
</dataConfig>






On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo 
candygram.for.mo...@gmail.com wrote:

 In this case we are indexing an Oracle database.

 We do not include the data-config.xml in our distribution.  We store the
 database information in the database.xml file.  I have attached the
 database.xml file.

 When we use the default merge policy settings, we get the same results.



 We have not tried to dump the table to a comma separated file.  We think
 that dumping this size table to disk will introduce other memory problems
 with big file management. We have not tested that case.


 On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Which database are you using? Can you send us data-config.xml?

 What happens when you use default merge policy settings?

 What happens when you dump your table to Comma Separated File and fed
 that file to solr?

 Ahmet

 On Friday, April 4, 2014 5:10 PM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:

 The ramBufferSizeMB was set to 6MB only on the test system to make the
 system crash sooner.  In production that tag is commented out which
 I believe forces the default value to be used.




 On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,
 
 out of curiosity, why did you set ramBufferSizeMB to 6?
 
 Ahmet
 
 
 
 
 
 On Friday, April 4, 2014 3:27 AM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:
 *Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception
 
 *SOLR/Lucene version: *4.2.1*
 
 
 *JVM version:
 
 Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
 
 Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
 
 
 
 *Indexer startup command:
 
 set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
 
 
 
 java  %JVMARGS% ^
 
 -Dcom.sun.management.jmxremote.port=1092 ^
 
 -Dcom.sun.management.jmxremote.ssl=false ^
 
 -Dcom.sun.management.jmxremote.authenticate=false ^
 
 -jar start.jar
 
 
 
 *SOLR indexing HTTP 

Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Ahmet Arslan
Hi,

To disable auto commit remove both autoCommit and autoSoftCommit
parts/definitions from solrconfig.xml

To disable tlog remove

   <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
   </updateLog>

from solrconfig.xml

To commit at the end use the commit=true parameter: ?commit=true&command=full-import
There is a checkbox for this in the data import admin page.
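If you trigger the import from code instead of the admin page, the same parameters
apply. A minimal SolrJ sketch, assuming the default example core URL and the
standard /dataimport handler path:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class FullImport {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // assumed URL
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("command", "full-import");
        params.set("clean", "false");
        params.set("commit", "true");          // one commit when the import finishes
        QueryRequest req = new QueryRequest(params);
        req.setPath("/dataimport");            // the DIH handler path from solrconfig.xml
        server.request(req);
        server.shutdown();
    }
}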



On Saturday, April 5, 2014 1:27 AM, Candygram For Mongo 
candygram.for.mo...@gmail.com wrote:
I might have forgotten to mention that we are using the DataImportHandler.  I
think we know how to remove auto commit.  How would we force a commit at
the end?


On Fri, Apr 4, 2014 at 3:18 PM, Candygram For Mongo 
candygram.for.mo...@gmail.com wrote:

 We would be happy to try that.  That sounds counter intuitive for the high
 volume of records we have.  Can you help me understand how that might solve
 our problem?



 On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Can you remove auto commit for bulk import. Commit at the very end?

 Ahmet



 On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:
 In case the attached database.xml file didn't show up, I have pasted in
 the
 contents below:

 dataConfig
 dataSource
 name=org_only
 type=JdbcDataSource
 driver=oracle.jdbc.OracleDriver
 url=jdbc:oracle:thin:@test2.abc.com:1521:ORCL
 user=admin
 password=admin
 readOnly=false
 batchSize=100
 /
 document


 entity name=full-index query=
 select

 NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null')
 as SOLR_ID,

 'ORCL.ADDRESS_ACCT_ALL'
 as SOLR_CATEGORY,

 NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as
 ADDRESSALLROWID,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as
 ADDRESSALLADDRTYPECD,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as
 ADDRESSALLLONGITUDE,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as
 ADDRESSALLLATITUDE,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as
 ADDRESSALLADDRNAME,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as
 ADDRESSALLCITY,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as
 ADDRESSALLSTATE,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as
 ADDRESSALLEMAILADDR

 from ORCL.ADDRESS_ACCT_ALL
  

 field column=SOLR_ID name=id /
 field column=SOLR_CATEGORY name=category /
 field column=ADDRESSALLROWID name=ADDRESS_ACCT_ALL.RECORD_ID_abc /
 field column=ADDRESSALLADDRTYPECD
 name=ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc /
 field column=ADDRESSALLLONGITUDE name=ADDRESS_ACCT_ALL.LONGITUDE_abc
 /
 field column=ADDRESSALLLATITUDE name=ADDRESS_ACCT_ALL.LATITUDE_abc /
 field column=ADDRESSALLADDRNAME name=ADDRESS_ACCT_ALL.ADDR_NAME_abc
 /
 field column=ADDRESSALLCITY name=ADDRESS_ACCT_ALL.CITY_abc /
 field column=ADDRESSALLSTATE name=ADDRESS_ACCT_ALL.STATE_abc /
 field column=ADDRESSALLEMAILADDR name=ADDRESS_ACCT_ALL.EMAIL_ADDR_abc
 /

 /entity



 !-- Varaibles --
 !-- '${dataimporter.last_index_time}' --
 /document
 /dataConfig






 On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:

  In this case we are indexing an Oracle database.
 
  We do not include the data-config.xml in our distribution.  We store the
  database information in the database.xml file.  I have attached the
  database.xml file.
 
  When we use the default merge policy settings, we get the same results.
 
 
 
  We have not tried to dump the table to a comma separated file.  We think
  that dumping this size table to disk will introduce other memory
 problems
  with big file management. We have not tested that case.
 
 
  On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com wrote:
 
  Hi,
 
  Which database are you using? Can you send us data-config.xml?
 
  What happens when you use default merge policy settings?
 
  What happens when you dump your table to Comma Separated File and fed
  that file to solr?
 
  Ahmet
 
  On Friday, April 4, 2014 5:10 PM, Candygram For Mongo 
  candygram.for.mo...@gmail.com wrote:
 
  The ramBufferSizeMB was set to 6MB only on the test system to make the
  system crash sooner.  In production that tag is commented out which
  I believe forces the default value to be used.
 
 
 
 
  On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com
 wrote:
 
  Hi,
  
  out of curiosity, why did you set ramBufferSizeMB to 6?
  
  Ahmet
  
  
  
  
  
  On Friday, April 4, 2014 3:27 AM, Candygram For Mongo 
  candygram.for.mo...@gmail.com wrote:
  *Main issue: Full Indexing is Causing a Java Heap Out of Memory
 Exception
  
  *SOLR/Lucene version: *4.2.1*
  
  
  *JVM version:
  
  Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
  
  Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
  
  
  
  *Indexer startup command:
  
  set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
  
  
  
  java  %JVMARGS% ^
  
  

Re: Searching multivalue fields.

2014-04-04 Thread Vijay Kokatnur
I had already tested with omitTermFreqAndPositions=false .  I still got
the same error.

Is there something that I am overlooking?

On Fri, Apr 4, 2014 at 2:45 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Vijay,

 Add the omitTermFreqAndPositions="false" attribute to the fieldType definitions.

 <fieldType name="string" class="solr.StrField"
 omitTermFreqAndPositions="false" sortMissingLast="true" />

 <fieldType name="int" class="solr.TrieIntField"
 omitTermFreqAndPositions="false" precisionStep="0"
 positionIncrementGap="0"/>

 You don't need termVectors for this.

 "1.2: omitTermFreqAndPositions attribute introduced, true by default
 except for text fields."

 And please reply to the solr-user list, so others can use the thread later on.

 Ahmet
   On Saturday, April 5, 2014 12:18 AM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
   Hey Ahmet,

 Sorry it took some time to test this.  But schema definition seem to
 conflict with SpanQuery.  I get following error when I use Spans

  field OrderLineType was indexed without position data; cannot run
 SpanTermQuery (term=11)

 I changed field definition in the schema but can't find the right
 attribute to set this.  My last attempt was with following definition

<field name="OrderLineType" type="string" indexed="true" stored="true"
 multiValued="true" termVectors="true" termPositions="true"
 termOffsets="true"/>

  Any ideas what I am doing wrong?

 Thanks,
 -Vijay

 On Wed, Mar 26, 2014 at 1:54 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Vijay,

 After reading the documentation it seems that following query is what you
 are after. It will return OrderId:345 without matching OrderId:123

 SpanQuery q1  = new SpanTermQuery(new Term("BookingRecordId", "234"));
 SpanQuery q2  = new SpanTermQuery(new Term("OrderLineType", "11"));
 SpanQuery q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId");
 Query q = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false);
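 Continuing that snippet, running the query against the same index with a plain
 IndexSearcher would look roughly like this (the index path is an assumption, and
 q is the query built above):

 import java.io.File;
 import org.apache.lucene.index.DirectoryReader;
 import org.apache.lucene.search.IndexSearcher;
 import org.apache.lucene.search.TopDocs;
 import org.apache.lucene.store.FSDirectory;

 // ... q built as above ...
 IndexSearcher searcher = new IndexSearcher(
         DirectoryReader.open(FSDirectory.open(new File("/path/to/data/index"))));
 TopDocs hits = searcher.search(q, 10);   // expected to match OrderId:345 but not OrderId:123
 System.out.println("hits: " + hits.totalHits);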

 Ahmet



 On Wednesday, March 26, 2014 10:39 PM, Ahmet Arslan iori...@yahoo.com
 wrote:
 Hi Vijay,

 I personally don't understand joins very well. Just a guess may
 be FieldMaskingSpanQuery could be used?


 http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html


 Ahmet




 On Wednesday, March 26, 2014 9:46 PM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
 Hi,

 I am bumping this thread again one last time to see if anyone has a
 solution.

 In it's current state, our application is storing child items as multivalue
 fields.  Consider some orders, for example -


 {
 OrderId:123
 BookingRecordId : [145, 987, *234*]
 OrderLineType : [11, 12, *13*]
 .
 }
 {
 OrderId:345
 BookingRecordId : [945, 882, *234*]
 OrderLineType : [1, 12, *11*]
 .
 }
 {
 OrderId:678
 BookingRecordId : [444]
 OrderLineType : [11]
 .
 }


 Here, If you look up for an Order with BookingRecordId: 234 And
 OrderLineType:11.  You will get two orders with orderId : 123 and 345,
 which is correct.  You have two arrays in both the orders that satisfy this
 condition.

 However, for OrderId:123, the value at 3rd index of OrderLineType array is
 13 and not 11( this is for OrderId:345).  So orderId 123 should be
 excluded. This is what I am trying to achieve.

 I got some suggestions from a solr-user to use FieldsCollapsing, Join,
 Block-join or string concatenation.  None of these approaches can be used
 without re-indexing schema.

 Has anyone found a non-invasive solution for this?

 Thanks,

 -Vijay







Re: Difference between [ TO *] and [* TO *] at Solr?

2014-04-04 Thread Erick Erickson
What kind of field are you using? Not quite sure what would happen
with a date or numeric field for instance.


On Fri, Apr 4, 2014 at 10:28 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 Hİ;

 What is the difference between [ TO *] and [* TO *] at Solr? (I tested it
 at 4.5.1 and numFounds are different.

 Thanks;
 Furkan KAMACI


Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Candygram For Mongo
Guessing that the attachments won't work, I am pasting one file in each of
four separate emails.

database.xml


<dataConfig>
<dataSource
name="org_only"
type="JdbcDataSource"
driver="oracle.jdbc.OracleDriver"
url="jdbc:oracle:thin:@test.abcdata.com:1521:ORCL"
user="admin"
password="admin"
readOnly="false"
/>
<document>


<entity name="full-index" query="
select

NVL(cast(ORACLE.ADDRESS_ALL.RECORD_ID as varchar2(100)), 'null')
as SOLR_ID,

'ORACLE.ADDRESS_ALL'
as SOLR_CATEGORY,

NVL(cast(ORACLE.ADDRESS_ALL.RECORD_ID as varchar2(255)), ' ') as
ADDRESSALLROWID,
NVL(cast(ORACLE.ADDRESS_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as
ADDRESSALLADDRTYPECD,
NVL(cast(ORACLE.ADDRESS_ALL.LONGITUDE as varchar2(255)), ' ') as
ADDRESSALLLONGITUDE,
NVL(cast(ORACLE.ADDRESS_ALL.LATITUDE as varchar2(255)), ' ') as
ADDRESSALLLATITUDE,
NVL(cast(ORACLE.ADDRESS_ALL.ADDR_NAME as varchar2(255)), ' ') as
ADDRESSALLADDRNAME,
NVL(cast(ORACLE.ADDRESS_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY,
NVL(cast(ORACLE.ADDRESS_ALL.STATE as varchar2(255)), ' ') as
ADDRESSALLSTATE,
NVL(cast(ORACLE.ADDRESS_ALL.EMAIL_ADDR as varchar2(255)), ' ') as
ADDRESSALLEMAILADDR

from ORACLE.ADDRESS_ALL
">

<field column="SOLR_ID" name="id" />
<field column="SOLR_CATEGORY" name="category" />
<field column="ADDRESSALLROWID" name="ADDRESS_ALL.RECORD_ID_abc" />
<field column="ADDRESSALLADDRTYPECD" name="ADDRESS_ALL.ADDR_TYPE_CD_abc" />
<field column="ADDRESSALLLONGITUDE" name="ADDRESS_ALL.LONGITUDE_abc" />
<field column="ADDRESSALLLATITUDE" name="ADDRESS_ALL.LATITUDE_abc" />
<field column="ADDRESSALLADDRNAME" name="ADDRESS_ALL.ADDR_NAME_abc" />
<field column="ADDRESSALLCITY" name="ADDRESS_ALL.CITY_abc" />
<field column="ADDRESSALLSTATE" name="ADDRESS_ALL.STATE_abc" />
<field column="ADDRESSALLEMAILADDR" name="ADDRESS_ALL.EMAIL_ADDR_abc" />

</entity>



<!-- Varaibles -->
<!-- '${dataimporter.last_index_time}' -->
</document>
</dataConfig>



On Fri, Apr 4, 2014 at 4:57 PM, Candygram For Mongo 
candygram.for.mo...@gmail.com wrote:

 Does this user list allow attachments?  I have four files attached
 (database.xml, error.txt, schema.xml, solrconfig.xml).  We just ran the
 process again using the parameters you suggested, but not to a csv file.
  It errored out quickly.  We are working on the csv file run.

 Removed both autoCommit and autoSoftCommit parts/definitions from
 solrconfig.xml

 Disabled tlog by removing

   <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
   </updateLog>

 from solrconfig.xml

 Used commit=true parameter. ?commit=true&command=full-import


 On Fri, Apr 4, 2014 at 3:29 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 This may not solve your problem but generally it is recommended to
 disable auto commit and transaction logs for bulk indexing.
 And issue one commit at the very end. Do you tlogs enabled? I see commit
 failed in the error message thats why I am offering this.

 And regarding comma separated values, with this approach you focus on
 just solr importing process. You separate data acquisition phrase. And it
 is very fast load even big csv files
 http://wiki.apache.org/solr/UpdateCSV
 I have never experienced OOM during indexing, I suspect data acquisition
 has role in it.

 Ahmet

 On Saturday, April 5, 2014 1:18 AM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:

 We would be happy to try that.  That sounds counter intuitive for the
 high volume of records we have.  Can you help me understand how that might
 solve our problem?




 On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,
 
 Can you remove auto commit for bulk import. Commit at the very end?
 
 Ahmet
 
 
 
 
 On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo 
 candygram.for.mo...@gmail.com wrote:
 In case the attached database.xml file didn't show up, I have pasted in
 the
 contents below:
 
 dataConfig
 dataSource
 name=org_only
 type=JdbcDataSource
 driver=oracle.jdbc.OracleDriver
 url=jdbc:oracle:thin:@test2.abc.com:1521:ORCL
 user=admin
 password=admin
 readOnly=false
 batchSize=100
 /
 document
 
 
 entity name=full-index query=
 select
 
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null')
 as SOLR_ID,
 
 'ORCL.ADDRESS_ACCT_ALL'
 as SOLR_CATEGORY,
 
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as
 ADDRESSALLROWID,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as
 ADDRESSALLADDRTYPECD,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as
 ADDRESSALLLONGITUDE,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as
 ADDRESSALLLATITUDE,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as
 ADDRESSALLADDRNAME,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as
 ADDRESSALLCITY,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as
 ADDRESSALLSTATE,
 NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as
 ADDRESSALLEMAILADDR
 
 from ORCL.ADDRESS_ACCT_ALL
  
 
 field column=SOLR_ID name=id /
 field column=SOLR_CATEGORY name=category /
 field column=ADDRESSALLROWID 

Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Candygram For Mongo
error.txt below


Java Platform Detected x64
Java Platform Detected -XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m
 -XX:+HeapDumpOnOutOfMemoryError -XX:+CreateMinidumpOnCrash
2014-04-04 15:49:43.341:INFO:oejs.Server:jetty-8.1.8.v20121106
2014-04-04 15:49:43.353:INFO:oejdp.ScanningAppProvider:Deployment monitor
D:\AbcData\V12\application server\server\indexer\example\contexts at
interval 0
2014-04-04 15:49:43.358:INFO:oejd.DeploymentManager:Deployable added:
D:\AbcData\V12\application
server\server\indexer\example\contexts\solr-jetty-context.xml
2014-04-04 15:49:43.989:INFO:oejw.StandardDescriptorProcessor:NO JSP
Support for /solr, did not find org.apache.jasper.servlet.JspServlet
Null identity service, trying login service: null
Finding identity service: null
2014-04-04 15:49:44.011:INFO:oejsh.ContextHandler:started
o.e.j.w.WebAppContext{/solr,file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr-webapp/webapp/},D:\AbcData\V12\application
server\server\indexer\example/webapps/solr.war
2014-04-04 15:49:44.012:INFO:oejsh.ContextHandler:started
o.e.j.w.WebAppContext{/solr,file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr-webapp/webapp/},D:\AbcData\V12\application
server\server\indexer\example/webapps/solr.war
Apr 04, 2014 3:49:44 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or
JNDI)
Apr 04, 2014 3:49:44 PM org.apache.solr.core.CoreContainer$Initializer
initialize
INFO: looking for solr.xml: D:\AbcData\V12\application
server\server\indexer\example\solr\solr.xml
Apr 04, 2014 3:49:44 PM org.apache.solr.core.CoreContainer init
INFO: New CoreContainer 1879341237
Apr 04, 2014 3:49:44 PM org.apache.solr.core.CoreContainer load
INFO: Loading CoreContainer using Solr Home: 'solr/'
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader init
INFO: new SolrResourceLoader for directory: 'solr/'
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/apache-log4j-extras-1.2.17.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/jtds-1.2.5.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/log4j-1.2.17.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/msbase.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/mssqlserver.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/msutil.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/ojdbc6.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/slf4j-api-1.7.5.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/slf4j-nop-1.7.5.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/sqljdbc4.jar'
to classloader
Apr 04, 2014 3:49:44 PM
org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting socketTimeout to: 0
Apr 04, 2014 3:49:44 PM
org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting urlScheme to: http://
Apr 04, 2014 3:49:44 PM
org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting connTimeout to: 0
Apr 04, 2014 3:49:44 PM
org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting maxConnectionsPerHost to: 20
Apr 04, 2014 3:49:44 PM
org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting corePoolSize to: 0
Apr 04, 2014 3:49:44 PM
org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting maximumPoolSize to: 2147483647
Apr 04, 2014 3:49:44 PM

Re: Distributed tracing for Solr via adding HTTP headers?

2014-04-04 Thread Alexandre Rafalovitch
I like the idea. No comments about implementation, leave it to others.

But if it is done, maybe somebody very familiar with logging can also
review Solr's current logging config. I suspect it is not optimized
for troubleshooting at this point.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Sat, Apr 5, 2014 at 3:16 AM, Gregg Donovan gregg...@gmail.com wrote:
 We have some metadata -- e.g. a request UUID -- that we log to every log
 line using Log4J's MDC [1]. The UUID logging allows us to connect any log
 lines we have for a given request across servers. Sort of like Zipkin [2].

 Currently we're using EmbeddedSolrServer without sharding, so adding the
 UUID is fairly simple, since everything is in one process and one thread.
 But, we're testing a sharded HTTP implementation and running into some
 difficulties getting this data passed around in a way that lets us trace
 all log lines generated by a request to its UUID.
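For reference, the MDC part of that pattern looks roughly like the sketch below; the
class, key name and wrapper are made up for illustration (not Solr API), and the
log4j layout would print the value via %X{requestUUID}:

import java.util.UUID;
import org.apache.log4j.Logger;
import org.apache.log4j.MDC;

public class RequestLogging {
    private static final Logger log = Logger.getLogger(RequestLogging.class);

    public void handle(Runnable work) {
        String requestUuid = UUID.randomUUID().toString();
        MDC.put("requestUUID", requestUuid);   // every log line on this thread can print it via %X{requestUUID}
        try {
            log.info("handling request");
            work.run();
        } finally {
            MDC.remove("requestUUID");         // don't leak the value to pooled threads
        }
    }
}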



Re: SOLR Jetty Server on Windows 2003

2014-04-04 Thread Alexandre Rafalovitch
You might be hitting
http://en.wikipedia.org/wiki/Cross-origin_resource_sharing .

Something like http://www.telerik.com/fiddler or Wireshark may allow
you to see network traffic if you don't have other means.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Sat, Apr 5, 2014 at 12:49 AM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com
wrote:
 Hi, I am trying to install Solr on Windows 2003 with the Jetty server. From the
 browser everything works, but when I try to access it from JavaScript code on
 another machine I am not getting a response. I am using XMLHttpRequest to
 get the response from the server.

 Any Help...?


 --Ravi


Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Alexandre Rafalovitch
And 50 million records of 3 fields each should not become 50Gb of
data. Something smells wrong there. Do you have unique IDs setup?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Sat, Apr 5, 2014 at 12:48 AM, Anshum Gupta ans...@anshumgupta.net wrote:
 I am not sure if you setup your SolrCloud right. Can you also provide me
 with the version of Solr that you're running.
 Also, if you could tell me about how did you setup your SolrCloud cluster.
 Are the times consistent? Is this the only collection on the cluster?

 Also, if I am getting it right, you have 15 ZKs running. Correct me if I'm
 wrong, but if I'm not, you don't need that kind of a zk setup.


 On Fri, Apr 4, 2014 at 9:39 AM, Sathya sathia.blacks...@gmail.com wrote:

 Hi shawn,

 I have indexed 50 million data in 5 servers. 3 servers have 8gb ram. One
 have 24gb and another one have 64gb ram. I was allocate 4 gb ram to solr in
 each machine. I am using solrcloud. My total index size is 50gb including 5
 servers. Each server have 3 zookeepers. Still I didnt check about OS disk
 cache and heap memory. I will check and let u know shawn. If anything, pls
 let me know.

 Thank u shawn.

 On Friday, April 4, 2014, Shawn Heisey-4 [via Lucene] 
 ml-node+s472066n4129150...@n3.nabble.com wrote:
  On 4/4/2014 1:31 AM, Sathya wrote:
  Hi All,
 
  Hi All, I am new to Solr. And i dont know how to increase the search
 speed
  of solrcloud. I have indexed nearly 4 GB of data. When i am searching a
  document using java with solrj, solr takes more 6 seconds to return a
 query
  result. Any one please help me to reduce the search query time to less
 than
  500 ms. i have allocate the 4 GB ram for solr. Please let me know for
  further details about solrcloud config.
 
  How much total RAM do you have on the system, and how much total index
  data is on that system (adding up all the Solr cores)?  You've already
  said that you have allocated 4GB of RAM for Solr.
 
  Later you said you had 50 million documents, and then you showed us a
  URL that looks like SolrCloud.
 
  I suspect that you don't have enough RAM left over to cache your index
  effectively -- the OS Disk Cache is too small.
 
  http://wiki.apache.org/solr/SolrPerformanceProblems
 
  Another possible problem, also discussed on that page, is that your Java
  heap is too small.
 
  Thanks,
  Shawn
 
 
 
  




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129173.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --

 Anshum Gupta
 http://www.anshumgupta.net


Re: Solr Search For Documents That Has Empty Content For a Given Particular Field

2014-04-04 Thread Alexandre Rafalovitch
And one solution is to use UpdateRequestProcessor that will create a
separate binary field for presence/absence and query on that instead.
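A bare-bones sketch of that idea follows; the field names are made up, and the
factory would still need to be registered in an updateRequestProcessorChain in
solrconfig.xml:

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class HasContentProcessorFactory extends UpdateRequestProcessorFactory {
    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                              UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
            @Override
            public void processAdd(AddUpdateCommand cmd) throws IOException {
                SolrInputDocument doc = cmd.getSolrInputDocument();
                Object v = doc.getFieldValue("field");                      // the field in question
                doc.setField("field_has_content",
                        v != null && v.toString().trim().length() > 0);     // query this boolean instead
                super.processAdd(cmd);
            }
        };
    }
}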

Regards,
Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Apr 4, 2014 at 11:13 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : field :  // this is the field that I want to learn which document has
 : it.

 How you (can) query for a field value like that is going to depend
 entirely on the FieldType/Analyzer ... if it's a string field, or uses
 KeywordTokenizer, then q=field:"" should find it -- if you use a more
 traditional analyzer then it probably didn't produce any terms for the
 input "" and from Solr's perspective a document that was indexed using
 an empty string value is exactly the same as a document that had no value
 when indexed.

 In essence, your question is equivalent to asking "How can I search for
 doc1, but not doc2, even though I'm using LowerCaseAnalyzer which produces
 exactly the same indexed terms for both"...

doc1: Quick Fox
doc2: quick fox



 -Hoss
 http://www.lucidworks.com/


Re: Cannot run program svnversion when building lucene 4.7.1

2014-04-04 Thread Chris Hostetter

:  I am trying to build lucene 4.7.1 from the sources. I can compile without
:  any issues but when I try to build the dist, lucene gives me
:  Cannot run program svnversion ... The system cannot find the specified
:  file.
: 
:  I am compiling on Windows 7 64-bit using java version 1.7.0.45 64-bit.

That's ... strange.

the build system *attempts* to include the svnversion info in the build 
artifacts, but it is explicitly designed to not fail if svnversion can't 
be run.

Can you please file a bug, note in the description your specific OS
setup, and include as an attachment the full build logs you get from ant
that give you this error?  ideally run ant using the -v option.


worst case scenario: you should be able to override the svnversion.exe
build property to some simple command that doesn't output much (not
sure what a good command to use on windows might be - i would use
something like whoami on linux if i didn't have svn installed).

the command would be something like this...

  ant -Dsvnversion.exe=whoami dist



-Hoss
http://www.lucidworks.com/


Re: Difference between [ TO *] and [* TO *] at Solr?

2014-04-04 Thread Jack Krupansky
And we can debate what it should or shouldn't be (and just check the 
code!) - and a clear contract is quite desirable, but this is starting to 
smell like an XY Problem - what is the user really trying to query - stated 
simply in English.


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Friday, April 4, 2014 5:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Difference between [ TO *] and [* TO *] at Solr?

What kind of field are you using? Not quite sure what would happen
with a date or numeric field for instance.


On Fri, Apr 4, 2014 at 10:28 AM, Furkan KAMACI furkankam...@gmail.com 
wrote:

Hİ;

What is the difference between [ TO *] and [* TO *] at Solr? (I tested 
it

at 4.5.1 and numFounds are different.

Thanks;
Furkan KAMACI