Problem with DIH

2014-10-16 Thread Jay Potharaju
Hi
I'm using DIH for updating my core, and I'm using stored procedures for doing
full/delta imports. In order to avoid running delta imports for a long
time, I limit the rows returned to a max of 100,000 rows at a given time.
On average the delta import runs for less than 1 minute.

For the last couple of days I have been noticing that my delta imports have
been running for a couple of hours and trying to update all the records in the
core. I'm not sure why that has been happening. I can't reproduce this
consistently; it happens randomly.

Has anyone noticed this kind of behavior? And secondly, are there any Solr
logs that will tell me what is getting updated or what exactly is happening
in the DIH?
Any suggestions appreciated.

Document count: 20 million
Solr 4.9
3 nodes in the SolrCloud.


Thanks
J


Re: Solr Synonyms, Escape space in case of multi words

2014-10-16 Thread Rajani Maski
Hi David,

  I think you should specify a tokenizerFactory on the synonym filter, as
shown below:

  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"
          tokenizerFactory="solr.KeywordTokenizerFactory"/>



So your field type should be as shown below:

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
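
With tokenizerFactory=solr.KeywordTokenizerFactory on the filter, each
comma-separated entry in the synonym file is parsed as a single token instead
of being split on whitespace, so multi-word entries should no longer need
backslash escaping. A sketch of what synonyms.txt could then look like
(entries taken from the question below):

# multi-word entries survive intact because the synonym rules are
# tokenized with KeywordTokenizerFactory
ridemakers, ride makers, ridemakerz, ride makerz, ride mark, ride care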


On Wed, Oct 15, 2014 at 7:25 PM, David Philip davidphilipshe...@gmail.com
wrote:

 Sorry, the analysis page clip is getting trimmed off and hence the
 indentation is lost.

 Here it is :

 ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz|
 care

 expected:

 ridemakers | ride | ridemakerz | ride | ridemark | ride | makers |
 makerz| *ride
 care*



 On Wed, Oct 15, 2014 at 7:21 PM, David Philip davidphilipshe...@gmail.com
 
 wrote:

  contd..
 
  The expectation was that "ride care" should not have been split into two
  tokens.
 
  It should have been as below. Please correct me/point me where I am
 wrong.
 
 
  Input : ridemakers, ride makers, ridemakerz, ride makerz, ride\mark,
 ride\
  care
 
  o/p
 
  ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz
 
  *ride care*
 
 
 
 
  On Wed, Oct 15, 2014 at 7:16 PM, David Philip 
 davidphilipshe...@gmail.com
   wrote:
 
  Hi All,
 
  I remember using multi-word synonyms in the Solr 3.x version. In the case
  of multi-words, I was escaping the space with a backslash [\], and it worked
  as intended.  Ex: ride\ makers, riders, rider\ guards.  Each one mapped to
  the others, so when I searched for ride makers, I obtained the search
  results for all of them. The field type was the same as below. I have the
  same setup in Solr 4.10, but now the multi-word space escape is getting
  ignored. It is tokenizing on spaces.
 
   synonyms.txt
  ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\
  care
 
 
  Analysis page:
 
  ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz | care
 
  Field Type
 
  <fieldType name="text_syn" class="solr.TextField"
             positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
    </analyzer>
  </fieldType>
 
 
 
  Could you please tell me what could be the issue? How do I handle
  multi-word cases?
 
 
 
 
  synonyms.txt
  ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\
  care
 
 
  Thanks - David
 
 
 
 
 



Re: Get Data under Highlight Json value pair

2014-10-16 Thread bastjan

 How can I get the value? Is there an option in the query syntax? Currently I
 use hl=on and hl.fl=<list of fields>.
 "highlighting":{
   "":{
     "CatalogSearch_en-US":["<em>VCM</em>"],
     "Name_en-US":["<em>VCM</em> <em>TO</em> <em>LAPTOP</em> CABLE"],
     "Description_en-US":[".\n<em>VCM</em> (<em>Vehicle Communication Module</em>) / VMM (<em>Vehicle</em> Measurement <em>Module</em>) to Laptop Cable.\n\nPrevious part"]},
   "":{
     "CatalogSearch_en-US":["<em>VCM</em> <em>II</em>"],
     "Name_en-US":["<em>VCM</em> <em>II</em> <em>DLC</em> CABLE"],
     "Description_en-US":[".\n<em>VCM</em> <em>II</em> <em>DLC</em> cable"]},
   "":{
     "CatalogSearch_en-US":["<em>VCM</em>"],
     "Name_en-US":["8' DLC TO <em>VCM</em> <em>I</em> <em>CABLE</em>"],
     "Description_en-US":["8' DLC to <em>VCM</em> <em>I</em> <em>cable</em>."]},
 
 Thanks
 
 Ravi

I know I'm a little late now ;-) Anyway, I ran into the same problem and
figured out the cause and solution: you do not set a uniqueKey field
in your schema, hence the key for each document in the JSON is empty, which
results in problems when parsing the JSON string in JS, leaving only one
key-value pair.
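
A minimal sketch of the schema fix, assuming an id field (use whatever
uniquely identifies your documents):

<!-- schema.xml: declare a uniqueKey so every document gets a distinct,
     non-empty key in the highlighting section of the response -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>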



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-Data-under-Highlight-Json-value-pair-tp4149041p4164494.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How should one search on all fields? *:XX does not work

2014-10-16 Thread Jack Krupansky

Hey, what do you think this is, Elasticsearch?!!

Or... try LucidWorks Search, which supports ALL as a pseudo-field name. It 
supports * as well.


See:
https://docs.lucidworks.com/display/lweug/Field+Queries

Whether LucidWorks still supports their (my!) query parser in their new 
Fusion product is unclear - I couldn't find any reference in the doc.


-- Jack Krupansky

-Original Message- 
From: Aaron Lewis

Sent: Thursday, October 16, 2014 1:47 AM
To: solr-user@lucene.apache.org
Subject: How should one search on all fields? *:XX does not work

Hi,

I'm trying to match all fields, so I tried this:
*:XX

Is that a bad practice? It doesn't seem to be supported either.

--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33 



Re: How to use less than and greater than in data-config file of solr

2014-10-16 Thread Ahmet Arslan
Hi,

Since it is an XML file, you need to encode the greater-than sign as &gt;
(and the less-than sign as &lt;).
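
For example, the range conditions in the query below would be written as:

business_rating_from &gt;= '${businessmasters.business_point}'
AND business_rating_to &lt; '${businessmasters.business_point}'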

Ahmet



On Thursday, October 16, 2014 8:52 AM, madhav bahuguna 
madhav.bahug...@gmail.com wrote:
I have two tables and I want to link them using greater-than and less-than
conditions. They have nothing in common; the only way I can link them is using
range values. I am able to do this in MySQL, but how do I do this in Solr in
my data-config.xml file? This is how my data-config file looks:

<entity name="business_colors"
        query="SELECT business_colors_id, business_rating_from, business_rating_to,
               business_text, hex_colors, rgb_colors, business_colors_modify
               FROM business_colors
               WHERE business_rating_from &gt;= '${businessmasters.business_point}'
               AND business_rating_to &lt; '${businessmasters.business_point}'"
        deltaQuery="SELECT business_colors_id FROM business_colors
                    WHERE business_colors_modify &gt; '${dih.last_index_time}'"
        parentDeltaQuery="SELECT business_id FROM businessmasters
                          WHERE business_point &lt; ${business_colors.business_rating_from}
                          AND business_point &gt;= ${business_colors.business_rating_from}">
  <field column="business_colors_id" name="id"/>
  <field column="business_rating_from" name="business_rating_from" indexed="true" stored="true" />
  <field column="business_rating_to" name="business_rating_to" indexed="true" stored="true" />
  <field column="business_text" name="business_text" indexed="true" stored="true" />
  <field column="hex_colors" name="hex_colors" indexed="true" stored="true" />
  <field column="rgb_colors" name="rgb_colors" indexed="true" stored="true" />
  <field column="business_colors_modify" name="business_colors_modify" indexed="true" stored="true"/>
</entity>

When I run a full import, the data does not get indexed and no error is shown.
What is wrong with this? Can anyone help and advise? What I have seen is
that if I replace AND with OR it works fine, or if I use just one condition
instead of both it works fine. Can anyone advise and help? How do I
achieve what I want to do?
I have also posted this question in stackoverflow
http://stackoverflow.com/questions/26397084/how-use-less-than-and-greater-than-in-data-config-file-of-solr
-- 
Regards
Madhav Bahuguna



Re: Does Solr support this?

2014-10-16 Thread Upayavira
Nope, not yet.

Someone did propose a JavascriptRequestHandler or such, which would
allow you to code such things in Javascript (obviously), but I don't
believe that has been accepted or completed yet.

Upayavira

On Thu, Oct 16, 2014, at 03:48 AM, Aaron Lewis wrote:
 Hi,
 
 I'm trying to do a second query if the first query is empty, e.g.
 
 if this returns no rows:
 title:XX AND subject:YY
 
 Then do a
 title:XX
 
 I can do that with two queries. But I'm wondering if I can merge them
 into a single one?
 
 -- 
 Best Regards,
 Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
 Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33
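
Until something like that lands, the fallback has to live in the client. A
minimal SolrJ sketch of the two-query approach (core URL and queries are
placeholders; error handling omitted):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FallbackQuery {
    public static void main(String[] args) throws SolrServerException {
        // hypothetical core URL -- adjust to your deployment
        HttpSolrServer solr =
            new HttpSolrServer("http://localhost:8983/solr/collection1");

        // try the strict query first
        QueryResponse rsp = solr.query(new SolrQuery("title:XX AND subject:YY"));
        if (rsp.getResults().getNumFound() == 0) {
            // fall back to the broader query only when the strict one is empty
            rsp = solr.query(new SolrQuery("title:XX"));
        }
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}

The cost is a second round trip, but only on the empty-result path.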


Re: How should one search on all fields? *:XX does not work

2014-10-16 Thread Alexandre Rafalovitch
On 16 October 2014 06:50, Jack Krupansky j...@basetechnology.com wrote:
 Hey, what do you think this is, Elasticsearch?!!

LoL. AFAIK, ElasticSearch does it by auto-copying all fields to _all_.
So, easy enough to replicate with a single copyField * -> text
instruction. With the appropriate loss of precision, analyzers, etc.
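
A sketch of that setup, assuming a catch-all field named text (the field type
is a placeholder):

<!-- schema.xml: copy every field into one searchable catch-all field -->
<field name="text" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="*" dest="text"/>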

I don't think eDisMax supports '*' in the fl value, does it?
Otherwise, that would be a solution.

Regards,
   Alex.


Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: eDismax - boost function of multiple values

2014-10-16 Thread Jens Mayer
Hey Ahmet,

thanks for your answer.
I've read about this on the following page:
http://wiki.apache.org/solr/FunctionQuery 
Using FunctionQuery point 3:
"The bf parameter actually takes a list of function queries separated by
whitespace, each with an optional boost."

If I write it the way you suggested, the result is the same:
only inhabitants is ranked up and importance is ignored.

greetings

 


Ahmet Arslan iori...@yahoo.com schrieb am 20:26 Dienstag, 14.Oktober 2014:
 


Hi Jens,

Where did you read that you can write it separated by white spaces?

bq and bf can both be defined multiple times.

q=foo&bf=ord(inhabitants)&bf=ord(importance)

Ahmet




On Tuesday, October 14, 2014 6:34 PM, Jens Mayer mjen...@yahoo.com.INVALID 
wrote:
Hey everyone,

I have a question about the boost function of solr.
The documentation says about multiple function queries that I can write them
separated by whitespace.

Example: q=foo&bf=ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3

Now I have two fields I would like to boost: inhabitants and importance.
The field inhabitants contains the inhabitants of cities, and the field
importance contains a priority value - cities have the value 10, suburbs the
value 5, and streets the value 1.
If I use the bf parameter I can boost inhabitants so that cities with the most
inhabitants are ranked up.

Example: q=foo&bf=ord(inhabitants)

The same happens if I boost importance.

Example: q=foo&bf=ord(importance)

But if I try to combine both so that importance and inhabitants are ranked up,
only inhabitants is ranked up and importance is ignored.

Example: q=foo&bf=ord(inhabitants) ord(importance)

Does anyone know how I can fix this problem?


greetings

Boost on basis of field is present or not in found documents

2014-10-16 Thread Rahul
Where should I make changes in the config files if I want to boost on the
basis of whether a field is present in the found documents?

Explanation:
I have documents with fields name, address, id, and number, where number may
or may not exist.
I have to rank documents higher when number is not present.

I thought of using the exists function in my qf, but that is not working.
I am using the eDismax query parser.

Thanks

-- 

Rahul Ranjan


Re: Does Solr support this?

2014-10-16 Thread Peter Keegan
I'm doing something similar with a custom search component. See SOLR-6502
https://issues.apache.org/jira/browse/SOLR-6502

On Thu, Oct 16, 2014 at 8:14 AM, Upayavira u...@odoko.co.uk wrote:

 Nope, not yet.

 Someone did propose a JavascriptRequestHandler or such, which would
 allow you to code such things in Javascript (obviously), but I don't
 believe that has been accepted or completed yet.

 Upayavira

 On Thu, Oct 16, 2014, at 03:48 AM, Aaron Lewis wrote:
  Hi,
 
  I'm trying to do a second query if the first query is empty, e.g.
 
  if this returns no rows:
  title:XX AND subject:YY
 
  Then do a
  title:XX
 
  I can do that with two queries. But I'm wondering if I can merge them
  into a single one?
 
  --
  Best Regards,
  Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
  Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33



Re: eDismax - boost function of multiple values

2014-10-16 Thread Ahmet Arslan
Hi,

I forgot one ampersand in my example. Did you add it?
 q=foo&bf=ord(inhabitants)&bf=ord(importance)

Ahmet



On Thursday, October 16, 2014 4:50 PM, Jens Mayer mjen...@yahoo.com.INVALID 
wrote:
Hey Ahmet,

thanks for your answer.
I've read about this on the following page:
http://wiki.apache.org/solr/FunctionQuery 
Using FunctionQuery point 3:
"The bf parameter actually takes a list of function queries separated by
whitespace, each with an optional boost."

If I write it the way you suggested, the result is the same:
only inhabitants is ranked up and importance is ignored.

greetings







Ahmet Arslan iori...@yahoo.com schrieb am 20:26 Dienstag, 14.Oktober 2014:



Hi Jens,

Where did you read that you can write it separated by white spaces?

bq and bf can both be defined multiple times.

q=foo&bf=ord(inhabitants)&bf=ord(importance)

Ahmet




On Tuesday, October 14, 2014 6:34 PM, Jens Mayer mjen...@yahoo.com.INVALID 
wrote:
Hey everyone,

I have a question about the boost function of solr.
The documentation says about multiple function queries that I can write them
separated by whitespace.

Example: q=foo&bf=ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3

Now I have two fields I would like to boost: inhabitants and importance.
The field inhabitants contains the inhabitants of cities, and the field
importance contains a priority value - cities have the value 10, suburbs the
value 5, and streets the value 1.
If I use the bf parameter I can boost inhabitants so that cities with the most
inhabitants are ranked up.

Example: q=foo&bf=ord(inhabitants)

The same happens if I boost importance.

Example: q=foo&bf=ord(importance)

But if I try to combine both so that importance and inhabitants are ranked up,
only inhabitants is ranked up and importance is ignored.

Example: q=foo&bf=ord(inhabitants) ord(importance)

Does anyone know how I can fix this problem?


greetings


Re: Boost on basis of field is present or not in found documents

2014-10-16 Thread Ahmet Arslan


Hi,

Can't you combine the not, exists, and if functions?
https://cwiki.apache.org/confluence/display/solr/Function+Queries

boost=if(not(exists(number)),100,1)
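
Applied to an eDismax request, that might look like this (query and field
values are placeholders):

q=foo&defType=edismax&boost=if(not(exists(number)),100,1)

The boost parameter is multiplicative, so documents without a number field
would score 100x those that have one.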




On Thursday, October 16, 2014 5:13 PM, Rahul rahul1...@gmail.com wrote:
Where should I make changes in the config files if I want to boost on the
basis of whether a field is present in the found documents?

Explanation:
I have documents with fields name, address, id, and number, where number may
or may not exist.
I have to rank documents higher when number is not present.

I thought of using the exists function in my qf, but that is not working.
I am using the eDismax query parser.

Thanks

-- 

Rahul Ranjan



Re: eDismax - boost function of multiple values

2014-10-16 Thread Garth Grimm
Spaces should work just fine.  Can you show us exactly what is happening with 
the score that leads you to the conclusion that it isn’t working?

Some testing from an example collection I have…

No boost:
http://localhost:8983/solr/collection1/select?q=text%3Abook&fl=id%2Cprice%2Cyearpub%2Cscore&wt=csv&defType=edismax
id,price,yearpub,score
db9780819562005,13.21,1989,0.40321594
db1562399055,17.87,2001,0.28511673
db0072519096,66.67,2008,0.28511673
db0140236392,10.88,1994,0.28511673
db04,44.99,2007,0.25200996
db07,19.77,2005,0.25200996
db0763777595,24.44,2002,0.25200996
db0879305835,43.58,2011,0.24947715
db1933550309,18.99,2004,0.24691834
db02,40.09,2009,0.21383755
Boost of just yearpub:

http://localhost:8983/solr/collection1/select?q=text%3Abook&fl=id%2Cprice%2Cyearpub%2Cscore&wt=csv&defType=edismax&bf=ord%28yearpub%29
id,price,yearpub,score
db0879305835,43.58,2011,11.069619
db1847195881,33.62,2010,10.635455
db02,40.09,2009,10.233932
db0072519096,66.67,2008,9.897689
db0316033723,23.1,2008,9.821208
db04,44.99,2007,9.465844
db05,44.99,2007,9.419684
db9780061336461,12.18,2007,9.398244
db07,19.77,2005,8.662797
db1933550309,18.99,2004,8.256611
boost of yearpub and price, using just a space as separator:
http://localhost:8983/solr/collection1/select?q=text%3Abook&fl=id%2Cprice%2Cyearpub%2Cscore&wt=csv&defType=edismax&bf=ord%28yearpub%29%20ord%28price%29
id,price,yearpub,score
db0072519096,66.67,2008,28.933228
db0879305835,43.58,2011,28.15772
db04,44.99,2007,27.414654
db05,44.99,2007,27.371819
db02,40.09,2009,27.009602
db1847195881,33.62,2010,26.636993
db9780201896831,57.43,1997,24.749598
db0767914384,37.87,1997,22.835175
db0316033723,23.1,2008,21.037462
db0763777595,24.44,2002,19.58986
Score keeps increasing with each boost.

Regards,
Garth

 Hey Ahmet,
 
 thanks for your answer.
 I've read about this on the following page:
 http://wiki.apache.org/solr/FunctionQuery 
 Using FunctionQuery point 3:
 "The bf parameter actually takes a list of function queries separated by
 whitespace, each with an optional boost."

 If I write it the way you suggested, the result is the same:
 only inhabitants is ranked up and importance is ignored.
 
 greetings
 
 
 
 
 Ahmet Arslan iori...@yahoo.com schrieb am 20:26 Dienstag, 14.Oktober 2014:
 
 
 
 Hi Jens,
 
 Where did you read that you can write it separated by white spaces?
 
 bq and bf can both be defined multiple times.

 q=foo&bf=ord(inhabitants)&bf=ord(importance)
 
 Ahmet
 
 
 
 
 On Tuesday, October 14, 2014 6:34 PM, Jens Mayer mjen...@yahoo.com.INVALID 
 wrote:
 Hey everyone,
 
 I have a question about the boost function of solr.
 The documentation says about multiple function queries that I can write them
 separated by whitespace.

 Example: q=foo&bf=ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3

 Now I have two fields I would like to boost: inhabitants and importance.
 The field inhabitants contains the inhabitants of cities, and the field
 importance contains a priority value - cities have the value 10, suburbs the
 value 5, and streets the value 1.
 If I use the bf parameter I can boost inhabitants so that cities with the most
 inhabitants are ranked up.

 Example: q=foo&bf=ord(inhabitants)

 The same happens if I boost importance.

 Example: q=foo&bf=ord(importance)

 But if I try to combine both so that importance and inhabitants are ranked up,
 only inhabitants is ranked up and importance is ignored.

 Example: q=foo&bf=ord(inhabitants) ord(importance)

 Does anyone know how I can fix this problem?
 
 
 greetings



Add core in solr.xml | Problem with starting SOLRcloud

2014-10-16 Thread roySolr
Hello,

Our platform has 4 solr instances and 3 zookeepers(solr 4.1.0).

I want to add a new core in my solrcloud. I add the new core to the solr.xml
file:

<core name="collection2" instanceDir="collection2" />

I put the config files in the directory collection2, uploaded the new
config to ZooKeeper, and started Solr.
Solr did not start up and gave the following error:

Oct 16, 2014 4:57:06 PM org.apache.solr.cloud.ZkController publish
INFO: publishing core=collection1 state=recovering
Oct 16, 2014 4:57:06 PM org.apache.solr.cloud.ZkController publish
INFO: numShards not found on descriptor - reading it from system property
Oct 16, 2014 4:57:06 PM org.apache.solr.client.solrj.impl.HttpClientUtil
createClient
INFO: Creating new http client,
config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Oct 16, 2014 4:59:06 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover.
core=collection1:org.apache.solr.common.SolrException: I was asked to wait
on state recovering for 31.114.2.237:8910_solr but I still do not see the
requested state. I see state: active live:true
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:404)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:202)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:346)
at
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)

Oct 16, 2014 4:59:06 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
SEVERE: Recovery failed - trying again... (0) core=collection1
Oct 16, 2014 4:59:06 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Wait 2.0 seconds before trying to recover again (1)
Oct 16, 2014 4:59:08 PM org.apache.solr.cloud.ZkController publish
INFO: publishing core=collection1 state=recovering
Oct 16, 2014 4:59:08 PM org.apache.solr.cloud.ZkController publish
INFO: numShards not found on descriptor - reading it from system property
Oct 16, 2014 4:59:08 PM org.apache.solr.client.solrj.impl.HttpClientUtil
createClient
INFO: Creating new http client,
config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false


What's wrong with my setup? Any help would be appreciated!

Roy



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Add-core-in-solr-xml-Problem-with-starting-SOLRcloud-tp4164524.html
Sent from the Solr - User mailing list archive at Nabble.com.


Custom JSON

2014-10-16 Thread Scott Dawson
Hello,
I'm trying to use the new custom JSON feature described in
https://issues.apache.org/jira/browse/SOLR-6304. I'm running Solr 4.10.1.
It seems that the new feature, or more specifically, the /update/json/docs
endpoint is not enabled out-of-the-box except in the schema-less example.
Is there some dependence of the feature on schemaless mode? I've tried
pulling the endpoint definition and related pieces of the
example-schemaless solrconfig.xml and adding those to the standard
solrconfig.xml in the main example but I've run into a cascade of issues.
Right now I'm getting a "This IndexSchema is not mutable" exception when I
try to post to the /update/json/docs endpoint.

My real question is -- what's the easiest way to get this feature up and
running quickly and is this documented somewhere? I'm trying to do a quick
proof-of-concept to verify that we can move from our current flat JSON
ingestion to a more natural use of structured JSON.

Thanks,
Scott Dawson


Re: Problem with DIH

2014-10-16 Thread Dan Davis
This seems a little abstract.   What I'd do is double check that the SQL is
working correctly by running the stored procedure outside of Solr and see
what you get.   You should also be able to look at the corresponding
.properties file and see the inputs used for the delta import.  If the data
import XML is called dih-example.xml, then the properties file should be
called dih-example.properties and be in the same conf directory (for the
collection).Example contents are:

#Fri Oct 10 14:53:44 EDT 2014
last_index_time=2014-10-10 14\:53\:44
healthtopic.last_index_time=2014-10-10 14\:53\:44

Again, I'm suggesting you double check that the SQL is working correctly.
If that isn't the problem, provide more details on your data import
handler, e.g. the XML with some modifications (no passwords).
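
For reference, a skeleton of what such a config could look like -- purely
illustrative, with made-up table, column, and entity names:

<!-- dih-example.xml (hypothetical) -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://dbhost/mydb"/>
  <document>
    <entity name="healthtopic"
            query="SELECT id, title, body FROM topics"
            deltaQuery="SELECT id FROM topics
                        WHERE modified &gt; '${dih.last_index_time}'"
            deltaImportQuery="SELECT id, title, body FROM topics
                              WHERE id = '${dih.delta.id}'"/>
  </document>
</dataConfig>

The delta queries are where the last_index_time recorded in the .properties
file above comes into play.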

On Thu, Oct 16, 2014 at 2:11 AM, Jay Potharaju jspothar...@gmail.com
wrote:

 Hi
 I'm using DIH for updating my core, and I'm using stored procedures for doing
 full/delta imports. In order to avoid running delta imports for a long
 time, I limit the rows returned to a max of 100,000 rows at a given time.
 On average the delta import runs for less than 1 minute.

 For the last couple of days I have been noticing that my delta imports have
 been running for a couple of hours and trying to update all the records in the
 core. I'm not sure why that has been happening. I can't reproduce this
 consistently; it happens randomly.

 Has anyone noticed this kind of behavior? And secondly, are there any Solr
 logs that will tell me what is getting updated or what exactly is happening
 in the DIH?
 Any suggestions appreciated.

 Document count: 20 million
 Solr 4.9
 3 nodes in the SolrCloud.


 Thanks
 J



Frequent recovery of nodes in SolrCloud

2014-10-16 Thread sachinpkale
Hi,

Recently we have shifted to SolrCloud (4.10.1) from a traditional Master-Slave
configuration. We have only one collection and it has only one shard.
The cloud cluster contains a total of 12 nodes (on 8 machines; on 4 machines we
have two instances running on each), out of which one is the leader.

Whenever I look at the cluster status using http://IP:HOST/solr/#/~cloud, it
shows at least one (sometimes 2-3) node status as recovering. We are
using an HAProxy load balancer and there, too, it often shows the
nodes as recovering. This is happening for all nodes in the cluster.

What would be the problem here? How do I check this in logs?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Frequent-recovery-of-nodes-in-SolrCloud-tp4164541.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom JSON

2014-10-16 Thread Scott Dawson
Noble,
Thanks. You're right. I had some things incorrectly configured but now I
can put structured JSON into Solr using the out-of-the-box solrconfig.xml.

One additional question: Is there any way to query Solr and receive the
original structured JSON document in response? Or does the flattening
process that happens during indexing obliterate the original structure with
no way to reconstruct it?

Thanks again,
Scott

On Thu, Oct 16, 2014 at 2:10 PM, Noble Paul noble.p...@gmail.com wrote:

 The endpoint /update/json/docs is enabled implicitly in Solr, irrespective
 of the solrconfig.xml.
 In schemaless mode the fields are created automatically by Solr.

 If you have all the fields created in your schema.xml, it will work.

 If you need an id field, please use a copyField to create one.
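
 For instance, a sketch of that copyField (the source field name is an
 assumption; point it at whatever field your JSON mapping produces):

 <!-- schema.xml: fill the id field from another field -->
 <copyField source="docid_s" dest="id"/>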

 --Noble

 On Thu, Oct 16, 2014 at 8:42 PM, Scott Dawson sc.e.daw...@gmail.com
 wrote:

  Hello,
  I'm trying to use the new custom JSON feature described in
  https://issues.apache.org/jira/browse/SOLR-6304. I'm running Solr
 4.10.1.
  It seems that the new feature, or more specifically, the
 /update/json/docs
  endpoint is not enabled out-of-the-box except in the schema-less example.
  Is there some dependence of the feature on schemaless mode? I've tried
  pulling the endpoint definition and related pieces of the
  example-schemaless solrconfig.xml and adding those to the standard
  solrconfig.xml in the main example but I've run into a cascade of issues.
   Right now I'm getting a "This IndexSchema is not mutable" exception when
   I try to post to the /update/json/docs endpoint.
 
  My real question is -- what's the easiest way to get this feature up and
  running quickly and is this documented somewhere? I'm trying to do a
 quick
  proof-of-concept to verify that we can move from our current flat JSON
  ingestion to a more natural use of structured JSON.
 
  Thanks,
  Scott Dawson
 



 --
 -
 Noble Paul



Re: import solr source to eclipse

2014-10-16 Thread Dan Davis
I had a problem with the ant eclipse answer - it was unable to resolve
javax.activation for the Javadoc.  Updating
solr/contrib/dataimporthandler-extras/ivy.xml
as follows did the trick for me:

-  <dependency org="javax.activation" name="activation"
rev="${/javax.activation/activation}" conf="compile->*"/>
+  <dependency org="javax.activation" name="activation"
rev="${/javax.activation/activation}" conf="compile->default"/>

What I'm trying to do is to construct a failing Unit test for something
that I think is a bug.   But the first thing is to be able to run tests,
probably in eclipse, but the command-line might be good enough although not
ideal.
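
For the command line, the Lucene/Solr build can run a single test class -- a
sketch (the test class name is a placeholder):

cd lucene-solr-trunk/solr
ant test -Dtestcase=MyFailingTest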


On Tue, Oct 14, 2014 at 10:38 AM, Erick Erickson erickerick...@gmail.com
wrote:

 I do exactly what Anurag mentioned, but _only_ when what
 I want to debug is, for some reason, not accessible via unit
 tests. It's very easy to do.

 It's usually much faster though to use unit tests, which you
 should be able to run from eclipse without starting a server
 at all. In IntelliJ, you just ctrl-click on the file and the menu
 gives you a choice of running or debugging the unit test, I'm
 sure Eclipse does something similar.

 There are zillions of units to choose from, and for new development
 it's a Good Thing to write the unit test first...

 Good luck!
 Erick

 On Tue, Oct 14, 2014 at 1:37 AM, Anurag Sharma anura...@gmail.com wrote:
  Another alternative is to launch the jetty server outside and attach to it
  remotely from Eclipse:

  java -Xdebug
 -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7666
  -jar start.jar

  With suspend=y, the command waits until the debugger attaches.
 
 
  On Tue, Oct 14, 2014 at 12:56 PM, Rajani Maski rajinima...@gmail.com
  wrote:
 
  Configure eclipse with Jetty plugin. Create a Solr folder under your
  Solr-Java-Project and Run the project [Run as] on Jetty Server.
 
  This blog[1] may help you to configure Solr within eclipse.
 
 
  [1]
 
 http://hokiesuns.blogspot.in/2010/01/setting-up-apache-solr-in-eclipse.html
 
  On Tue, Oct 14, 2014 at 12:06 PM, Ali Nazemian alinazem...@gmail.com
  wrote:
 
   Thank you very much for your guides but how can I run solr server
 inside
   eclipse?
   Best regards.
  
   On Mon, Oct 13, 2014 at 8:02 PM, Rajani Maski rajinima...@gmail.com
   wrote:
  
Hi,
   
The best tutorial for setting up Solr[solr 4.7] in
 eclipse/intellij  is
documented in Solr In Action book, Appendix A, *Working with the Solr
codebase*
   
   
On Mon, Oct 13, 2014 at 6:45 AM, Tomás Fernández Löbbe 
tomasflo...@gmail.com wrote:
   
 The way I do this:
 From a terminal:
 svn checkout https://svn.apache.org/repos/asf/lucene/dev/trunk/
 lucene-solr-trunk
 cd lucene-solr-trunk
 ant eclipse

 ... And then, from your Eclipse import existing java project,
 and
select
 the directory where you placed lucene-solr-trunk

 On Sun, Oct 12, 2014 at 7:09 AM, Ali Nazemian 
 alinazem...@gmail.com
  
 wrote:

  Hi,
  I am going to import solr source code to eclipse for some
  development
  purpose. Unfortunately every tutorial that I found for this
 purpose
   is
  outdated and did not work. So would you please give me some hint
   about
 how
  can I import solr source code to eclipse?
  Thank you very much.
 
  --
  A.Nazemian
 

   
  
  
  
   --
   A.Nazemian
  
 



Re: Frequent recovery of nodes in SolrCloud

2014-10-16 Thread Jürgen Wagner (DVT)
Hello,
  you have one shard and 11 replicas? Hmm...

- Why do you have to keep two nodes on some machines?
- Physical hardware or virtual machines?
- What is the size of this index?
- Is this all on a local network or are there links with potential
outages or failures in between?
- What is the query load?
- Have you had a look at garbage collection?
- Do you use the internal Zookeeper?
- How many nodes?
- Any observers?
- What kind of load does Zookeeper show?
- How much RAM do these nodes have available?
- Do some servers get into swapping?
- ...

How about some more details in terms of sizing and topology?

Cheers,
--Jürgen

On 16.10.2014 18:41, sachinpkale wrote:
 Hi,

 Recently we have shifted to SolrCloud (4.10.1) from a traditional Master-Slave
 configuration. We have only one collection and it has only one shard.
 The cloud cluster contains a total of 12 nodes (on 8 machines; on 4 machines we
 have two instances running on each), out of which one is the leader.

 Whenever I look at the cluster status using http://IP:HOST/solr/#/~cloud, it
 shows at least one (sometimes 2-3) node status as recovering. We are
 using an HAProxy load balancer and there, too, it often shows the
 nodes as recovering. This is happening for all nodes in the cluster.

 What would be the problem here? How do I check this in logs?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Frequent-recovery-of-nodes-in-SolrCloud-tp4164541.html
 Sent from the Solr - User mailing list archive at Nabble.com.


-- 

Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
уважением
*i.A. Jürgen Wagner*
Head of Competence Center Intelligence
 Senior Cloud Consultant

Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de


Managing Board: Jürgen Hatzipantelis (CEO)
Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071




Complex boost statement

2014-10-16 Thread Corey Gerhardt
Edismax, solrnet

I'm thinking that solrnet is going to be my problem, because I can only send
one boost parameter.

Is it possible to have a boost value:

if(exists(query({!v=BUS_CITY:regina}))(BUS_IS_NEARBY),20,1)

Thanks,

Corey



Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-16 Thread S.L
Shawn,

Please find the answers to your questions.

1. Java Version :java version 1.7.0_51
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

2.OS
CentOS Linux release 7.0.1406 (Core)

3. Everything is 64 bit , OS , Java , and CPU.

4. Java Args.
-Djava.io.tmpdir=/opt/tomcat1/temp
-Dcatalina.home=/opt/tomcat1
-Dcatalina.base=/opt/tomcat1
-Djava.endorsed.dirs=/opt/tomcat1/endorsed
-DzkHost=server1.mydomain.com:2181,server2.mydomain.com:2181,
server3.mydomain.com:2181
-DzkClientTimeout=2
-DhostContext=solr
-Dport=8081
-Dhost=server1.mydomain.com
-Dsolr.solr.home=/opt/solr/home1
-Dfile.encoding=UTF8
-Duser.timezone=UTC
-XX:+UseG1GC
-XX:MaxPermSize=128m
-XX:PermSize=64m
-Xmx2048m
-Xms128m
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file=/opt/tomcat1/conf/logging.properties

5. Zookeeper ensemble has 3 zookeeper instances , which are external and
are not embedded.


6. Container : I am using Tomcat Apache Tomcat Version 7.0.42

*Additional Observations:*

I queried all docs on both replicas with distrib=false&fl=id&sort=id+asc,
then compared the two lists. Eyeballing the first few lines
of ids in both lists, I could say that even though each list has an equal
number of documents (96309 each), the document ids in them seem to
be *mutually exclusive*. I did not find even a single common id in
those lists (I tried at least 15 manually); it looks to me like the
replicas are disjoint sets.

Thanks.



On Thu, Oct 16, 2014 at 1:41 AM, Shawn Heisey apa...@elyograg.org wrote:

 On 10/15/2014 10:24 PM, S.L wrote:

 Yes, I tried those two queries with distrib=false; I get 0 results for
 the first and 1 result for the second query (i.e. server 3 shard 2 replica
 2) consistently.

 However, if I run the same second query (i.e. server 3 shard 2 replica 2)
 with distrib=true, I sometimes get a result and sometimes not. Shouldn't
 this query always return a result when it's pointing to a core that seems
 to have that document, regardless of distrib=true or false?

 Unfortunately I don't see anything particular in the logs to point to any
 information.

 BTW, you asked me to replace the request handler. I use the select request
 handler, so I cannot replace it with anything else; is that a problem?


 If you send the query with distrib=true (which is the default value in
 SolrCloud), then it treats it just as if you had sent it to
 /solr/collection instead of /solr/collection_shardN_replicaN, so it's a
 full distributed query. The distrib=false is required to turn that behavior
 off and ONLY query the index on the actual core where you sent it.

 I only said to replace those things as appropriate.  Since you are using
 /select, it's no problem that you left it that way. If I were to assume
 that you used /select, but you didn't, the URLs as I wrote them might not
 have worked.

 As discussed, this means that your replicas are truly out of sync.  It's
 difficult to know what caused it, especially if you can't see anything in
 the log when you indexed the missing documents.

 We know you're on Solr 4.10.1.  This means that your Java is a 1.7
 version, since Java7 is required.

 Here's where I ask a whole lot of questions about your setup. What is the
 precise Java version, and which vendor's Java are you using?  What
 operating system is it on?  Is everything 64-bit, or is any piece (CPU, OS,
 Java) 32-bit?  On the Solr admin UI dashboard, it lists all parameters used
 when starting Java, labelled as Args.  Can you include those?  Is
 zookeeper external, or embedded in Solr?  Is it a 3-server (or more)
 ensemble?  Are you using the example jetty, or did you provide your own
 servlet container?

 We recommend 64-bit Oracle Java, the latest 1.7 version.  OpenJDK (since
 version 1.7.x) should be pretty safe as well, but IBM's Java should be
 avoided.  IBM does very aggressive runtime optimizations.  These can make
 programs run faster, but they are known to negatively affect Lucene/Solr.

 Thanks,
 Shawn




Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-16 Thread Shawn Heisey

On 10/16/2014 6:27 PM, S.L wrote:

1. Java Version :java version 1.7.0_51
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)


I believe that build 51 is one of those that is known to have bugs 
related to Lucene.  If you can upgrade this to 67, that would be good, 
but I don't know that it's a pressing matter.  It looks like the Oracle 
JVM, which is good.



2.OS
CentOS Linux release 7.0.1406 (Core)

3. Everything is 64 bit , OS , Java , and CPU.

4. Java Args.
 -Djava.io.tmpdir=/opt/tomcat1/temp
 -Dcatalina.home=/opt/tomcat1
 -Dcatalina.base=/opt/tomcat1
 -Djava.endorsed.dirs=/opt/tomcat1/endorsed
 -DzkHost=server1.mydomain.com:2181,server2.mydomain.com:2181,
server3.mydomain.com:2181
 -DzkClientTimeout=2
 -DhostContext=solr
 -Dport=8081
 -Dhost=server1.mydomain.com
 -Dsolr.solr.home=/opt/solr/home1
 -Dfile.encoding=UTF8
 -Duser.timezone=UTC
 -XX:+UseG1GC
 -XX:MaxPermSize=128m
 -XX:PermSize=64m
 -Xmx2048m
 -Xms128m
 -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
 -Djava.util.logging.config.file=/opt/tomcat1/conf/logging.properties


I would not use the G1 collector myself, but with the heap at only 2GB, 
I don't know that it matters all that much.  Even a worst-case 
collection probably is not going to take more than a few seconds, and 
you've already increased the zookeeper client timeout.


http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning


5. Zookeeper ensemble has 3 zookeeper instances , which are external and
are not embedded.


6. Container : I am using Tomcat Apache Tomcat Version 7.0.42

*Additional Observations:*

I queried all docs on both replicas with distrib=false&fl=id&sort=id+asc,
then compared the two lists. Eyeballing the first few lines
of ids in both lists, I could say that even though each list has an equal
number of documents (96309 each), the document ids in them seem to
be *mutually exclusive*. I did not find even a single common id in
those lists (I tried at least 15 manually); it looks to me like the
replicas are disjoint sets.


Are you sure you hit both replicas of the same shard number?  If you 
are, then it sounds like something is going wrong with your document 
routing, or maybe your clusterstate is really messed up.  Recreating the 
collection from scratch and doing a full reindex might be a good plan 
... assuming this is possible for you.  You could create a whole new 
collection, and then when you're ready to switch, delete the original 
collection and create an alias so your app can still use the old name.
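
The alias step is a single Collections API call -- a sketch with placeholder
names:

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=oldname&collections=newcollection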


How much total RAM do you have on these systems, and how large are those 
index shards?  With a shard having 96K documents, it sounds like your 
whole index is probably just shy of 300K documents.


Thanks,
Shawn



Re: import solr source to eclipse

2014-10-16 Thread Erick Erickson
Sorry, not an Eclipse guy, I'll have to wait for them to chime in...

Kudos for trying to construct a unit test illustrating the error
though, that'll be a great help!

Erick

On Thu, Oct 16, 2014 at 4:14 PM, Dan Davis dansm...@gmail.com wrote:
 I had a problem with the ant eclipse answer - it was unable to resolve
 javax.activation for the Javadoc.  Updating
 solr/contrib/dataimporthandler-extras/ivy.xml
 as follows did the trick for me:

 -  <dependency org="javax.activation" name="activation"
 rev="${/javax.activation/activation}" conf="compile->*"/>
 +  <dependency org="javax.activation" name="activation"
 rev="${/javax.activation/activation}" conf="compile->default"/>

 What I'm trying to do is to construct a failing Unit test for something
 that I think is a bug.   But the first thing is to be able to run tests,
 probably in eclipse, but the command-line might be good enough although not
 ideal.


 On Tue, Oct 14, 2014 at 10:38 AM, Erick Erickson erickerick...@gmail.com
 wrote:

 I do exactly what Anurag mentioned, but _only_ when what
 I want to debug is, for some reason, not accessible via unit
 tests. It's very easy to do.

 It's usually much faster though to use unit tests, which you
 should be able to run from eclipse without starting a server
 at all. In IntelliJ, you just ctrl-click on the file and the menu
 gives you a choice of running or debugging the unit test, I'm
 sure Eclipse does something similar.

 There are zillions of units to choose from, and for new development
 it's a Good Thing to write the unit test first...

 Good luck!
 Erick

 On Tue, Oct 14, 2014 at 1:37 AM, Anurag Sharma anura...@gmail.com wrote:
  Another alternative is to launch the jetty server outside and attach to it
  remotely from Eclipse:

  java -Xdebug
 -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7666
  -jar start.jar

  With suspend=y, the command waits until the debugger attaches.
 
 
  On Tue, Oct 14, 2014 at 12:56 PM, Rajani Maski rajinima...@gmail.com
  wrote:
 
  Configure eclipse with Jetty plugin. Create a Solr folder under your
  Solr-Java-Project and Run the project [Run as] on Jetty Server.
 
  This blog[1] may help you to configure Solr within eclipse.
 
 
  [1]
 
 http://hokiesuns.blogspot.in/2010/01/setting-up-apache-solr-in-eclipse.html
 
  On Tue, Oct 14, 2014 at 12:06 PM, Ali Nazemian alinazem...@gmail.com
  wrote:
 
   Thank you very much for your guides but how can I run solr server
 inside
   eclipse?
   Best regards.
  
   On Mon, Oct 13, 2014 at 8:02 PM, Rajani Maski rajinima...@gmail.com
   wrote:
  
Hi,
   
The best tutorial for setting up Solr[solr 4.7] in
 eclipse/intellij  is
documented in Solr In Action book, Appendix A, *Working with the Solr
codebase*
   
   
On Mon, Oct 13, 2014 at 6:45 AM, Tomás Fernández Löbbe 
tomasflo...@gmail.com wrote:
   
 The way I do this:
 From a terminal:
 svn checkout https://svn.apache.org/repos/asf/lucene/dev/trunk/
 lucene-solr-trunk
 cd lucene-solr-trunk
 ant eclipse

 ... And then, from your Eclipse import existing java project,
 and
select
 the directory where you placed lucene-solr-trunk

 On Sun, Oct 12, 2014 at 7:09 AM, Ali Nazemian 
 alinazem...@gmail.com
  
 wrote:

  Hi,
  I am going to import solr source code to eclipse for some
  development
  purpose. Unfortunately every tutorial that I found for this
 purpose
   is
  outdated and did not work. So would you please give me some hint
   about
 how
  can I import solr source code to eclipse?
  Thank you very much.
 
  --
  A.Nazemian
 

   
  
  
  
   --
   A.Nazemian
  
 



Re: Frequent recovery of nodes in SolrCloud

2014-10-16 Thread Erick Erickson
And what is your zookeeper timeout? When it's too short that can lead
to this behavior.

Best,
Erick

On Thu, Oct 16, 2014 at 4:34 PM, Jürgen Wagner (DVT)
juergen.wag...@devoteam.com wrote:
 Hello,
   you have one shard and 11 replicas? Hmm...

 - Why do you have to keep two nodes on some machines?
 - Physical hardware or virtual machines?
 - What is the size of this index?
 - Is this all on a local network or are there links with potential outages
 or failures in between?
 - What is the query load?
 - Have you had a look at garbage collection?
 - Do you use the internal Zookeeper?
 - How many nodes?
 - Any observers?
 - What kind of load does Zookeeper show?
 - How much RAM do these nodes have available?
 - Do some servers get into swapping?
 - ...

 How about some more details in terms of sizing and topology?

 Cheers,
 --Jürgen


 On 16.10.2014 18:41, sachinpkale wrote:

 Hi,

 Recently we have shifted to SolrCloud (4.10.1) from a traditional Master-Slave
 configuration. We have only one collection and it has only one shard.
 The cloud cluster contains a total of 12 nodes (on 8 machines; on 4 machines we
 have two instances running on each), out of which one is the leader.

 Whenever I look at the cluster status using http://IP:HOST/solr/#/~cloud, it
 shows at least one (sometimes 2-3) node status as recovering. We are
 using an HAProxy load balancer and there, too, it often shows the
 nodes as recovering. This is happening for all nodes in the cluster.

 What would be the problem here? How do I check this in logs?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Frequent-recovery-of-nodes-in-SolrCloud-tp4164541.html
 Sent from the Solr - User mailing list archive at Nabble.com.



 --

 Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
 уважением
 i.A. Jürgen Wagner
 Head of Competence Center Intelligence
  Senior Cloud Consultant

 Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
 Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
 E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de

 
 Managing Board: Jürgen Hatzipantelis (CEO)
 Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
 Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071




Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-16 Thread S.L
Shawn ,


   1. I will upgrade to the build 67 JVM shortly.
   2. This is a new collection: I was facing a similar issue in 4.7,
   and based on Erick's recommendation I updated to 4.10.1 and created a new
   collection.
   3. Yes, I am hitting the replicas of the same shard, and I see the lists
   are completely non-overlapping. I am using CloudSolrServer to add the
   documents.
   4. I have a 3-node physical cluster, each node having 16GB of memory.
   5. I also have a custom request handler defined in my solrconfig.xml as
   below. However, I am not using it; I am only using the default select
   handler. My MyCustomHandler class has been added to the source and
   included in the build, but it is not being used for any requests yet.

  <requestHandler name="/mycustomselect" class="solr.MyCustomHandler"
                  startup="lazy">
    <lst name="defaults">
      <str name="df">suggestAggregate</str>

      <str name="spellcheck.dictionary">direct</str>
      <!-- <str name="spellcheck.dictionary">wordbreak</str> -->
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">true</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.alternativeTermCount">5</str>
      <str name="spellcheck.maxResultsForSuggest">5</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>
      <str name="spellcheck.maxCollationTries">10</str>
      <str name="spellcheck.maxCollations">5</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>


6. The clusterstate.json is copied below:

{"dyCollection1":{
    "shards":{
      "shard1":{
        "range":"8000-d554",
        "state":"active",
        "replicas":{
          "core_node3":{
            "state":"active",
            "core":"dyCollection1_shard1_replica1",
            "node_name":"server3.mydomain.com:8082_solr",
            "base_url":"http://server3.mydomain.com:8082/solr"},
          "core_node4":{
            "state":"active",
            "core":"dyCollection1_shard1_replica2",
            "node_name":"server2.mydomain.com:8081_solr",
            "base_url":"http://server2.mydomain.com:8081/solr",
            "leader":"true"}}},
      "shard2":{
        "range":"d555-2aa9",
        "state":"active",
        "replicas":{
          "core_node1":{
            "state":"active",
            "core":"dyCollection1_shard2_replica1",
            "node_name":"server1.mydomain.com:8081_solr",
            "base_url":"http://server1.mydomain.com:8081/solr",
            "leader":"true"},
          "core_node6":{
            "state":"active",
            "core":"dyCollection1_shard2_replica2",
            "node_name":"server3.mydomain.com:8081_solr",
            "base_url":"http://server3.mydomain.com:8081/solr"}}},
      "shard3":{
        "range":"2aaa-7fff",
        "state":"active",
        "replicas":{
          "core_node2":{
            "state":"active",
            "core":"dyCollection1_shard3_replica2",
            "node_name":"server1.mydomain.com:8082_solr",
            "base_url":"http://server1.mydomain.com:8082/solr",
            "leader":"true"},
          "core_node5":{
            "state":"active",
            "core":"dyCollection1_shard3_replica1",
            "node_name":"server2.mydomain.com:8082_solr",
            "base_url":"http://server2.mydomain.com:8082/solr"}}}},
    "maxShardsPerNode":"1",
    "router":{"name":"compositeId"},
    "replicationFactor":"2",
    "autoAddReplicas":"false"}}

  Thanks!

On Thu, Oct 16, 2014 at 9:02 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 10/16/2014 6:27 PM, S.L wrote:

 1. Java Version :java version 1.7.0_51
 Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)


 I believe that build 51 is one of those that is known to have bugs related
 to Lucene.  If you can upgrade this to 67, that would be good, but I don't
 know that it's a pressing matter.  It looks like the Oracle JVM, which is
 good.

  2.OS
 CentOS Linux release 7.0.1406 (Core)

 3. Everything is 64 bit , OS , Java , and CPU.

 4. Java Args.
  -Djava.io.tmpdir=/opt/tomcat1/temp
  -Dcatalina.home=/opt/tomcat1
  -Dcatalina.base=/opt/tomcat1
  -Djava.endorsed.dirs=/opt/tomcat1/endorsed
  -DzkHost=server1.mydomain.com:2181,server2.mydomain.com:2181,
 server3.mydomain.com:2181
  -DzkClientTimeout=2
  -DhostContext=solr
  -Dport=8081
  -Dhost=server1.mydomain.com
  -Dsolr.solr.home=/opt/solr/home1
  -Dfile.encoding=UTF8
  -Duser.timezone=UTC
  -XX:+UseG1GC
  -XX:MaxPermSize=128m
  -XX:PermSize=64m
  -Xmx2048m
  -Xms128m
  -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
  -Djava.util.logging.config.file=/opt/tomcat1/conf/logging.properties


 I would not use the G1 collector myself, but with the heap at only 2GB, I
 don't know that it matters all that much.  Even a worst-case collection
 probably is not going to take more than a few seconds, and you've already
 increased the zookeeper client timeout.

 http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

  5. 

Re: Frequent recovery of nodes in SolrCloud

2014-10-16 Thread Sachin Kale
- Why you have to keep two nodes on some machines?
- These are very powerful machines (32-Core, 64GB) and our index size
is 1GB. We are allocating 7GB to JVM, so we thought it would be OK to have
two instances on the same machine.

- Physical hardware or virtual machines?
- Physical hardware

- What is the size of this index?
- 1GB

- Is this all on a local network or are there links with potential outages
or failures in between?
- local network

- What is the query load?
- 10K requests per minute.

- Have you had a look at garbage collection?
- GC time is generally 5-10%. I have attached a screenshot.

- Do you use the internal Zookeeper?
   - No. We have set up an external ZooKeeper ensemble with 3 instances.
Following is the ZooKeeper configuration:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.70.27:2888:3888
server.2=192.168.70.64:2889:3889
server.3=192.168.70.26:2889:3889

Also, in solr.xml, we have zkClientTimeout set to 3.

- How many nodes?
- 3
- Any observers?
- I don't know what observers are. Can you please explain?

- What kind of load does Zookeeper show?
- Load is normal I guess. Need to double-check.

- How much RAM do these nodes have available?
   - Each SOLR node has 7GB allocated. For ZooKeeper, we have not allocated
the memory explicitly.

- Do some servers get into swapping?
- Not sure. How do I check that?


On Fri, Oct 17, 2014 at 2:04 AM, Jürgen Wagner (DVT) 
juergen.wag...@devoteam.com wrote:

  Hello,
   you have one shard and 11 replicas? Hmm...

 - Why do you have to keep two nodes on some machines?
 - Physical hardware or virtual machines?
 - What is the size of this index?
 - Is this all on a local network or are there links with potential outages
 or failures in between?
 - What is the query load?
 - Have you had a look at garbage collection?
 - Do you use the internal Zookeeper?
 - How many nodes?
 - Any observers?
 - What kind of load does Zookeeper show?
 - How much RAM do these nodes have available?
 - Do some servers get into swapping?
 - ...

 How about some more details in terms of sizing and topology?

 Cheers,
 --Jürgen


 On 16.10.2014 18:41, sachinpkale wrote:

 Hi,

 Recently we have shifted to SolrCloud (4.10.1) from a traditional Master-Slave
 configuration. We have only one collection and it has only one shard.
 The cloud cluster contains a total of 12 nodes (on 8 machines; on 4 machines we
 have two instances running on each), out of which one is the leader.

 Whenever I look at the cluster status using http://IP:HOST/solr/#/~cloud, it
 shows at least one (sometimes 2-3) node status as recovering. We are
 using an HAProxy load balancer and there, too, it often shows the
 nodes as recovering. This is happening for all nodes in the cluster.

 What would be the problem here? How do I check this in logs?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Frequent-recovery-of-nodes-in-SolrCloud-tp4164541.html
 Sent from the Solr - User mailing list archive at Nabble.com.



 --

 Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
 уважением
 *i.A. Jürgen Wagner*
 Head of Competence Center Intelligence
  Senior Cloud Consultant

 Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
 Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
 E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de
 --
 Managing Board: Jürgen Hatzipantelis (CEO)
 Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
 Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071





Re: Frequent recovery of nodes in SolrCloud

2014-10-16 Thread Sachin Kale
From the ZooKeeper side, we have the following configuration:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.70.27:2888:3888
server.2=192.168.70.64:2889:3889
server.3=192.168.70.26:2889:3889

Also, in solr.xml, we have zkClientTimeout set to 3.

On Fri, Oct 17, 2014 at 7:27 AM, Erick Erickson erickerick...@gmail.com
wrote:

 And what is your zookeeper timeout? When it's too short that can lead
 to this behavior.

 Best,
 Erick

 On Thu, Oct 16, 2014 at 4:34 PM, Jürgen Wagner (DVT)
 juergen.wag...@devoteam.com wrote:
  Hello,
you have one shard and 11 replicas? Hmm...
 
   - Why do you have to keep two nodes on some machines?
  - Physical hardware or virtual machines?
  - What is the size of this index?
  - Is this all on a local network or are there links with potential
 outages
  or failures in between?
  - What is the query load?
  - Have you had a look at garbage collection?
  - Do you use the internal Zookeeper?
  - How many nodes?
  - Any observers?
  - What kind of load does Zookeeper show?
  - How much RAM do these nodes have available?
  - Do some servers get into swapping?
  - ...
 
  How about some more details in terms of sizing and topology?
 
  Cheers,
  --Jürgen
 
 
  On 16.10.2014 18:41, sachinpkale wrote:
 
  Hi,
 
  Recently we have shifted to SolrCloud (4.10.1) from a traditional
  Master-Slave configuration. We have only one collection and it has only
  one shard. The cloud cluster contains a total of 12 nodes (on 8 machines;
  on 4 machines we have two instances running on each), out of which one is
  the leader.
 
  Whenever I see the cluster status using http://IP:HOST/solr/#/~cloud,
 it
  shows at least one (sometimes, it is 2-3) node status as recovering. We
 are
  using HAProxy load balancer and there also many times, it is showing the
  nodes are recovering. This is happening for all nodes in the cluster.
 
  What would be the problem here? How do I check this in logs?
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Frequent-recovery-of-nodes-in-SolrCloud-tp4164541.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
  --
 
  Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
  уважением
  i.A. Jürgen Wagner
  Head of Competence Center Intelligence
   Senior Cloud Consultant
 
  Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
  Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864
 1543
  E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de
 
  
  Managing Board: Jürgen Hatzipantelis (CEO)
  Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
  Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071
 
 



Re: Frequent recovery of nodes in SolrCloud

2014-10-16 Thread Sachin Kale
Also, the PingRequestHandler is configured as:

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <str name="healthcheckFile">server-enabled.txt</str>
</requestHandler>


On Fri, Oct 17, 2014 at 9:07 AM, Sachin Kale sachinpk...@gmail.com wrote:

 From ZooKeeper side, we have following configuration:
 tickTime=2000
 dataDir=/var/lib/zookeeper
 clientPort=2181
 initLimit=5
 syncLimit=2
 server.1=192.168.70.27:2888:3888
 server.2=192.168.70.64:2889:3889
 server.3=192.168.70.26:2889:3889

 Also, in solr.xml, we have zkClientTimeout set to 3.

 On Fri, Oct 17, 2014 at 7:27 AM, Erick Erickson erickerick...@gmail.com
 wrote:

 And what is your zookeeper timeout? When it's too short that can lead
 to this behavior.

 Best,
 Erick

 On Thu, Oct 16, 2014 at 4:34 PM, Jürgen Wagner (DVT)
 juergen.wag...@devoteam.com wrote:
  Hello,
you have one shard and 11 replicas? Hmm...
 
   - Why do you have to keep two nodes on some machines?
  - Physical hardware or virtual machines?
  - What is the size of this index?
  - Is this all on a local network or are there links with potential
 outages
  or failures in between?
  - What is the query load?
  - Have you had a look at garbage collection?
  - Do you use the internal Zookeeper?
  - How many nodes?
  - Any observers?
  - What kind of load does Zookeeper show?
  - How much RAM do these nodes have available?
  - Do some servers get into swapping?
  - ...
 
  How about some more details in terms of sizing and topology?
 
  Cheers,
  --Jürgen
 
 
  On 16.10.2014 18:41, sachinpkale wrote:
 
  Hi,
 
  Recently we have shifted to SolrCloud (4.10.1) from a traditional
  Master-Slave configuration. We have only one collection and it has only
  one shard. The cloud cluster contains a total of 12 nodes (on 8 machines;
  on 4 machines we have two instances running on each), out of which one is
  the leader.
 
  Whenever I see the cluster status using http://IP:HOST/solr/#/~cloud,
 it
  shows at least one (sometimes, it is 2-3) node status as recovering. We
 are
  using HAProxy load balancer and there also many times, it is showing the
  nodes are recovering. This is happening for all nodes in the cluster.
 
  What would be the problem here? How do I check this in logs?
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Frequent-recovery-of-nodes-in-SolrCloud-tp4164541.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
  --
 
  Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
  уважением
  i.A. Jürgen Wagner
  Head of Competence Center Intelligence
   Senior Cloud Consultant
 
  Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
  Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864
 1543
  E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de
 
  
  Managing Board: Jürgen Hatzipantelis (CEO)
  Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
  Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071
 
 





Re: Custom JSON

2014-10-16 Thread Noble Paul
The original JSON is not stored; the fields are extracted and the rest of the
data is thrown away.

On Fri, Oct 17, 2014 at 1:18 AM, Scott Dawson sc.e.daw...@gmail.com wrote:

 Noble,
 Thanks. You're right. I had some things incorrectly configured but now I
 can put structured JSON into Solr using the out-of-the-box solrconfig.xml.

 One additional question: Is there any way to query Solr and receive the
 original structured JSON document in response? Or does the flattening
 process that happens during indexing obliterate the original structure with
 no way to reconstruct it?

 Thanks again,
 Scott

 On Thu, Oct 16, 2014 at 2:10 PM, Noble Paul noble.p...@gmail.com wrote:

  The endpoint /update/json/docs is enabled implicitly in Solr,
  irrespective of the solrconfig.xml.
  In schemaless mode the fields are created automatically by Solr.

  If you have all the fields created in your schema.xml, it will work.

  If you need an id field, please use a copyField to create one.
 
  --Noble
 
  On Thu, Oct 16, 2014 at 8:42 PM, Scott Dawson sc.e.daw...@gmail.com
  wrote:
 
   Hello,
   I'm trying to use the new custom JSON feature described in
   https://issues.apache.org/jira/browse/SOLR-6304. I'm running Solr
  4.10.1.
   It seems that the new feature, or more specifically, the
  /update/json/docs
   endpoint is not enabled out-of-the-box except in the schema-less
 example.
   Is there some dependence of the feature on schemaless mode? I've tried
   pulling the endpoint definition and related pieces of the
   example-schemaless solrconfig.xml and adding those to the standard
   solrconfig.xml in the main example but I've run into a cascade of
 issues.
    Right now I'm getting a "This IndexSchema is not mutable" exception
    when I
    try to post to the /update/json/docs endpoint.
  
   My real question is -- what's the easiest way to get this feature up
 and
   running quickly and is this documented somewhere? I'm trying to do a
  quick
   proof-of-concept to verify that we can move from our current flat JSON
   ingestion to a more natural use of structured JSON.
  
   Thanks,
   Scott Dawson
  
 
 
 
  --
  -
  Noble Paul
 




-- 
-
Noble Paul