Solr 4.2 update/extract adding unknown field, can we change field type from string to text

2013-09-03 Thread Jai
hi,

While indexing documents with unknown fields, Solr adds the unknown fields to
the schema, but it always guesses string as the type. Is it possible to specify
a default field type for unknown fields, such as text, so that they get
tokenized? Also, can we specify other properties by default, like
indexed/stored/multiValued?

PS: I am using Solr 4.2.

Thanks a lot.
Jai


Re: SolrCloud - Path must not end with / character

2013-09-03 Thread Prasi S
The issue is resolved. I had given all the paths inside Tomcat (solr home,
solr war) as relative paths. That was what was creating the problem.


On Mon, Sep 2, 2013 at 2:19 PM, Prasi S prasi1...@gmail.com wrote:

 Does this have anything to do with Tomcat? I cannot go back, as we have
 already settled on Tomcat.

 Any suggestions, please? The same setup, if I copy and run it on a different
 machine, works fine. I am not sure what is missing. Is it because of some
 system parameter getting set?


 On Fri, Aug 30, 2013 at 9:11 PM, Jared Griffith 
 jgriff...@picsauditing.com wrote:

 I was getting the same errors when trying to implement SolrCloud with
 Tomcat.  I eventually gave up until something came out of this thread.
 This all works if you just ditch Tomcat and go with the native Jetty
 server.


 On Fri, Aug 30, 2013 at 6:28 AM, Prasi S prasi1...@gmail.com wrote:

  Also, this fails with the default Solr 4.4 downloaded configuration too
 
 
  On Fri, Aug 30, 2013 at 4:19 PM, Prasi S prasi1...@gmail.com wrote:
 
    Below is the script I run:
  
   START /MAX
   F:\SolrCloud\zookeeper\zk-server-1\zookeeper-3.4.5\bin\zkServer.cmd
  
  
   START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
   org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2182
 -confdir
   solr-conf -confname solrconf1
  
  
  
   START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
   org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost 127.0.0.1:2182
 -collection
  firstcollection -confname solrconf1 -solrhome ../tomcat1/solr1
  
  
  
   START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
   org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2182
 -confdir
   solr-conf -confname solrconf2
  
  
  
  
   START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
   org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost 127.0.0.1:2182
 -collection
  seccollection -confname solrconf2 -solrhome ../tomcat1/solr1
  
  
  
   START /MAX F:\solrcloud\tomcat1\bin\startup.bat
  
  
  
   START /MAX F:\solrcloud\tomcat2\bin\startup.bat
  
  
   On Fri, Aug 30, 2013 at 4:07 PM, Prasi S prasi1...@gmail.com wrote:
  
    I'm still clueless on where the issue could be. There is not much
    information in the Solr logs.
  
    I had a running version of SolrCloud on another server. I copied the
    same to this server, started ZooKeeper, and then ran the commands
    below:
  
   java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd
 upconfig
   -zkhost localhost:2181 -confdir solr-conf -confname solrconfindex
  
   java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd
 linkconfig
   -zkhost 127.0.0.1:2181 -collection colindexer -confname
 solrconfindex
   -solrhome ../tomcat1/solr1
  
    After this, when I started Tomcat, the first Tomcat started fine. When
    the second Tomcat was started, I got the above exception and it stopped.
    Then the first Tomcat also showed the same exception.
  
  
  
  
   On Thu, Aug 29, 2013 at 7:18 PM, Mark Miller markrmil...@gmail.com
  wrote:
  
    Yeah, you see this when the core could not be created. Check the logs
    to see if you can find something more useful.

    I ran into this again the other day; it's something we should fix. You
    see the same thing in the UI when a core cannot be created: it gives you
    no hint about the problem, which is confusing.
  
   - Mark
  
   On Aug 29, 2013, at 5:23 AM, sathish_ix skandhasw...@inautix.co.in
 
   wrote:
  
Hi ,
   
 Check that the configuration files you uploaded into ZooKeeper are valid
 and that there are no errors in the uploaded config files.
 I think that due to this error, the Solr core will not be created.
   
Thanks,
Sathish
   
   
   
--
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrCloud-Path-must-not-end-with-character-tp4087159p4087182.html
Sent from the Solr - User mailing list archive at Nabble.com.
  
  
  
  
 



 --

 Jared Griffith
 Linux Administrator, PICS Auditing, LLC
 P: (949) 936-4574
 C: (909) 653-7814

 http://www.picsauditing.com

 17701 Cowan #140 | Irvine, CA | 92614

 Join PICS on LinkedIn and Twitter!

 https://twitter.com/PICSAuditingLLC





Problem with Synonyms

2013-09-03 Thread Christian Loock

Hello,

this is my first time writing to this mailing list, so hello everyone.

I am having issues with synonyms.

I added the synonym filter to one of my field types:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
        enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
        enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


I also added some synonyms to my synonyms.txt, which is in the conf folder of my
core.

However, when I look into the analyzer, the content isn't replaced, and I can't
figure out where the problem lies.

Can anybody help here? I'm totally stuck...




Update field properties via Schema Rest API ?

2013-09-03 Thread bengates
Hello,

I'm pretty new to Solr, as a PHP developer.
I'm still reading the tutorials for getting started with Solr, adding and
indexing data. I'm still using the example/start.jar, as I haven't yet
succeeded in configuring a true (production-ready) Solr instance. But that
doesn't matter.

As I can't deal with Java, Tomcat, etc., I just want to do as much as possible
with the REST API, so as not to have to edit any files.

I need to add and edit fields frequently, so I use the Schema
REST API.
However, the wiki http://wiki.apache.org/solr/SchemaRESTAPI explains how to
add fields, but not how to update or delete them.

Can you help me?

I really need to control (e.g. update) the properties of my fields (indexed,
stored, multiValued, etc.) via the REST API, without having to edit a file
each time I need an update.

Thanks,
Ben



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Update-field-properties-via-Schema-Rest-API-tp4087907.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr cloud and DIH, indexation runs only on one shard.

2013-09-03 Thread jerome . dupont

Hello again,

I am still trying to index with SolrCloud and DIH. I can index, but it seems
that indexing is done on only one shard. (My goal was to parallelize it to go
faster.)
This is my conf:
I have 2 Tomcat instances,
one with ZooKeeper embedded in Solr 4.4.0, started with 1 shard (port 8080),
the other with the second shard (port 9180).
In my admin interface, I see 2 shards, and each one is a leader.


When I launch the DIH, documents are indexed, but only shard1 is
working:
http://localhost:8080/solr-0.4.0-pfd/noticesBIBcollection/dataimportMNb?command=full-import&entity=noticebib&optimize=true&indent=true&clean=true&commit=true&verbose=false&debug=false&wt=json&rows=1000


In my first shard, I see messages coming from my indexing process:
DEBUG 2013-09-03 11:48:57,801 Thread-12
org.apache.solr.handler.dataimport.URLDataSource  (92) - Accessing URL:
file:/X:/3/7/002/37002118.xml
DEBUG 2013-09-03 11:48:57,832 Thread-12
org.apache.solr.handler.dataimport.URLDataSource  (92) - Accessing URL:
file:/X:/3/7/002/37002120.xml
DEBUG 2013-09-03 11:48:57,966 Thread-12
org.apache.solr.handler.dataimport.LogTransformer  (58) - Notice fichier:
3/7/002/37002120.xml
DEBUG 2013-09-03 11:48:57,966 Thread-12 fr.bnf.solr.BnfDateTransformer
(696) - NN=37002120

In the second instance, I just see this kind of log, as if it were receiving
notifications of new updates from ZooKeeper:
INFO 2013-09-03 11:48:57,323 http-9180-7
org.apache.solr.update.processor.LogUpdateProcessor  (198) - [noticesBIB]
webapp=/solr-0.4.0-pfd path=/update params=
{distrib.from=http://172.20.48.237:8080/solr-0.4.0-pfd/noticesBIB/update&update.distrib=TOLEADER&wt=javabin&version=2}
 {add=[37001748 (1445149264874307584), 37001757 (1445149264879550464),
37001764 (1445149264883744768), 37001786 (1445149264887939072), 37001817
(1445149264891084800), 37001819 (1445149264896327680), 37001837
(1445149264900521984), 37001861 (1445149264903667712), 37001869
(1445149264907862016), 37001963 (1445149264912056320)]} 0 41

I supposed there was confusion between the core names and the collection name,
and I tried to change the name of the collection, but it solved nothing.
When I go to the DIH interfaces, on shard1 I see indexing in progress, and
on shard2 no information is available.

Is there something special to do to distribute the indexing process?
Should I run ZooKeeper on both instances (even if it's not mandatory)?
...
Regards
Jerome




Re: Solr 4.2 update/extract adding unknown field, can we change field type from string to text

2013-09-03 Thread Shalin Shekhar Mangar
You can use the dynamic fields feature of Solr to map unknown field
names to types.

For example, a dynamic field named *_s (i.e. any field name ending
with _s) can be mapped to string, and so on. In your case, if your
field names do not follow a set pattern, then you can even specify a
dynamic field named * and map it to a text type.

See https://cwiki.apache.org/confluence/display/solr/Dynamic+Fields
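For illustration, the corresponding schema.xml entries might look like this
(a sketch; the text_general type name is an assumption and should match a
field type actually defined in your schema):

<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*" type="text_general" indexed="true" stored="true" multiValued="true"/>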

On Tue, Sep 3, 2013 at 12:00 PM, Jai jai4l...@gmail.com wrote:
 hi,

 While indexing documents with unknown fields, Solr adds the unknown fields to
 the schema, but it always guesses string as the type. Is it possible to
 specify a default field type for unknown fields, such as text, so that they
 get tokenized? Also, can we specify other properties by default, like
 indexed/stored/multiValued?

 PS: I am using Solr 4.2.

 Thanks a lot.
 Jai



-- 
Regards,
Shalin Shekhar Mangar.


Re: solr cloud and DIH, indexation runs only on one shard.

2013-09-03 Thread Shalin Shekhar Mangar
DataImportHandler does not parallelize indexing at all. It is a
single-threaded indexer which runs on a single node. However, the documents
themselves are routed to the correct shard by SolrCloud. Therefore,
what you are observing on your servers is normal.

If you want to parallelize indexing then you can either:
a) use SolrJ or an external client and write the indexing code yourself, or
b) set up DIH in such a way that each shard indexes a disjoint subset
of data. This way, you can fire a DIH full import on multiple
shards/nodes simultaneously.

One way of achieving (b) is by using request parameters to substitute
placeholders in your DIH configuration. See
http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
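A minimal sketch of (b); the parameter names (numShards, shardNum) and the
query are hypothetical, but the ${dataimporter.request.*} substitution is the
mechanism described on that wiki page:

<entity name="doc"
        query="select id, title from docs
               where mod(id, ${dataimporter.request.numShards}) = ${dataimporter.request.shardNum}">
</entity>

http://shard1:8983/solr/collection/dataimport?command=full-import&numShards=2&shardNum=0
http://shard2:8983/solr/collection/dataimport?command=full-import&numShards=2&shardNum=1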

On Tue, Sep 3, 2013 at 3:25 PM,  jerome.dup...@bnf.fr wrote:

 Hello again,

 [...]



-- 
Regards,
Shalin Shekhar Mangar.


Re: Problem with Synonyms

2013-09-03 Thread pravesh
Solr has a nice analysis page. You can use it to get insight into what is
happening after each filter is applied at index/search time.
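In the 4.x admin UI, that screen is the core's Analysis tab; a sketch of the
URL, with host and core name assumed:

http://localhost:8983/solr/#/collection1/analysis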


Regards
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-Synonyms-tp4087905p4087915.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr cloud and DIH, indexation runs only on one shard.

2013-09-03 Thread YouPeng Yang
Hi jerome.dupont

  Please check what the update chain in your solrconfig.xml is:


<updateRequestProcessorChain name="sample">
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.NoOpDistributingUpdateProcessorFactory"/>
      <!-- by default, it is solr.NoOpDistributingUpdateProcessorFactory -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
    <str name="update.chain">sample</str>
  </lst>
</requestHandler>
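For comparison, a chain that routes documents across the cloud would use the
distributed update processor instead of the no-op one. This is only a sketch
using standard Solr 4 processor factory names, not tested against the poster's
configuration:

<updateRequestProcessorChain name="distrib">
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>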





2013/9/3 jerome.dup...@bnf.fr


  Hello again,

  [...]


Re: Update field properties via Schema Rest API ?

2013-09-03 Thread Shalin Shekhar Mangar
The Schema REST API is a new feature and supports only adding fields
(and that, too, only since Solr 4.4). It doesn't support modifying fields
yet.
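For the part that does work, the wiki shows adding a field with a PUT of a
JSON field definition; a sketch (core name collection1 and field name myfield
assumed, and the managed schema factory must be enabled in solrconfig.xml):

curl -X PUT http://localhost:8983/solr/collection1/schema/fields/myfield \
  -H 'Content-type: application/json' \
  -d '{"type":"text_general","indexed":true,"stored":true}'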

On Tue, Sep 3, 2013 at 2:39 PM, bengates benga...@aliceadsl.fr wrote:
 Hello,

 [...]



-- 
Regards,
Shalin Shekhar Mangar.


Starting Solr in Tomcat with specifying ZK host(s)

2013-09-03 Thread maephisto
Hi,
I've set up a ZK instance and also deployed Solr in Tomcat7 on a different
instance in Amazon EC2.
Afterwards I tried starting Tomcat, specifying the ZK host IP, like so:

sudo service tomcat7 start -DzkHost=<zk ip>:2181 -DnumShards=3
-Dcollection.configName=myconf
-Dbootstrap_confdir=/usr/share/solr/example/solr/collection1/conf

Solr loads fine, but is not in the cloud.

Any idea what I am doing wrong here?
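One plausible cause, stated as an assumption rather than a verified diagnosis:
the service wrapper does not pass arguments after "start" through to the JVM,
so Solr never sees zkHost. On Debian/Ubuntu Tomcat packages, JVM flags usually
have to go into JAVA_OPTS, e.g. in /etc/default/tomcat7:

JAVA_OPTS="$JAVA_OPTS -DzkHost=<zk ip>:2181 -DnumShards=3 \
  -Dcollection.configName=myconf \
  -Dbootstrap_confdir=/usr/share/solr/example/solr/collection1/conf"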




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Starting-Solr-in-Tomcat-with-specifying-ZK-host-s-tp4087916.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Update field properties via Schema Rest API ?

2013-09-03 Thread bengates
Hello,

Thanks for your quick reply.

This is what I feared.
Do you know if this is planned for Solr 4.5 or Solr 5.0?
I didn't see anything about it in the roadmap.

Thank you,
Ben



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Update-field-properties-via-Schema-Rest-API-tp4087907p4087920.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with Synonyms

2013-09-03 Thread Christian Loock

On 03.09.2013 12:11, pravesh wrote:

Solr has a nice analysis page. You can use it to get insight into what is
happening after each filter is applied at index/search time.


Regards
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-Synonyms-tp4087905p4087915.html
Sent from the Solr - User mailing list archive at Nabble.com.

Yeah, that's the thing.

It applies the Synonym Filter, but nothing really happens.

Is there a way to see if the SF loads my synonym file?


Re: db-data-config.xml ?

2013-09-03 Thread Shalin Shekhar Mangar
Did you find any other exceptions in the logs?

When I pasted the script section of your data config into my test
setup, I got an error saying that there is an unclosed string literal
in line 6.

On Tue, Sep 3, 2013 at 12:23 AM, Kunzman, Doug dkunz...@usgs.gov wrote:
 Hi -

 I'm new to Solr and am trying to combine a script:add and a RegexTransformer
 in a db-data-config.xml that is used to ingest data into Solr. Can anyone be
 of any help?

 There is definitely a comma between my script:add and RegexTransformer
 entries.

 Any help would be appreciated.

 My db-data-config.xml looks like this:
 <dataConfig>
   <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
     url="jdbc:postgresql://localhost:/test?netTimeoutForStreamingResults=24000"
     autoReconnect="true" user="postgres" password="" batchSize="10"
     responseBuffering="adaptive"/>
   <script><![CDATA[
   function add(row){
     var latlon_s = row.get('longitude')+','+row.get('latitude');
     var provider = row.get('provider');
     var pointPath = '/'+
       row.get('longitude')+','+row.get('latitude')+'/'+row.get('basis_of_record');
     if ('NatureServe'.equalsIgnoreCase(provider) ||
         'USDA PLANTS'.equalsIgnoreCase(provider)) {
       pointPath += '/centroid';
     }
     row.put('latlon_s', latlon_s);
     row.put('pointPath_s', pointPath);
     var provider_id = row.get('provider_id');
     var resource_id = row.get('resource_id_s');
     var hierarchy = row.get('hierarchy_string');
     row.put('hierarchy_homonym_string', '-' + hierarchy + '-');
     row.put('BISONResourceID', '/' + provider_id + '/' + resource_id + '/');
     return row;
   }
   ]]></script>

   <document name="itis_to_portal.occurence">
     <!--entity name="occurrence" pk="id" transformer="script:add"
       query="select id, scientific_name, latitude, longitude, year,
       basis_of_record, provider_id, resource_id_s, occurrence_date, tsns,
       parent_tsn, hierarchy_string, collector, ambiguous, statecomputedfips,
       countycomputedfips from itis_to_portal.solr" transformer="RegexTransformer"
     -->
     <entity name="occurrence" pk="id" query="select id, scientific_name,
       latitude, longitude, year, basis_of_record, provider_id, resource_id_s,
       occurrence_date, tsns, parent_tsn, hierarchy_string, collector,
       ambiguous, statecomputedfips, countycomputedfips from itis_to_portal.solr"
       transformer="RegexTransformer,script:add" >

 and at import time I'm getting the following error message:
 SEVERE: Full Import failed:java.lang.RuntimeException:
 java.lang.RuntimeException:
 org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
 invoke method :addRegexTransformer

 Thanks,
 Doug



-- 
Regards,
Shalin Shekhar Mangar.


Re: Apostrophes in fields

2013-09-03 Thread devendra W
In my case, the fields with an apostrophe are not returned in results.

When I search for dev, it shows me the following results:
dev
dev's
devendra

but when I search for dev' (dev with an apostrophe only),
nothing comes out as a result.

What could be the workaround?


Thanks
Devendra



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Apostrophes-in-fields-tp475058p4087910.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: phonetic search

2013-09-03 Thread Erick Erickson
Hmmm, seems like it should work.

First thing I'd try is using the admin interface and looking at the analysis
page to see how the input is tokenized both at index and search time; that's
sometimes surprising.

Second, again using the browser, attach debug=query to the URL. That will
echo back what the query actually parsed to. Combined with the analysis
page, this last bit of information is often enough to figure it out. If that
doesn't show it, please cut/paste the results back here.

You can do the same from your SolrJ program, but the admin UI is usually
faster.
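A sketch of such a request against the field from this thread (host and core
name assumed):

http://localhost:8983/solr/collection1/select?q=descricaoRoteiroPhonetic:CITY&debug=query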

Best
Erick


On Mon, Sep 2, 2013 at 3:25 PM, Sergio Stateri stat...@gmail.com wrote:

 Thanks Erick,

 I'm trying to search English texts now. I put a field type like this:

 <fieldtype name="myPhonetic" stored="false" indexed="true"
     class="solr.TextField">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone"
         inject="true"/>
   </analyzer>
 </fieldtype>
 ...
 <field name="descricaoRoteiroPhonetic" type="myPhonetic" indexed="true"
     required="true" stored="true"/>


 Then I'm trying to find CITY, like this:

 SolrQuery query = new SolrQuery();
 query.setQuery("descricaoRoteiroPhonetic:CITY");
 QueryResponse rsp = server.query(query);


 But there are no results, and I have a lot of documents with CITY in the
 descricaoRoteiroPhonetic field.
 Do you know what I'm doing wrong?

 Thanks a lot.

 Sergio Stateri Junior.


 2013/9/2 Erick Erickson erickerick...@gmail.com

  What you need to do is include one of the phonetic
  filters in your analysis chain, see:

  http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PhoneticFilterFactory
  All you've done with the stemmer is make
  things like (sorry, English examples are all I can do)
  running, runner, etc. be indexed and searched as run;
  that's stemming, not phonetic processing.

  There are several variants; each uses a different
  algorithm, see the link above. Not sure what to tweak
  for handling Brazilian Portuguese though...
 
  Best
  Erick
 
 
  On Mon, Sep 2, 2013 at 1:41 PM, Sergio Stateri stat...@gmail.com
 wrote:
 
   Please,
  
   How can I do a phonetic search in Solr with the Portuguese (Brazilian)
   language?
  
   I tried including this field type:
  
   <fieldType name="brazilianPhonetic" class="solr.TextField"
       sortMissingLast="true" omitNorms="true">
     <analyzer type="index">
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.BrazilianStemFilterFactory"/>
     </analyzer>
   </fieldType>
   ...
   <field name="descricaoRoteiroPhonetic" type="brazilianPhonetic"
       multiValued="true" indexed="true" required="true" stored="true"/>
  
  
   But this didn't work. I have no idea how to do a phonetic search.
   I'm using Solr 4.
  
  
  
   Thanks in advance,
   --
   Sergio Stateri Jr.
   stat...@gmail.com
  
 



 --
 Sergio Stateri Jr.
 stat...@gmail.com



Re: Update field properties via Schema Rest API ?

2013-09-03 Thread Erick Erickson
Is editing a text file really all that onerous? You can edit the
schema.xml file with any editor you're comfortable with and
issue the core RELOAD command in the interim.
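A sketch of that reload call (host and core name assumed):

http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1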

Best
Erick


On Tue, Sep 3, 2013 at 6:20 AM, bengates benga...@aliceadsl.fr wrote:

 Hello,

 Thanks for your quick reply.

 This is what I feared.
 Do you know if this is planned for Solr 4.5 or Solr 5.0?
 I didn't see anything about it in the roadmap.

 Thank you,
 Ben



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Update-field-properties-via-Schema-Rest-API-tp4087907p4087920.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Problem with Synonyms

2013-09-03 Thread Erick Erickson
Please explain exactly what "but nothing
really happens" means. Do you mean that
you see the SF in the analysis page but there
are no substitutions? Or you don't get
search results? Or...?

You have to reload the core after making changes
at a minimum; you can restart the Solr instance
if you're paranoid. And you have to re-index
for changes in the index part of the analysis
chain to take effect.

Best
Erick


On Tue, Sep 3, 2013 at 6:33 AM, Christian Loock c...@vkf-renzel.de wrote:

 Am 03.09.2013 12:11, schrieb pravesh:

  Solr has a nice analysis page. You can use it to get insight into what is
  happening after each filter is applied at index/search time.


 Regards
 Pravesh



 --
  View this message in context:
  http://lucene.472066.n3.nabble.com/Problem-with-Synonyms-tp4087905p4087915.html
 Sent from the Solr - User mailing list archive at Nabble.com.

  Yeah, that's the thing.

  It applies the Synonym Filter, but nothing really happens.

  Is there a way to see if the SF loads my synonym file?



Re: Problem with Synonyms

2013-09-03 Thread Christian Loock

The SF shows up in the analysis page, but nothing is substituted.

I reloaded, removed and re-added the core, reindexed... nothing worked :(

I wonder if the SF actually uses the correct file for synonyms. I have
it lying in the conf folder of the core. Is that correct?


On 03.09.2013 13:32, Erick Erickson wrote:

Please explain exactly what "but nothing
really happens" means. Do you mean that
you see the SF in the analysis page but there
are no substitutions? Or you don't get
search results? Or...?

You have to reload the core after making changes
at a minimum; you can restart the Solr instance
if you're paranoid. And you have to re-index
for changes in the index part of the analysis
chain to take effect.

Best
Erick


On Tue, Sep 3, 2013 at 6:33 AM, Christian Loock c...@vkf-renzel.de wrote:


[...]





Memory usage during aggregation - SolrCloud with very large numbers of facet terms.

2013-09-03 Thread Jackson, Andrew
Hi,

 

We have a large, sharded SolrCloud index of 300 million documents which
we use to explore our web archives. We want to facet on fields that have
very large numbers of distinct values, e.g. host names and domain names
of pages and links. Thus, overall, we expect to have millions of
distinct terms for those fields. We also want to sort on other fields
(e.g. date of harvest).

 

We have experimented with various RAM and facet configurations, and are
currently finding facet.method=enum + minDf to be more stable than fc.
We currently have eight shards, and although the queries are slow, we
are finding individual shards to be fairly reliable with a few GB of RAM
(about 5GB per shard right now). This seems to be consistent with
guidelines for estimating RAM usage (e.g.
http://stackoverflow.com/questions/4499630/solr-faceted-navigation-on-large-index).

 

However, the Solr instance we direct our client query to is consuming
significantly more RAM (10GB) and is still failing after a few queries
when it runs out of heap space. This is presumably due to the role it
plays, aggregating the results from each shard. Is there any way we can
estimate the amount of RAM that server will need?

 

Alternatively, given our dataset, should we be pursuing a different
approach? Should we re-index with the facet partition size set to
something smaller (e.g. 10,000 rather than Integer.MAX_VALUE)? Should we
be using facet.method=fc and buying more RAM?

 

 

Best wishes,

Andy Jackson

 

--

Dr Andrew N Jackson

Web Archiving Technical Lead

The British Library

 

Tel: 01937 546602

Mobile: 07765 897948

Web: http://www.webarchive.org.uk/

Twitter: @UKWebArchive

 



Re: Measuring SOLR performance

2013-09-03 Thread Dmitry Kan
Hi Roman,

Thanks, the --additionalSolrParams was just what I wanted and works fine.

BTW, if you have some special bug tracking forum for the tool, I'm happy
to submit questions / bug reports there. Otherwise, this email list is ok
(for me at least).

One other thing I noticed in the err logs was a series of messages of
this sort upon generating the perf test report. They seem to be jmeter related
(the err messages disappear if an extra lib dir is present under the ext
directory).

java.lang.Throwable: Could not access
/home/dmitry/projects/lab/solrjmeter7/solrjmeter/jmeter/lib/ext/lib
at
kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
at kg.apc.cmd.UniversalRunner.<clinit>(UniversalRunner.java:55)
at
kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
at kg.apc.cmd.UniversalRunner.<clinit>(UniversalRunner.java:55)

at
kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
at kg.apc.cmd.UniversalRunner.<clinit>(UniversalRunner.java:55)



On Tue, Sep 3, 2013 at 2:50 AM, Roman Chyla roman.ch...@gmail.com wrote:

 Hi Dmitry,

 If it is something you want to pass with every request (which is my use
 case), you can pass it as additional solr params, eg.

 python solrjmeter

 --additionalSolrParams=fq=other_field:bar+facet=true+facet.field=facet_field_name
 

 the string should be url encoded.

 If it is something that changes with every request, you should modify the
 jmeter test. If you open/load it with jmeter GUI, in the HTTP request
 processor you can define other additional fields to pass with the request.
  These values can come from the CSV file; you'll see an example of how to use
  that when you open the test definition file.

 Cheers,

   roman




 On Mon, Sep 2, 2013 at 3:12 PM, Dmitry Kan solrexp...@gmail.com wrote:

  Hi Erick,
 
  Agree, this is perfectly fine to mix them in solr. But my question is
 about
  solrjmeter input query format. Just couldn't find a suitable example on
 the
  solrjmeter's github.
 
  Dmitry
 
 
 
  On Mon, Sep 2, 2013 at 5:40 PM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   filter and facet queries can be freely intermixed, it's not a problem.
   What problem are you seeing when you try this?
  
   Best,
   Erick
  
  
   On Mon, Sep 2, 2013 at 7:46 AM, Dmitry Kan solrexp...@gmail.com
 wrote:
  
Hi Roman,
   
What's the format for running the facet+filter queries?
   
Would something like this work:
   
field:foo  =50  fq=other_field:bar facet=true
   facet.field=facet_field_name
   
   
Thanks,
Dmitry
   
   
   
On Fri, Aug 23, 2013 at 2:34 PM, Dmitry Kan solrexp...@gmail.com
   wrote:
   
 Hi Roman,

 With adminPath=/admin or adminPath=/admin/cores, no.
  Interestingly
 enough, though, I can access
 http://localhost:8983/solr/statements/admin/system

 But I can access http://localhost:8983/solr/admin/cores, only when
   with
 adminPath=/admin/cores (which suggests that this is the right
 value
   to
be
 used for cores), and not with adminPath=/admin.

  Bottom line, this core configuration is not self-evident.

 Dmitry




 On Fri, Aug 23, 2013 at 4:18 AM, Roman Chyla 
 roman.ch...@gmail.com
wrote:

 Hi Dmitry,
 So it seems solrjmeter should not assume the adminPath - and
 perhaps
needs
 to be passed as an argument. When you set the adminPath, are you
  able
   to
 access localhost:8983/solr/statements/admin/cores ?

 roman


 On Wed, Aug 21, 2013 at 7:36 AM, Dmitry Kan solrexp...@gmail.com
 
wrote:

  Hi Roman,
 
  I have noticed a difference with different solr.xml config
  contents.
It
 is
  probably legit, but thought to let you know (tests run on fresh
 checkout as
  of today).
 
  As mentioned before, I have two cores configured in solr.xml. If
  the
 file
  is:
 
  [code]
  solr persistent=false
 
!--
adminPath: RequestHandler path to manage cores.
  If 'null' (or absent), cores will not be manageable via
  request
 handler
--
cores adminPath=/admin/cores host=${host:}
  hostPort=${jetty.port:8983} hostContext=${hostContext:solr}
  core name=metadata instanceDir=metadata /
  core name=statements instanceDir=statements /
/cores
  /solr
  [/code]
 
  then the instruction:
 
  python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q
  ./queries/demo/demo.queries -s localhost -p 8983 -a
  --durationInSecs
60
 -R
  cms -t /solr/statements -e statements -U 100
 
  works just fine. If however the solr.xml has adminPath set to
   /admin
  solrjmeter produces an error:
 
  [error]
  **ERROR**
File solrjmeter.py, line 1386, in module
  main(sys.argv)
File solrjmeter.py, line 1278, 

Re: Update field properties via Schema Rest API ?

2013-09-03 Thread bengates
Hello Erick,

Thank you for your reply.

Unfortunately, yes it is.

I work with a company that has a catalog with many new attributes every day,
and sometimes the existing ones change. For instance, one attribute may live
with a unit for months (e.g. screen_size = 32 cm) and one day my provider
changes it to an integer (screen_size = 32), making it easier to create
ranges.

Besides, I want my business users to be able to add and edit new features on
their products, and my PHP middle-end app should just communicate with Solr
without me.

We actually work with a solution that works that way (users do everything,
the middle-app controls the users' inputs and deals with the back-end), and
dealing with the open-source Solr is really hard if that essential feature
isn't provided... :(

That's why I was very happy when Solr 4.4 introduced adding fields via the
REST API, which works very well, but I was disappointed that a new field
couldn't be edited (just toggling indexed and stored true/false would be
amazing) without my having to edit a file.

So I really hope this part of the API will soon be completed :)

Best regards,
Ben



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Update-field-properties-via-Schema-Rest-API-tp4087907p4087951.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Memory usage during aggregation - SolrCloud with very large numbers of facet terms.

2013-09-03 Thread Michael Ryan
 However, the Solr instance we direct our  client query to is consuming 
 significantly more RAM (10GB) and is still failing after a few queries when 
 it runs out of heap space. This is presumably due to the role it plays, 
 aggregating the results from each shard.

That seems quite odd... What facet parameters are you using in the query? I 
could imagine memory issues if you're using facet.limit=-1, or some very large 
number.

-Michael


SolrCloud - shard containing an invalid host:port

2013-09-03 Thread Marc des Garets

Hi,

I have set up SolrCloud with Tomcat. I use Solr 4.1.

I have zookeeper running on 192.168.1.10.
A tomcat running solr_myidx on 192.168.1.10 on port 8080.
A tomcat running solr_myidx on 192.168.1.11 on port 8080.

My solr.xml is like this:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" collection.configName="myidx">
  <cores adminPath="/admin/cores" defaultCoreName="collection1"
      hostPort="8080" hostContext="solr_myidx" zkClientTimeout="2">
    <core name="collection1" instanceDir="./"/>
  </cores>
</solr>

I have Tomcat starting with: -Dbootstrap_conf=true
-DzkHost=192.168.1.10:2181


Both Tomcats start up fine, but when I go to the Cloud tab in the Solr
admin, I see the following:


collection1 -- shard1 -- 192.168.1.10:8983/solr
  192.168.1.11:8080/solr_ugc
  192.168.1.10:8080/solr_ugc

I don't know what 192.168.1.10:8983/solr is doing there. Do you know how
I can remove it?


It's causing the following error when I try to query the index:
SEVERE: Error while trying to recover. 
core=collection1:org.apache.solr.client.solrj.SolrServerException: 
Server refused connection at: http://192.168.10.206:8983/solr


Thanks,
Marc


Re: Starting Solr in Tomcat with specifying ZK host(s)

2013-09-03 Thread maephisto
When I try to deploy using Jetty, everything works fine, and the Solr
instance gets into the cloud:

sudo java -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -DzkHost=<zk ip>:2181 -DnumShards=3 -jar
start.jar



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Starting-Solr-in-Tomcat-with-specifying-ZK-host-s-tp4087916p4087962.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Memory usage during aggregation - SolrCloud with very large numbers of facet terms.

2013-09-03 Thread Jackson, Andrew
The default facet.limit is 10, but it's set to 50 for most of the
facets. I've included the query parameters below. In case it makes any
difference, there are quite a lot of facet fields with large numbers of
terms, and the queries are being generated by the Sarnia Drupal module.

Thanks,
Andy

---
<lst name="params">
  <str name="f.links_public_suffixes.facet.limit">50</str>
  <str name="f.postcode_district.facet.limit">50</str>
  <str name="facet">true</str>
  <str name="sort">wayback_date asc</str>
  <str name="f.content_type_served.facet.limit">50</str>
  <str name="f.sentiment.facet.limit">50</str>
  <str name="facet.limit">10</str>
  <str name="f.content_encoding.facet.limit">50</str>
  <str name="f.last_modified_year.facet.limit">50</str>
  <str name="f.links_hosts.facet.limit">50</str>
  <str name="facet.method">enum</str>
  <str name="f.author.facet.limit">50</str>
  <str name="fl">*,score</str>
  <str name="f.content_type_full.facet.limit">50</str>
  <str name="f.content_type.facet.limit">50</str>
  <str name="f.content_type_ext.facet.limit">50</str>
  <arr name="facet.field">
    <str>sentiment</str>
    <str>private_suffix</str>
    <str>public_suffix</str>
    <str>postcode_district</str>
    <str>content_type_ext</str>
    <str>content_type_full</str>
    <str>{!ex=content_type_norm}content_type_norm</str>
    <str>content_type_served</str>
    <str>content_type</str>
    <str>content_language</str>
    <str>author</str>
    <str>content_encoding</str>
    <str>content_ffb</str>
    <str>{!ex=crawl_year}crawl_year</str>
    <str>domain</str>
    <str>links_public_suffixes</str>
    <str>links_private_suffixes</str>
    <str>links_hosts</str>
    <str>generator</str>
    <str>last_modified_year</str>
  </arr>
  <str name="qt">standard</str>
  <str name="facet.enum.cache.minDf">500</str>
  <str name="facet.missing">false</str>
  <str name="f.crawl_year.facet.limit">50</str>
  <str name="facet.mincount">1</str>
  <str name="f.content_language.facet.limit">50</str>
  <str name="json.nl">map</str>
  <str name="wt">xml</str>
  <str name="f.private_suffix.facet.limit">50</str>
  <str name="rows">20</str>
  <str name="f.content_ffb.facet.limit">50</str>
  <str name="f.generator.facet.limit">50</str>
  <str name="f.links_private_suffixes.facet.limit">50</str>
  <str name="f.domain.facet.limit">50</str>
  <str name="facet.sort">count</str>
  <str name="start">0</str>
  <str name="q">*:*</str>
  <str name="f.public_suffix.facet.limit">50</str>
  <str name="f.content_type_norm.facet.limit">50</str>
</lst>
---

 -Original Message-
 From: Michael Ryan [mailto:mr...@moreover.com]
 Sent: 03 September 2013 13:41
 To: solr-user@lucene.apache.org
 Subject: RE: Memory usage during aggregation - SolrCloud with very
large
 numbers of facet terms.
 
  However, the Solr instance we direct our  client query to is
consuming
 significantly more RAM (10GB) and is still failing after a few queries
when it
 runs out of heap space. This is presumably due to the role it plays,
 aggregating the results from each shard.
 
 That seems quite odd... What facet parameters are you using in the
query? I
 could imagine memory issues if you're using facet.limit=-1, or some
very
 large number.
 
 -Michael


Re: Measuring SOLR performance

2013-09-03 Thread Roman Chyla
Hi Dmitry,

Thanks for the feedback. Yes, it is indeed a jmeter issue (or rather, an
issue of the plugin we use to generate charts). You may want to use the
GitHub issue tracker for whatever comes next:

https://github.com/romanchyla/solrjmeter/issues

Cheers,

  roman


On Tue, Sep 3, 2013 at 7:54 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Hi Roman,

 Thanks, the --additionalSolrParams was just what I wanted and works fine.

 BTW, if you have some special bug tracking forum for the tool, I'm happy
 to submit questions / bug reports there. Otherwise, this email list is ok
 (for me at least).

 One other thing I noticed in the err logs was a series of messages of
 this sort upon generating the perf test report. They seem to be jmeter related
 (the err messages disappear if an extra lib dir is present under the ext
 directory).

 java.lang.Throwable: Could not access
 /home/dmitry/projects/lab/solrjmeter7/solrjmeter/jmeter/lib/ext/lib
 at
 kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
 at kg.apc.cmd.UniversalRunner.<clinit>(UniversalRunner.java:55)
 at
 kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
 at kg.apc.cmd.UniversalRunner.<clinit>(UniversalRunner.java:55)

 at
 kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
 at kg.apc.cmd.UniversalRunner.<clinit>(UniversalRunner.java:55)



 On Tue, Sep 3, 2013 at 2:50 AM, Roman Chyla roman.ch...@gmail.com wrote:

  [...]

Solr 4.3: Recovering from Too many values for UnInvertedField faceting on field

2013-09-03 Thread Dennis Schafroth
We are harvesting and indexing bibliographic data, and thus have many distinct
author names in our index. While testing Solr 4 I believe I had pushed a single
core to 100 million records (91GB of data) and everything was working fine and
fast. After adding a little more to the index, the following started to happen:

17328668 [searcherExecutor-4-thread-1] WARN org.apache.solr.core.SolrCore – 
Approaching too many values for UnInvertedField faceting on field 
'author_exact' : bucket size=16726546
17328701 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore – 
UnInverted multi-valued field 
{field=author_exact,memSize=336715415,tindexSize=5001903,time=31595,phase1=31465,nTerms=12048027,bigTerms=0,termInstances=57751332,uses=0}
18103757 [searcherExecutor-4-thread-1] ERROR org.apache.solr.core.SolrCore – 
org.apache.solr.common.SolrException: Too many values for UnInvertedField 
faceting on field author_exact
at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:181)
at
org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:664)

I can see that we reached a limit on the bucket size. Is there a way to adjust
this? The index also seems to have exploded in size (217GB).

Thinking that I had reached a limit for what a single core could handle in
terms of faceting, I deleted records in the index, but even now at 1/3 (32
million) it still fails with the above error. I have optimised with
expungeDeleted=true. The index is somewhat larger (76GB) than I would have
expected.

While we can still use the index and get facets back using the enum method on
that field, I would still like a way to fix the index if possible. Any suggestions?
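For reference, the per-field form of that workaround, in standard faceting
parameter syntax (field name as above):

facet=true&facet.field=author_exact&f.author_exact.facet.method=enum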

cheers, 
:-Dennis

Re: solr cloud and DIH, indexation runs only on one shard.

2013-09-03 Thread jerome . dupont

It works!

I've done what you said:
_ In my request to get the list of documents, I added a where clause filtering
the select that fetches the documents to index:
where noticebib.numnoticebib LIKE '%${dataimporter.request.suffixeNotice}'
_ And I called my DIH on each shard with the parameter suffixeNotice=1 or
suffixeNotice=2 (see the sketch below).
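A sketch of the two per-shard calls, reconstructed from the URL in the
original post (handler path and ports are from that post; the parameter is
read in the query via ${dataimporter.request.suffixeNotice}):

http://localhost:8080/solr-0.4.0-pfd/noticesBIBcollection/dataimportMNb?command=full-import&suffixeNotice=1
http://localhost:9180/solr-0.4.0-pfd/noticesBIBcollection/dataimportMNb?command=full-import&suffixeNotice=2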

Each shard indexed its part at the same time (more or less 1000 docs each).

When I execute a select on the collection, I get more or less 2000
documents.

Now my goal is to merge the indexes, but that's another story.

Another possibility would have been to play with the rows and start parameters,
but that supposes two things:
_ knowing the number of documents
_ adding an order by clause to make sure the subsets of documents are disjoint
(and even in that case, I'm not completely sure, because the source database
can change)

Thanks very much!!

Jerôme




Re: dataimporter tika doesn't extract certain div

2013-09-03 Thread Shalin Shekhar Mangar
I don't know much about Tika, but in the example data-config.xml that
you posted, the xpath attribute on the "text" field won't work,
because the xpath attribute is used only by an XPathEntityProcessor.

On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen a...@conx.ch wrote:
 I want Tika to index only the content in <div id="content">...</div> for the
 field "text". Unfortunately it's indexing the whole page. Can't XPath do this?

 data-config.xml:

 <dataConfig>
   <dataSource type="BinFileDataSource" name="data"/>
   <dataSource type="BinURLDataSource" name="dataUrl"/>
   <dataSource type="URLDataSource" name="main"/>
   <document>
     <entity name="rec" processor="XPathEntityProcessor"
         url="http://127.0.0.1/tkb/internet/docImportUrl.xml" forEach="/docs/doc"
         dataSource="main"> <!--transformer="script:GenerateId"-->
       <field column="title" xpath="//title" />
       <field column="id" xpath="//id" />
       <field column="file" xpath="//file" />
       <field column="path" xpath="//path" />
       <field column="url" xpath="//url" />
       <field column="Author" xpath="//author" />

       <entity name="tika" processor="TikaEntityProcessor"
           url="${rec.path}${rec.file}" dataSource="dataUrl" onError="skip"
           htmlMapper="identity" format="html">
         <field column="text" xpath="//div[@id='content']" />
       </entity>
     </entity>
   </document>
 </dataConfig>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Can we used CloudSolrServer for searching data

2013-09-03 Thread Shalin Shekhar Mangar
CloudSolrServer can only be used if you are actually using SolrCloud
(i.e. a ZooKeeper aware setup). If you only have a multi-core setup,
then you can use LBHttpSolrServer.

See http://wiki.apache.org/solr/LBHttpSolrServer
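A minimal SolrJ sketch (the two core URLs are placeholders; LBHttpSolrServer
round-robins across the listed servers and skips ones that are down):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// The constructor throws MalformedURLException in Solr 4.x.
LBHttpSolrServer server = new LBHttpSolrServer(
    "http://host1:8080/solr/coreA",
    "http://host2:8080/solr/coreA");
QueryResponse rsp = server.query(new SolrQuery("*:*"));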

On Tue, Aug 27, 2013 at 2:11 PM, Dharmendra Jaiswal
dharmendra.jais...@gmail.com wrote:
 Hello,

 I am using the multi-core mechanism with Solr 4.4.0, and each core is
 dedicated to a particular client (each core is a collection).

 For example, if we search data from SiteA, it will provide search results
 from CoreA; if we search data from SiteB, it will provide search results
 from CoreB; and it is a similar case with the other clients.

 Right now I am using HttpSolrServer (SolrJ API) for connecting to Solr for
 search.
 As per my understanding it will try to connect directly to a particular Solr
 instance for searching, and if that node is down, searching will fail.
 Please let me know if my assumption is wrong.

 My question is: is it possible to connect to Solr using CloudSolrServer
 instead of HttpSolrServer for searching, so that in case one node is down,
 CloudSolrServer will pick data from another instance of Solr?

 Any pointer or link would be helpful. It would be better if someone shared
 an example of connecting using CloudSolrServer.

 Note: I am using a Windows machine for deployment of Solr, and we are
 indexing data from the database using DIH.

 Thanks,
 Dharmendra jaiswal



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Can-we-used-CloudSolrServer-for-searching-data-tp4086766.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Regards,
Shalin Shekhar Mangar.


Dynamic Query Analyzer

2013-09-03 Thread Daniel Rosher
Hi,

We have a need to specify a different query analyzer depending on input
parameters dynamically.

We need this so that we can use different stopword lists at query time.

Would anyone know how I might be able to achieve this in Solr?

I'm aware of the solution to specify different field types, each with a
different query analyzer, but I'd like not to have to index the field
multiple times.

Many thanks
Dab


Re: SolrCloud Set up

2013-09-03 Thread Jared Griffith
I think I have it all sorted out. There are some weird network issues here
where my test setup is, so that may have been part of the overall issue.
Timeouts wouldn't have fixed this issue, that's for sure.


On Sat, Aug 31, 2013 at 7:17 AM, Erick Erickson erickerick...@gmail.comwrote:

 bq: Though I am seeing some funkiness that I wasn't seeing with Solr &
 Zookeeper running together

 Then I suspect you've set something up inconsistently, _or_ you need to
 extend some timeouts, because SolrCloud is being run with separate ZKs by
 quite a few people, so I'd be surprised if it were anything except config
 issues.

 If you _do_ uncover something in that realm other than timeouts and such,
 we need to know.

 Best,
 Erick




 On Fri, Aug 30, 2013 at 2:15 PM, Jared Griffith
 jgriff...@picsauditing.comwrote:

  That's what I was thinking. Though I am seeing some funkiness that I
  wasn't seeing with Solr & Zookeeper running together.
 
 
  On Fri, Aug 30, 2013 at 9:40 AM, Shawn Heisey s...@elyograg.org wrote:
 
   On 8/30/2013 9:43 AM, Jared Griffith wrote:
  
    One last thing: is there any real benefit in running SolrCloud and
    Zookeeper separately? I am seeing some funkiness with the separation of
    the two, funkiness I wasn't seeing when running SolrCloud + Zookeeper
    together as outlined in the Wiki.
  
  
   For a robust install, you want zookeeper to be a separate process.  It
  can
   run on the same server as Solr, but the embedded zookeeper (-DzkRun)
  should
   not be used except for dev and proof of concept work.
  
   The reason is simple.  Zookeeper is the central coordinator for
  SolrCloud.
In order for it to remain stable, it should not be restarted without
  good
   reason.  If you are running zookeeper as part of Solr, then you will be
   affecting zookeeper operation anytime you restart that instance of
 Solr.
  
   Making changes to your Solr setup often requires that you restart Solr.
This includes upgrading Solr and changing some aspects of its
   configuration.  Some configuration aspects can be changed with just a
   collection reload, but others require a full application restart.
  
   Thanks,
   Shawn
  
  
 
 
  --
 
  Jared Griffith
  Linux Administrator, PICS Auditing, LLC
  P: (949) 936-4574
  C: (909) 653-7814
 
  http://www.picsauditing.com
 
  17701 Cowan #140 | Irvine, CA | 92614
 
  Join PICS on LinkedIn and Twitter!
 
  https://twitter.com/PICSAuditingLLC
 




-- 

Jared Griffith
Linux Administrator, PICS Auditing, LLC
P: (949) 936-4574
C: (909) 653-7814

http://www.picsauditing.com

17701 Cowan #140 | Irvine, CA | 92614

Join PICS on LinkedIn and Twitter!

https://twitter.com/PICSAuditingLLC


DIH + Solr Cloud

2013-09-03 Thread Alejandro Calbazana
Hi,

Quick question about data import handlers in SolrCloud.  Does anyone use
more than one instance to support the DIH process?  Or is the typical setup
to have one box set up as only the DIH host, keeping this responsibility
outside of the SolrCloud environment?  I'm just trying to get a picture of
how this is typically deployed.

Thanks!

Alejandro


Re: SolrCloud - Path must not end with / character

2013-09-03 Thread Jared Griffith
Interesting because I was getting the issue when I was passing the full
path (without the trailing / ) to Tomcat.


On Mon, Sep 2, 2013 at 11:34 PM, Prasi S prasi1...@gmail.com wrote:

 The issue is resolved. I have given all the path inside tomcat as relative
 paths( solr home, solr war). That was the creating the problem.


 On Mon, Sep 2, 2013 at 2:19 PM, Prasi S prasi1...@gmail.com wrote:

  Does this have anyting to do with tomcat? I cannot go back as we already
  fixed with tomcat.
 
  Any suggestions pls. The same setup , if i copy and run it on a different
  machine, it works fine. Am not sure what is missing. Is it because of
 some
  system parameter getting set?
 
 
  On Fri, Aug 30, 2013 at 9:11 PM, Jared Griffith 
  jgriff...@picsauditing.com wrote:
 
  I was getting the same errors when trying to implement SolrCloud with
  Tomcat.  I eventually gave up until something came out of this thread.
  This all works if you just ditch Tomcat and go with the native Jetty
  server.
 
 
  On Fri, Aug 30, 2013 at 6:28 AM, Prasi S prasi1...@gmail.com wrote:
 
   Also, this fails with the default solr 4.4 downlaoded configuration
 too
  
  
   On Fri, Aug 30, 2013 at 4:19 PM, Prasi S prasi1...@gmail.com wrote:
  
Below is the script i run
   
START /MAX
F:\SolrCloud\zookeeper\zk-server-1\zookeeper-3.4.5\bin\zkServer.cmd
   
   
START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2182
  -confdir
solr-conf -confname solrconf1
   
   
   
START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost 127.0.0.1:2182
  -collection
   firstcollection -confname solrconf1 -solrhome ../tomcat1/solr1
   
   
   
START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2182
  -confdir
solr-conf -confname solrconf2
   
   
   
   
START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/*
org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost 127.0.0.1:2182
  -collection
   seccollection -confname solrconf2 -solrhome ../tomcat1/solr1
   
   
   
START /MAX F:\solrcloud\tomcat1\bin\startup.bat
   
   
   
START /MAX F:\solrcloud\tomcat2\bin\startup.bat
   
   
On Fri, Aug 30, 2013 at 4:07 PM, Prasi S prasi1...@gmail.com
 wrote:
   
Im still clueless on where the issue could be. There is no much
information in the solr logs.
   
i had a running version of cloud in another server. I have copied
 the
same to this server, and started zookeeper, then ran teh below
  commands,
   
java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd
  upconfig
-zkhost localhost:2181 -confdir solr-conf -confname solrconfindex
   
java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd
  linkconfig
-zkhost 127.0.0.1:2181 -collection colindexer -confname
  solrconfindex
-solrhome ../tomcat1/solr1
   
After this, when i started tomcat, the first tomcat starts fine.
 When
   the
second tomcat is started, i get the above exception and it stops.
  Tehn
   the
first tomcat also shows teh same exception.
   
   
   
   
On Thu, Aug 29, 2013 at 7:18 PM, Mark Miller 
 markrmil...@gmail.com
   wrote:
   
Yeah, you see this when the core could not be created. Check the
  logs
   to
see if you can find something more useful.
   
I ran into this again the other day - it's something we should
 fix.
  You
see the same thing in the UI when a core cannot be created and it
   gives you
no hint about the problem and is confusing.
   
- Mark
   
On Aug 29, 2013, at 5:23 AM, sathish_ix 
 skandhasw...@inautix.co.in
  
wrote:
   
 Hi ,

 Check your configuration files uploaded into zookeeper is valid
  and
   no
error
 in config files uploaded.
 I think due to this error, solr core will not be created.

 Thanks,
 Sathish



 --
 View this message in context:
   
  
 
 http://lucene.472066.n3.nabble.com/SolrCloud-Path-must-not-end-with-character-tp4087159p4087182.html
 Sent from the Solr - User mailing list archive at Nabble.com.
   
   
   
   
  
 
 
 
  --
 
  Jared Griffith
  Linux Administrator, PICS Auditing, LLC
  P: (949) 936-4574
  C: (909) 653-7814
 
  http://www.picsauditing.com
 
  17701 Cowan #140 | Irvine, CA | 92614
 
  Join PICS on LinkedIn and Twitter!
 
  https://twitter.com/PICSAuditingLLC
 
 
 




-- 

Jared Griffith
Linux Administrator, PICS Auditing, LLC
P: (949) 936-4574
C: (909) 653-7814

http://www.picsauditing.com

17701 Cowan #140 | Irvine, CA | 92614

Join PICS on LinkedIn and Twitter!

https://twitter.com/PICSAuditingLLC


Re: Solr 4.2 update/extract adding unknown field, can we change field type from string to text

2013-09-03 Thread Chris Hostetter

Your email is vague in terms of what you are actually *doing* and what 
behavior you are seeing.  

Providing specific details like "this is my schema.xml and this is my 
solrconfig.xml; when I POST this file to this URL I get this result, and I 
would instead like to get this result" is useful for other people to 
provide you with meaningful help...

https://wiki.apache.org/solr/UsingMailingLists

My best guess is that you are referring specifically to the behavior of 
ExtractingRequestHandler and the fields it tries to include in documents 
that are extracted, and how those fields are indexed -- in which case you 
can use the "uprefix" option to add a prefix to the name of all fields 
generated by Tika that aren't already in your schema, and you can then 
define a dynamicField matching that prefix to control every aspect of the 
resulting fields...

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika#UploadingDatawithSolrCellusingApacheTika-InputParameters


-Hoss
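
As a hedged illustration of the uprefix/dynamicField combination described
above (the file name, document id, and the "attr_" prefix are placeholders;
the stock 4.x example schema already ships a matching
<dynamicField name="attr_*" ... multiValued="true"/> declaration):

  import java.io.File;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

  public class ExtractWithUprefix {
    public static void main(String[] args) throws Exception {
      SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

      ContentStreamUpdateRequest req =
          new ContentStreamUpdateRequest("/update/extract");
      req.addFile(new File("report.pdf"), "application/pdf");
      req.setParam("literal.id", "doc-1");
      // Any Tika-generated field not already in the schema is renamed to
      // attr_<original name>, so a dynamicField on attr_* controls how all
      // such unknown fields are indexed (type, stored, multiValued, ...).
      req.setParam("uprefix", "attr_");
      req.setParam("commit", "true");
      server.request(req);
    }
  }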


Re: SolrCloud Set up

2013-09-03 Thread Erick Erickson
Ah, thanks for the closure, it's always nice to know. I used to work
with a guy who had a list of network fallacies, that amounted to
"you can't trust them fully..."

Erick


On Tue, Sep 3, 2013 at 12:12 PM, Jared Griffith
jgriff...@picsauditing.comwrote:

 I think I have it all sorted out.  There are some weird network issues here
 where my test set up is, so that may have been part of the over all issue.
 Timeouts wouldn't have fixed this issue, that's for sure.


 On Sat, Aug 31, 2013 at 7:17 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  bq: Though I am seeing some funkiness that I wasn't seeing with Solr 
  Zookeeper running together
 
  Then I suspect you've set something up inconsistently, _or_ you need to
  extend some timeouts because SolrCloud is being run with separate ZKs by
  quite a few people so I'd be surprised if it were anything except config
  issues.
 
  If you _do_ uncover something in that realm other than timeouts and such,
  we need to know
 
  Best,
  Erick
 
 
 
 
  On Fri, Aug 30, 2013 at 2:15 PM, Jared Griffith
  jgriff...@picsauditing.comwrote:
 
   That's what I was thinking.  Though I am seeing some funkiness that I
   wasn't seeing with Solr  Zookeeper running together.
  
  
   On Fri, Aug 30, 2013 at 9:40 AM, Shawn Heisey s...@elyograg.org
 wrote:
  
On 8/30/2013 9:43 AM, Jared Griffith wrote:
   
One last thing.  Is there any real benefit in running SolrCloud and
Zookeeper separate?   I am seeing some funkiness with the separation
  of
the
two, funkiness I wasn't seeing when running SolrCloud + Zookeeper
   together
as outlined in the Wiki.
   
   
For a robust install, you want zookeeper to be a separate process.
  It
   can
run on the same server as Solr, but the embedded zookeeper (-DzkRun)
   should
not be used except for dev and proof of concept work.
   
The reason is simple.  Zookeeper is the central coordinator for
   SolrCloud.
 In order for it to remain stable, it should not be restarted without
   good
reason.  If you are running zookeeper as part of Solr, then you will
 be
affecting zookeeper operation anytime you restart that instance of
  Solr.
   
Making changes to your Solr setup often requires that you restart
 Solr.
 This includes upgrading Solr and changing some aspects of its
configuration.  Some configuration aspects can be changed with just a
collection reload, but others require a full application restart.
   
Thanks,
Shawn
   
   
  
  
   --
  
   Jared Griffith
   Linux Administrator, PICS Auditing, LLC
   P: (949) 936-4574
   C: (909) 653-7814
  
   http://www.picsauditing.com
  
   17701 Cowan #140 | Irvine, CA | 92614
  
   Join PICS on LinkedIn and Twitter!
  
   https://twitter.com/PICSAuditingLLC
  
 



 --

 Jared Griffith
 Linux Administrator, PICS Auditing, LLC
 P: (949) 936-4574
 C: (909) 653-7814

 http://www.picsauditing.com

 17701 Cowan #140 | Irvine, CA | 92614

 Join PICS on LinkedIn and Twitter!

 https://twitter.com/PICSAuditingLLC



Re: distributed query result order tie break question

2013-09-03 Thread Chris Hostetter

: like to understand how the ordering is defined so that I can compute an
: integer that is sorted in the same way.  For example (shard id  24) |
: docid or something like that.

If you want to ensure a consistent ordering, you have to index a 
(unique) value that you use as a secondary sort -- you can't trust the 
internal docids will remain unchanged.


-Hoss
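
As a small sketch of that advice (assuming the uniqueKey field is named
"id"), a request can break score ties deterministically with:

  sort=score desc, id asc

Since "id" is unique, no two documents ever compare equal under the full
sort, so the ordering stays stable across shards and commits.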


Re: SolrCloud Set up

2013-09-03 Thread Walter Underwood
Those are the Fallacies of Distributed Computing from L. Peter Deutsch. The 
first fallacy is "The network is reliable."

http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing

wunder

On Sep 3, 2013, at 10:26 AM, Erick Erickson wrote:

 Ah, thanks for the closure, it's always nice to know. I used to work
 with a guy who had a list of network fallacies, that amounted to
 you can't trust them fully
 
 Erick
 
 
 On Tue, Sep 3, 2013 at 12:12 PM, Jared Griffith
 jgriff...@picsauditing.comwrote:
 
 I think I have it all sorted out.  There are some weird network issues here
 where my test set up is, so that may have been part of the over all issue.
 Timeouts wouldn't have fixed this issue, that's for sure.
 
 
 On Sat, Aug 31, 2013 at 7:17 AM, Erick Erickson erickerick...@gmail.com
 wrote:
 
 bq: Though I am seeing some funkiness that I wasn't seeing with Solr 
 Zookeeper running together
 
 Then I suspect you've set something up inconsistently, _or_ you need to
 extend some timeouts because SolrCloud is being run with separate ZKs by
 quite a few people so I'd be surprised if it were anything except config
 issues.
 
 If you _do_ uncover something in that realm other than timeouts and such,
 we need to know
 
 Best,
 Erick
 
 
 
 
 On Fri, Aug 30, 2013 at 2:15 PM, Jared Griffith
 jgriff...@picsauditing.comwrote:
 
 That's what I was thinking.  Though I am seeing some funkiness that I
 wasn't seeing with Solr  Zookeeper running together.
 
 
 On Fri, Aug 30, 2013 at 9:40 AM, Shawn Heisey s...@elyograg.org
 wrote:
 
 On 8/30/2013 9:43 AM, Jared Griffith wrote:
 
 One last thing.  Is there any real benefit in running SolrCloud and
 Zookeeper separate?   I am seeing some funkiness with the separation
 of
 the
 two, funkiness I wasn't seeing when running SolrCloud + Zookeeper
 together
 as outlined in the Wiki.
 
 
 For a robust install, you want zookeeper to be a separate process.
 It
 can
 run on the same server as Solr, but the embedded zookeeper (-DzkRun)
 should
 not be used except for dev and proof of concept work.
 
 The reason is simple.  Zookeeper is the central coordinator for
 SolrCloud.
 In order for it to remain stable, it should not be restarted without
 good
 reason.  If you are running zookeeper as part of Solr, then you will
 be
 affecting zookeeper operation anytime you restart that instance of
 Solr.
 
 Making changes to your Solr setup often requires that you restart
 Solr.
 This includes upgrading Solr and changing some aspects of its
 configuration.  Some configuration aspects can be changed with just a
 collection reload, but others require a full application restart.
 
 Thanks,
 Shawn
 
 
 
 
 --
 
 Jared Griffith
 Linux Administrator, PICS Auditing, LLC
 P: (949) 936-4574
 C: (909) 653-7814
 
 http://www.picsauditing.com
 
 17701 Cowan #140 | Irvine, CA | 92614
 
 Join PICS on LinkedIn and Twitter!
 
 https://twitter.com/PICSAuditingLLC
 
 
 
 
 
 --
 
 Jared Griffith
 Linux Administrator, PICS Auditing, LLC
 P: (949) 936-4574
 C: (909) 653-7814
 
 http://www.picsauditing.com
 
 17701 Cowan #140 | Irvine, CA | 92614
 
 Join PICS on LinkedIn and Twitter!
 
 https://twitter.com/PICSAuditingLLC
 

--
Walter Underwood
wun...@wunderwood.org





Re: Dynamic Query Analyzer

2013-09-03 Thread Roman Chyla
You don't need to index fields several times; you can index just into
one field, and use the different query analyzers only to build the query.
We're doing this for authors, for example - if the query language says
=author:einstein, the query parser knows this field should be analyzed
differently (that is part of your application logic, of your query
language semantics - so it can vary).

The parser will change 'author' to 'nosynonym_author', which means the
'nosynonym_author' analyzer is used for the analysis phase, and after the
query has been prepared, we 'simply' change the query field from
'nosynonym_author' back into 'author'. Seems complex, but it is actually easy.
But it depends on which query parser you can/want to use. I use this:
https://issues.apache.org/jira/browse/LUCENE-5014

roman




On Tue, Sep 3, 2013 at 11:41 AM, Daniel Rosher rosh...@gmail.com wrote:

 Hi,

 We have a need to specify a different query analyzer depending on input
 parameters dynamically.

 We need this so that we can use different stopword lists at query time.

 Would any one know how I might be able to achieve this in solr?

 I'm aware of the solution to specify different field types, each with a
 different query analyzer, but I'd like not to have to index the field
 multiple times.

 Many thanks
 Dab



Re: Problem parsing suggest response

2013-09-03 Thread Chris Hostetter

: 2. The items at and l are not preceded by name.

you're getting back a list of items; the odd items ("at", "l") are 
strings, and the even items are more complex objects associated with those 
strings

: Can I interfere with the structure?

You can choose how the JSON writer represents the internal structure of 
pairs contained inside of a NamedList using the json.nl option.
 
By default json.nl is "flat" (the alternating list mentioned above), 
but you can also use "arrarr" (a list of 2-item lists) or "map", which 
sounds like what you are looking for -- however it's important to realize 
that the "map" option can in some situations generate the same key 
multiple times depending on the internal data.  This is valid 
JSON, but many client libraries can't handle it, or handle it in a way 
that users don't like -- hence it is not the default.

https://wiki.apache.org/solr/SolJSON#JSON_specific_parameters


-Hoss
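
As an illustration (the query terms and counts here are made up), the same
NamedList of suggestions rendered under two of those settings:

  json.nl=flat:  "suggestions":["at",{"numFound":2},"l",{"numFound":1}]
  json.nl=map:   "suggestions":{"at":{"numFound":2},"l":{"numFound":1}}

With "map" a client can index into the object by term, at the cost of
possible duplicate keys when the underlying list repeats a name.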


Re: Dynamic Query Analyzer

2013-09-03 Thread Jack Krupansky
Sounds like it would be better for you to preprocess the query in your 
application layer. Your requirements seem too open-ended to wire into 
Solr.


But, to be sure, please elaborate exactly what sort of variations you need 
in query analysis.


-- Jack Krupansky

-Original Message- 
From: Daniel Rosher

Sent: Tuesday, September 03, 2013 11:41 AM
To: solr-user
Subject: Dynamic Query Analyzer

Hi,

We have a need to specify a different query analyzer depending on input
parameters dynamically.

We need this so that we can use different stopword lists at query time.

Would any one know how I might be able to achieve this in solr?

I'm aware of the solution to specify different field types, each with a
different query analyzer, but I'd like not to have to index the field
multiple times.

Many thanks
Dab 



Re: SolrCloud Set up

2013-09-03 Thread Jared Griffith
Thankfully it's none of those, but more than likely a bad DHCP server
(Windows) or client (or a combo thereof) that is causing the network to
freak out.  I'll try adjusting the timeouts up to see if that will alleviate
this.

I am seeing that when I try to restart the solr instances sometimes they
seem to not join the cluster at all (nothing in the logs about issues).
Even after restarting the nodes that are reporting down a couple of times,
they never join the cluster again.

I have 3 zookeeper instances on 3 separate physical machines, and 4 solr
instances running on the same machine.



On Tue, Sep 3, 2013 at 10:38 AM, Walter Underwood wun...@wunderwood.orgwrote:

 Those are the Fallacies of Distributed Computing from L. Peter Deutsch.
 The first fallacy is The network is reliable.

 http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing

 wunder

 On Sep 3, 2013, at 10:26 AM, Erick Erickson wrote:

  Ah, thanks for the closure, it's always nice to know. I used to work
  with a guy who had a list of network fallacies, that amounted to
  you can't trust them fully
 
  Erick
 
 
  On Tue, Sep 3, 2013 at 12:12 PM, Jared Griffith
  jgriff...@picsauditing.comwrote:
 
  I think I have it all sorted out.  There are some weird network issues
 here
  where my test set up is, so that may have been part of the over all
 issue.
  Timeouts wouldn't have fixed this issue, that's for sure.
 
 
  On Sat, Aug 31, 2013 at 7:17 AM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
  bq: Though I am seeing some funkiness that I wasn't seeing with Solr 
  Zookeeper running together
 
  Then I suspect you've set something up inconsistently, _or_ you need to
  extend some timeouts because SolrCloud is being run with separate ZKs
 by
  quite a few people so I'd be surprised if it were anything except
 config
  issues.
 
  If you _do_ uncover something in that realm other than timeouts and
 such,
  we need to know
 
  Best,
  Erick
 
 
 
 
  On Fri, Aug 30, 2013 at 2:15 PM, Jared Griffith
  jgriff...@picsauditing.comwrote:
 
  That's what I was thinking.  Though I am seeing some funkiness that I
  wasn't seeing with Solr  Zookeeper running together.
 
 
  On Fri, Aug 30, 2013 at 9:40 AM, Shawn Heisey s...@elyograg.org
  wrote:
 
  On 8/30/2013 9:43 AM, Jared Griffith wrote:
 
  One last thing.  Is there any real benefit in running SolrCloud and
  Zookeeper separate?   I am seeing some funkiness with the separation
  of
  the
  two, funkiness I wasn't seeing when running SolrCloud + Zookeeper
  together
  as outlined in the Wiki.
 
 
  For a robust install, you want zookeeper to be a separate process.
  It
  can
  run on the same server as Solr, but the embedded zookeeper (-DzkRun)
  should
  not be used except for dev and proof of concept work.
 
  The reason is simple.  Zookeeper is the central coordinator for
  SolrCloud.
  In order for it to remain stable, it should not be restarted without
  good
  reason.  If you are running zookeeper as part of Solr, then you will
  be
  affecting zookeeper operation anytime you restart that instance of
  Solr.
 
  Making changes to your Solr setup often requires that you restart
  Solr.
  This includes upgrading Solr and changing some aspects of its
  configuration.  Some configuration aspects can be changed with just a
  collection reload, but others require a full application restart.
 
  Thanks,
  Shawn
 
 
 
 
  --
 
  Jared Griffith
  Linux Administrator, PICS Auditing, LLC
  P: (949) 936-4574
  C: (909) 653-7814
 
  http://www.picsauditing.com
 
  17701 Cowan #140 | Irvine, CA | 92614
 
  Join PICS on LinkedIn and Twitter!
 
  https://twitter.com/PICSAuditingLLC
 
 
 
 
 
  --
 
  Jared Griffith
  Linux Administrator, PICS Auditing, LLC
  P: (949) 936-4574
  C: (909) 653-7814
 
  http://www.picsauditing.com
 
  17701 Cowan #140 | Irvine, CA | 92614
 
  Join PICS on LinkedIn and Twitter!
 
  https://twitter.com/PICSAuditingLLC
 

 --
 Walter Underwood
 wun...@wunderwood.org






-- 

Jared Griffith
Linux Administrator, PICS Auditing, LLC
P: (949) 936-4574
C: (909) 653-7814

http://www.picsauditing.com

17701 Cowan #140 | Irvine, CA | 92614

Join PICS on LinkedIn and Twitter!

https://twitter.com/PICSAuditingLLC


Re: SolrCloud - shard containing an invalid host:port

2013-09-03 Thread Daniel Collins
Was it a test instance that you created? 8983 is the default port, so
possibly you started an instance before you had the ports set up properly,
and it registered in zookeeper as a valid instance.  You can use the Core
API to UNLOAD it (if it is still running); if it isn't running anymore, I
have yet to find a way to remove something from ZK. We normally end up
wiping zoo_data and bouncing everything at that point; instances should
re-register themselves as they start up.  But that is the sledgehammer to
crack a walnut approach. :)


On 3 September 2013 13:55, Marc des Garets m...@ttux.net wrote:

 Hi,

 I have set up SolrCloud with Tomcat. I use Solr 4.1.

 I have zookeeper running on 192.168.1.10.
 A tomcat running solr_myidx on 192.168.1.10 on port 8080.
 A tomcat running solr_myidx on 192.168.1.11 on port 8080.

 My solr.xml is like this:
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="true" collection.configName="myidx">
   <cores adminPath="/admin/cores" defaultCoreName="collection1"
          hostPort="8080" hostContext="solr_myidx" zkClientTimeout="2">
     <core name="collection1" instanceDir="./" />
   </cores>
 </solr>

 I have tomcat starting with: -Dbootstrap_conf=true -DzkHost=
 192.168.1.10:2181

 Both tomcat startup all good but when I go to the Cloud tab in the solr
 admin, I see the following:

 collection1 -- shard1 -- 192.168.1.10:8983/solr
   192.168.1.11:8080/solr_ugc
   192.168.1.10:8080/solr_ugc

 I don't know what 192.168.1.10:8983/solr is doing there. Do you know how
 I can remove it?

 It's causing the following error when I try to query the index:
 SEVERE: Error while trying to recover. core=collection1:
 org.apache.solr.client.solrj.SolrServerException: Server refused connection
 at: http://192.168.10.206:8983/solr

 Thanks,
 Marc
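
For reference, the CoreAdmin UNLOAD call Daniel mentions looks like this
(host, port, context, and core name are placeholders for wherever the stray
entry is actually registered):

  http://192.168.1.10:8080/solr_myidx/admin/cores?action=UNLOAD&core=collection1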



Re: SolrCloud Set up

2013-09-03 Thread Erick Erickson
Yep, that's the one, thanks...


On Tue, Sep 3, 2013 at 1:38 PM, Walter Underwood wun...@wunderwood.orgwrote:

 Those are the Fallacies of Distributed Computing from L. Peter Deutsch.
 The first fallacy is The network is reliable.

 http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing

 wunder

 On Sep 3, 2013, at 10:26 AM, Erick Erickson wrote:

  Ah, thanks for the closure, it's always nice to know. I used to work
  with a guy who had a list of network fallacies, that amounted to
  you can't trust them fully
 
  Erick
 
 
  On Tue, Sep 3, 2013 at 12:12 PM, Jared Griffith
  jgriff...@picsauditing.comwrote:
 
  I think I have it all sorted out.  There are some weird network issues
 here
  where my test set up is, so that may have been part of the over all
 issue.
  Timeouts wouldn't have fixed this issue, that's for sure.
 
 
  On Sat, Aug 31, 2013 at 7:17 AM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
  bq: Though I am seeing some funkiness that I wasn't seeing with Solr 
  Zookeeper running together
 
  Then I suspect you've set something up inconsistently, _or_ you need to
  extend some timeouts because SolrCloud is being run with separate ZKs
 by
  quite a few people so I'd be surprised if it were anything except
 config
  issues.
 
  If you _do_ uncover something in that realm other than timeouts and
 such,
  we need to know
 
  Best,
  Erick
 
 
 
 
  On Fri, Aug 30, 2013 at 2:15 PM, Jared Griffith
  jgriff...@picsauditing.comwrote:
 
  That's what I was thinking.  Though I am seeing some funkiness that I
  wasn't seeing with Solr  Zookeeper running together.
 
 
  On Fri, Aug 30, 2013 at 9:40 AM, Shawn Heisey s...@elyograg.org
  wrote:
 
  On 8/30/2013 9:43 AM, Jared Griffith wrote:
 
  One last thing.  Is there any real benefit in running SolrCloud and
  Zookeeper separate?   I am seeing some funkiness with the separation
  of
  the
  two, funkiness I wasn't seeing when running SolrCloud + Zookeeper
  together
  as outlined in the Wiki.
 
 
  For a robust install, you want zookeeper to be a separate process.
  It
  can
  run on the same server as Solr, but the embedded zookeeper (-DzkRun)
  should
  not be used except for dev and proof of concept work.
 
  The reason is simple.  Zookeeper is the central coordinator for
  SolrCloud.
  In order for it to remain stable, it should not be restarted without
  good
  reason.  If you are running zookeeper as part of Solr, then you will
  be
  affecting zookeeper operation anytime you restart that instance of
  Solr.
 
  Making changes to your Solr setup often requires that you restart
  Solr.
  This includes upgrading Solr and changing some aspects of its
  configuration.  Some configuration aspects can be changed with just a
  collection reload, but others require a full application restart.
 
  Thanks,
  Shawn
 
 
 
 
  --
 
  Jared Griffith
  Linux Administrator, PICS Auditing, LLC
  P: (949) 936-4574
  C: (909) 653-7814
 
  http://www.picsauditing.com
 
  17701 Cowan #140 | Irvine, CA | 92614
 
  Join PICS on LinkedIn and Twitter!
 
  https://twitter.com/PICSAuditingLLC
 
 
 
 
 
  --
 
  Jared Griffith
  Linux Administrator, PICS Auditing, LLC
  P: (949) 936-4574
  C: (909) 653-7814
 
  http://www.picsauditing.com
 
  17701 Cowan #140 | Irvine, CA | 92614
 
  Join PICS on LinkedIn and Twitter!
 
  https://twitter.com/PICSAuditingLLC
 

 --
 Walter Underwood
 wun...@wunderwood.org






Re: Starting Solr in Tomcat with specifying ZK host(s)

2013-09-03 Thread Shawn Heisey
On 9/3/2013 4:13 AM, maephisto wrote:
 I've setup a ZK instance and also deployed Solr in Tomcat7 on a different
 instance in Amazon EC2.
 Afterwards I tried starting tomcat specifying the ZK host IP, like so:
 
 sudo service tomcat7 start -DzkHost=<zk ip>:2181 -DnumShards=3
 -Dcollection.configName=myconf
 -Dbootstrap_confdir=/usr/share/solr/example/solr/collection1/conf
 
 Solr loads fine, but is not in the cloud. 

The tomcat init script likely does not pay attention to anything that
you put on the commandline other than a command (like start/stop/status)
for the service.  The java command is buried in that script.

It works with jetty because you are running java directly, not a script.

Helping you with tomcat is outside the scope of this mailing list, but
you may be able to modify the JAVA_OPTS environment variable in a file
with a name like one of the following:

/etc/default/tomcat7
/etc/sysconfig/tomcat7

Many init scripts for packaged software will load environment
information from a central user-modifiable config file.  If this
information is not directly usable to you, please consult a tomcat
mailing list, IRC channel, or other support avenue.
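
A hedged sketch of what that might look like on a Debian/Ubuntu-style
layout (the file path, variable name, and host names all depend on your
packaging and environment), added to /etc/default/tomcat7:

  JAVA_OPTS="$JAVA_OPTS -DzkHost=zk1:2181,zk2:2181,zk3:2181 -DnumShards=3 -Dcollection.configName=myconf"

The key point is that the flags must reach the java command buried inside
the init script; arguments appended to "service tomcat7 start" never get
there.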

Although Solr does usually work with tomcat, there is no official
testing.  Solr is only tested using the Jetty that is bundled with it.

Side note: I hope you realize that if you're only connecting to one
zookeeper instance, then SolrCloud will not function if that zookeeper
instance goes down.  You need three instances minimum (running on
separate hardware) for robust operation, and Solr must know about all of
them.

Thanks,
Shawn



Re: Apostrophes in fields

2013-09-03 Thread Shawn Heisey
On 9/3/2013 3:59 AM, devendra W wrote:
 in my case - the fields with apostrophe not returned in results

Don't use special characters in field names.  If it wouldn't work as a
variable name, function name (or other identifier) in a typical
programming language (Java, C, Perl), then it will probably cause you
problems as a field name.

This basically means: 7-bit ASCII only.  Starts with a letter, contains
only letters, numbers, and the underscore.

Most punctuation other than the underscore has a special meaning to
Solr.  Using extended characters (UTF-8, or those beyond 7-bit ASCII)
*might* work, but it's fairly easy to screw that up and use the wrong
character set, so it's better if you just don't do it.

Thanks,
Shawn
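
As a quick illustration (hypothetical names): field names like author_exact
or product_name_1 are safe, while names like author's or first-name invite
trouble, since the hyphen is a query operator and the apostrophe is easily
mangled by analysis and escaping.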



Solr Cloud hangs when replicating updates

2013-09-03 Thread Kevin Osborn
I was having problems updating SolrCloud with a large batch of records. The
records are coming in bursts with lulls between updates.

At first, I just tried large updates of 100,000 records at a time.
Eventually, this caused Solr to hang. When hung, I can still query Solr.
But I cannot do any deletes or other updates to the index.

At first, my updates were going as SolrJ CSV posts. I have also tried local
file updates and had similar results. I finally slowed things down to just
use SolrJ's Update feature, which is basically just JavaBin. I am also
sending over just 100 at a time in 10 threads. Again, it eventually hung.

Sometimes, Solr hangs in the first couple of chunks. Other times, it hangs
right away.

These are my commit settings:

<autoCommit>
  <maxTime>15000</maxTime>
  <maxDocs>5000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>3</maxTime>
</autoSoftCommit>

I have tried quite a few variations with the same results. I also tried
various JVM settings, again with the same results. The only thing that
seems to help is reducing the cluster size from 2 to 1.

I also did a jstack trace. I did not see any explicit deadlocks, but I did
see quite a few threads in WAITING or TIMED_WAITING. It is typically
something like this:

  java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x00074039a450 (a
java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
at
org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
at
org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
at
org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
at
org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
at
org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
at
org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)

It basically appears that Solr gets stuck while trying to acquire a
semaphore that never becomes available.

Anyone have any ideas? This is definitely causing major problems for us.

-- 
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 

Re: Apostrophes in fields

2013-09-03 Thread Jack Krupansky

Show us your full field type with analyzer.

I suspect that the problem is that one of the index-time filters is turning 
"dev's" into "devs" (WDF does that), but at query time there is no filter 
that removes a trailing apostrophe.


Use the Solr Admin UI Analysis page to see how "dev's" gets indexed and how 
"dev'" gets analyzed at query time.



-- Jack Krupansky

-Original Message- 
From: devendra W

Sent: Tuesday, September 03, 2013 5:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Apostrophes in fields

in my case - the fields with apostrophe not returned in results

When I search for --  dev it shows me following results
dev
dev's
devendra

but when I search for -- dev'   (dev with apo only)
Nothing comes out as result ?

What could be the workaround ?


Thanks
Devendra



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Apostrophes-in-fields-tp475058p4087910.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Change the score of a document based on the *value* of a multifield using dismax

2013-09-03 Thread David Smiley (@MITRE.org)
If you want to alter the score in a customized way based on indexed text data
on a per-value basis then index Lucene payloads, and use PayloadTermQuery. 
See the javadocs for PayloadTermQuery in particular and follow the
references.  This is a bit dated but read this:
http://searchhub.org/2009/08/05/getting-started-with-payloads/

You can get this done.  Almost anything is doable if you have sufficient
time and determination.

~ David



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Change-the-score-of-a-document-based-on-the-value-of-a-multifield-tp4087503p4088086.html
Sent from the Solr - User mailing list archive at Nabble.com.
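
A hedged sketch of the payload route in Lucene 4.x terms (the field name,
term, and scoring function are placeholders; indexing the payloads would
also need something like solr.DelimitedPayloadTokenFilterFactory in the
field's analyzer, plus a Similarity whose scorePayload() decodes them):

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.payloads.AveragePayloadFunction;
  import org.apache.lucene.search.payloads.PayloadTermQuery;

  public class PayloadQueryExample {
    public static Query skillQuery() {
      // Scores matches of "java" in the "skills" field using the average
      // of the payloads stored at each matching position; the final flag
      // keeps the normal span score as a factor as well.
      return new PayloadTermQuery(
          new Term("skills", "java"),
          new AveragePayloadFunction(),
          true);
    }
  }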


Re: distributed query result order tie break question

2013-09-03 Thread Michael Sokolov

On 09/03/2013 12:50 PM, Chris Hostetter wrote:

: like to understand how the ordering is defined so that I can compute an
: integer that is sorted in the same way.  For example (shard id  24) |
: docid or something like that.

If you want to ensure a consistent ordering, you have to index a
(unique) value that you use as a secondary sort -- you can't trust the
internal docids will remain unchanged.

Thanks, Hoss - that was the conclusion that I was coming to. It's good 
to have it confirmed.


-Mike


Re: Solr 4.3: Recovering from Too many values for UnInvertedField faceting on field

2013-09-03 Thread Greg Preston
Our index is too large to uninvert on the fly, so we've been looking
into using DocValues to keep a particular field uninverted at index
time.  See http://wiki.apache.org/solr/DocValues

I don't know if this will solve your problem, but it might be worth
trying it out.

-Greg


On Tue, Sep 3, 2013 at 7:04 AM, Dennis Schafroth den...@indexdata.com wrote:
 We are harvesting and indexing bibliographic data, thus having many distinct 
 author names in our index. While testing Solr 4 I believe I had pushed a 
 single core to 100 million records (91GB of data) and everything was working 
 fine and fast. After adding a little more to the index, the following 
 started to happen:

 17328668 [searcherExecutor-4-thread-1] WARN org.apache.solr.core.SolrCore – 
 Approaching too many values for UnInvertedField faceting on field 
 'author_exact' : bucket size=16726546
 17328701 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore – 
 UnInverted multi-valued field 
 {field=author_exact,memSize=336715415,tindexSize=5001903,time=31595,phase1=31465,nTerms=12048027,bigTerms=0,termInstances=57751332,uses=0}
 18103757 [searcherExecutor-4-thread-1] ERROR org.apache.solr.core.SolrCore – 
 org.apache.solr.common.SolrException: Too many values for UnInvertedField 
 faceting on field author_exact
 at org.apache.solr.request.UnInvertedField.init(UnInvertedField.java:181)
 at 
 org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:664)

 I can see that we reached a limit of bucket size. Is there a way to adjust 
 this? The index also seems to have exploded in size (217GB).

 Thinking that I had reached a limit for what a single core could handle in 
 terms of faceting, I deleted records in the index, but even now at 1/3 (32 
 million) it still fails with the above error. I have optimised with 
 expungeDeleted=true. The index is somewhat larger (76GB) than I would have 
 expected.

 While we can still use the index and get facets back using the enum method on 
 that field, I would still like a way to fix the index if possible. Any 
 suggestions?

 cheers,
 :-Dennis


SolrCloud 4.x hangs under high update volume

2013-09-03 Thread Tim Vaillancourt
Hey guys,

I am looking into an issue we've been having with SolrCloud since the
beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0
yet). I've noticed other users with this same issue, so I'd really like to
get to the bottom of it.

Under a very, very high rate of updates (2000+/sec), after 1-12 hours we
see stalled transactions that snowball to consume all Jetty threads in the
JVM. This eventually causes the JVM to hang with most threads waiting on
the condition/stack provided at the bottom of this message. At this point
SolrCloud instances then start to see their neighbors (who also have all
threads hung) as down with "Connection Refused", and the shards become
"down" in state. Sometimes a node or two survives and just returns 503
"no server hosting shard" errors.

As a workaround/experiment, we have tuned the number of threads sending
updates to Solr, as well as the batch size (we batch updates from client ->
Solr), and the soft/hard autoCommits, all to no avail. We also tried turning
off client-to-Solr batching (1 update = 1 call to Solr), which did not help
either. Certain combinations of update threads and batch sizes seem to
mask/help the problem, but not resolve it entirely.

Our current environment is the following:
- 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
- 3 x Zookeeper instances, external Java 7 JVM.
- 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and
a replica of 1 shard).
- Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good
day.
- 5000 max jetty threads (well above what we use when we are healthy),
Linux-user threads ulimit is 6000.
- Occurs under Jetty 8 or 9 (many versions).
- Occurs under Java 1.6 or 1.7 (several minor versions).
- Occurs under several JVM tunings.
- Everything seems to point to Solr itself, and not a Jetty or Java version
(I hope I'm wrong).

The stack trace that is holding up all my Jetty QTP threads is the
following, which seems to be waiting on a lock that I would very much like
to understand further:

java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x0007216e68d8 (a
java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
at
org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
at
org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
at
org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1096)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1030)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:201)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
at

mm, tie, qs, ps and CJKBigramFilter and edismax and dismax

2013-09-03 Thread Naomi Dushay
When I have a field using CJKBigramFilter,  parsed CJK chars have a different 
parsedQuery than  non-CJK  queries.

  (旧小说 is 3 chars, so 2 bigrams)

args sent in:   q={!qf=bi_fld}旧小说&pf=&pf2=&pf3=

 debugQuery
   <str name="rawquerystring">{!qf=bi_fld}旧小说</str>
   <str name="querystring">{!qf=bi_fld}旧小说</str>
   <str name="parsedquery">(+DisjunctionMaxQuery((((bi_fld:旧小 
bi_fld:小说)~2))~0.01) ())/no_coord</str>
   <str name="parsedquery_toString">+(((bi_fld:旧小 bi_fld:小说)~2))~0.01 ()</str>


If i use a non-CJK query string, with the same field:

args sent in:  q={!qf=bi_fld}foo bar&pf=&pf2=&pf3=

debugQuery:
   <str name="rawquerystring">{!qf=bi_fld}foo bar</str>
   <str name="querystring">{!qf=bi_fld}foo bar</str>
   <str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:foo)~0.01) 
DisjunctionMaxQuery((bi_fld:bar)~0.01))~2))/no_coord</str>
   <str name="parsedquery_toString">+(((bi_fld:foo)~0.01 
(bi_fld:bar)~0.01)~2)</str>


Why are the parsedquery_toString formulas different?  And is there any 
difference in the actual relevancy formula?

How can you tell the difference between the minNrShouldMatch and a qs or ps or 
tie value, if they are all represented as ~n in the parsedQuery string?


To try to get a handle on qs, ps, tie and mm:

 args:  q={!qf=bi_fld pf=bi_fld}"a b" c d&qs=5&ps=4

debugQuery:
  <str name="rawquerystring">{!qf=bi_fld pf=bi_fld}"a b" c d</str>
  <str name="querystring">{!qf=bi_fld pf=bi_fld}"a b" c d</str>
  <str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:"a b"~5)~0.01) 
DisjunctionMaxQuery((bi_fld:c)~0.01) DisjunctionMaxQuery((bi_fld:d)~0.01))~3) 
DisjunctionMaxQuery((bi_fld:"c d"~4)~0.01))/no_coord</str>
  <str name="parsedquery_toString">+(((bi_fld:"a b"~5)~0.01 (bi_fld:c)~0.01 
(bi_fld:d)~0.01)~3) (bi_fld:"c d"~4)~0.01</str>


I get that qs, the query slop, is for explicit phrases in the query, so "a b"~5 
makes sense.   I also get that ps is for boosting of phrases, so I get  
(bi_fld:"c d"~4) … but where is (cjk_uni_pub_search:"a b c d"~4)?


Using dismax (instead of edismax):

args:   q={!dismax  qf=bi_fld pf=bi_fld}"a b" c d&qs=5&ps=4

debugQuery:
  <str name="rawquerystring">{!dismax qf=bi_fld pf=bi_fld}"a b" c d</str>
  <str name="querystring">{!dismax qf=bi_fld pf=bi_fld}"a b" c d</str>
  <str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:"a b"~5)~0.01) 
DisjunctionMaxQuery((bi_fld:c)~0.01) DisjunctionMaxQuery((bi_fld:d)~0.01))~3) 
DisjunctionMaxQuery((bi_fld:"a b c d"~4)~0.01))/no_coord</str>
  <str name="parsedquery_toString">+(((bi_fld:"a b"~5)~0.01 (bi_fld:c)~0.01 
(bi_fld:d)~0.01)~3) (bi_fld:"a b c d"~4)~0.01</str>


So is this an edismax bug?



FYI,   I am running Solr 4.4. I have fields defined like so:
<fieldtype name="text_cjk_bi" class="solr.TextField" 
positionIncrementGap="1" autoGeneratePhraseQueries="false">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory" />
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
    <filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true" 
katakana="true" hangul="true" outputUnigrams="false" />
  </analyzer>
</fieldtype>

The request handler uses edismax:

<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="mm">6<-1 6<90%</str>
    <int name="qs">1</int>
    <int name="ps">0</int>

Re: mm, tie, qs, ps and CJKBigramFilter and edismax and dismax

2013-09-03 Thread Naomi Dushay
Re the relevancy changes I note below for edismax, there are already some 
issues filed:

pertaining to the difference in how the phrase queries are merged into the main 
query:
  See Michael Dodsworth's comment of 25/Sep/12  on this issue:   
https://issues.apache.org/jira/browse/SOLR-2058  -- ticket is closed, but this 
issue is not addressed.

and pertaining to skipping terms in phrase boosting when part of the query is a 
phrase:
  https://issues.apache.org/jira/browse/SOLR-4130

- Naomi


On Sep 3, 2013, at 5:54 PM, Naomi Dushay wrote:

 When I have a field using CJKBigramFilter,  parsed CJK chars have a different 
 parsedQuery than  non-CJK  queries.
 
   (旧小说 is 3 chars, so 2 bigrams)
 
 args sent in:   q={!qf=bi_fld}旧小说&pf=&pf2=&pf3=
 
  debugQuery
    <str name="rawquerystring">{!qf=bi_fld}旧小说</str>
    <str name="querystring">{!qf=bi_fld}旧小说</str>
    <str name="parsedquery">(+DisjunctionMaxQuery((((bi_fld:旧小 
 bi_fld:小说)~2))~0.01) ())/no_coord</str>
    <str name="parsedquery_toString">+(((bi_fld:旧小 bi_fld:小说)~2))~0.01 ()</str>
 
 
 If i use a non-CJK query string, with the same field:
 
 args sent in:  q={!qf=bi_fld}foo bar&pf=&pf2=&pf3=
 
 debugQuery:
    <str name="rawquerystring">{!qf=bi_fld}foo bar</str>
    <str name="querystring">{!qf=bi_fld}foo bar</str>
    <str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:foo)~0.01) 
 DisjunctionMaxQuery((bi_fld:bar)~0.01))~2))/no_coord</str>
    <str name="parsedquery_toString">+(((bi_fld:foo)~0.01 
 (bi_fld:bar)~0.01)~2)</str>
 
 
 Why are the parsedquery_toString formulas different?  And is there any 
 difference in the actual relevancy formula?
 
 How can you tell the difference between the minNrShouldMatch and a qs or ps 
 or tie value, if they are all represented as ~n in the parsedQuery string?
 
 
 To try to get a handle on qs, ps, tie and mm:
 
  args:  q={!qf=bi_fld pf=bi_fld}"a b" c d&qs=5&ps=4
 
 debugQuery:
   <str name="rawquerystring">{!qf=bi_fld pf=bi_fld}"a b" c d</str>
   <str name="querystring">{!qf=bi_fld pf=bi_fld}"a b" c d</str>
   <str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:"a b"~5)~0.01) 
 DisjunctionMaxQuery((bi_fld:c)~0.01) DisjunctionMaxQuery((bi_fld:d)~0.01))~3) 
 DisjunctionMaxQuery((bi_fld:"c d"~4)~0.01))/no_coord</str>
   <str name="parsedquery_toString">+(((bi_fld:"a b"~5)~0.01 (bi_fld:c)~0.01 
 (bi_fld:d)~0.01)~3) (bi_fld:"c d"~4)~0.01</str>
 
 
 I get that qs, the query slop, is for explicit phrases in the query, so "a 
 b"~5 makes sense.   I also get that ps is for boosting of phrases, so I 
 get  (bi_fld:"c d"~4) … but where is (cjk_uni_pub_search:"a b c d"~4)?
 
 
 Using dismax (instead of edismax):
 
 args:   q={!dismax  qf=bi_fld pf=bi_fld}"a b" c d&qs=5&ps=4
 
 debugQuery:
   <str name="rawquerystring">{!dismax qf=bi_fld pf=bi_fld}"a b" c d</str>
   <str name="querystring">{!dismax qf=bi_fld pf=bi_fld}"a b" c d</str>
   <str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:"a b"~5)~0.01) 
 DisjunctionMaxQuery((bi_fld:c)~0.01) DisjunctionMaxQuery((bi_fld:d)~0.01))~3) 
 DisjunctionMaxQuery((bi_fld:"a b c d"~4)~0.01))/no_coord</str>
   <str name="parsedquery_toString">+(((bi_fld:"a b"~5)~0.01 (bi_fld:c)~0.01 
 (bi_fld:d)~0.01)~3) (bi_fld:"a b c d"~4)~0.01</str>
 
 
 So is this an edismax bug?
 
 
 
 FYI,   I am running Solr 4.4. I have fields defined like so:
 <fieldtype name="text_cjk_bi" class="solr.TextField" 
 positionIncrementGap="1" autoGeneratePhraseQueries="false">
   <analyzer>
     <tokenizer class="solr.ICUTokenizerFactory" />
     <filter class="solr.CJKWidthFilterFactory"/>
     <filter class="solr.ICUTransformFilterFactory" 
 id="Traditional-Simplified"/>
     <filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
     <filter class="solr.ICUFoldingFilterFactory"/>
     <filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true" 
 katakana="true" hangul="true" outputUnigrams="false" />
   </analyzer>
 </fieldtype>
 
 The request handler uses edismax:
 
 <requestHandler name="search" class="solr.SearchHandler" default="true">
   <lst name="defaults">
     <str name="defType">edismax</str>
     <str name="q.alt">*:*</str>
     <str name="mm">6<-1 6<90%</str>
     <int name="qs">1</int>
     <int name="ps">0</int>
 



Re: mm, tie, qs, ps and CJKBigramFilter and edismax and dismax

2013-09-03 Thread Jack Krupansky
The query parser sees q=foo bar as two separate source query terms and 
analyzes each separately, but q=旧小说 is seen by the query parser as a 
single source query term and then that one source query term gets tokenized 
by the query term analyzer as two CJK bigrams.


Try q=foo-bar and you should then get comparable structure to the 
generated queries.


-- Jack Krupansky

-Original Message- 
From: Naomi Dushay

Sent: Tuesday, September 03, 2013 8:54 PM
To: solr-user@lucene.apache.org
Subject: mm, tie, qs, ps and CJKBigramFilter and edismax and dismax

When I have a field using CJKBigramFilter,  parsed CJK chars have a 
different parsedQuery than  non-CJK  queries.


 (旧小说 is 3 chars, so 2 bigrams)

args sent in:   q={!qf=bi_fld}旧小说&pf=&pf2=&pf3=

debugQuery
  <str name="rawquerystring">{!qf=bi_fld}旧小说</str>
  <str name="querystring">{!qf=bi_fld}旧小说</str>
  <str name="parsedquery">(+DisjunctionMaxQuery((((bi_fld:旧小 
bi_fld:小说)~2))~0.01) ())/no_coord</str>
  <str name="parsedquery_toString">+(((bi_fld:旧小 bi_fld:小说)~2))~0.01 
()</str>



If i use a non-CJK query string, with the same field:

args sent in:  q={!qf=bi_fld}foo bar&pf=&pf2=&pf3=

debugQuery:
  <str name="rawquerystring">{!qf=bi_fld}foo bar</str>
  <str name="querystring">{!qf=bi_fld}foo bar</str>
  <str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:foo)~0.01) 
DisjunctionMaxQuery((bi_fld:bar)~0.01))~2))/no_coord</str>
  <str name="parsedquery_toString">+(((bi_fld:foo)~0.01 
(bi_fld:bar)~0.01)~2)</str>



Why are the parsedquery_toString formulas different?  And is there any 
difference in the actual relevancy formula?


How can you tell the difference between the minNrShouldMatch and a qs or ps 
or tie value, if they are all represented as ~n in the parsedQuery string?



To try to get a handle on qs, ps, tie and mm:

args:  q={!qf=bi_fld pf=bi_fld}"a b" c d&qs=5&ps=4

debugQuery:
 <str name="rawquerystring">{!qf=bi_fld pf=bi_fld}"a b" c d</str>
 <str name="querystring">{!qf=bi_fld pf=bi_fld}"a b" c d</str>
 <str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:"a b"~5)~0.01) 
DisjunctionMaxQuery((bi_fld:c)~0.01) 
DisjunctionMaxQuery((bi_fld:d)~0.01))~3) DisjunctionMaxQuery((bi_fld:"c 
d"~4)~0.01))/no_coord</str>
 <str name="parsedquery_toString">+(((bi_fld:"a b"~5)~0.01 (bi_fld:c)~0.01 
(bi_fld:d)~0.01)~3) (bi_fld:"c d"~4)~0.01</str>



I get that qs, the query slop, is for explicit phrases in the query, so "a 
b"~5 makes sense.   I also get that ps is for boosting of phrases, so I 
get  (bi_fld:"c d"~4) … but where is (cjk_uni_pub_search:"a b c d"~4)?



Using dismax (instead of edismax):

args:   q={!dismax  qf=bi_fld pf=bi_fld}"a b" c d&qs=5&ps=4

debugQuery:
 <str name="rawquerystring">{!dismax qf=bi_fld pf=bi_fld}"a b" c d</str>
 <str name="querystring">{!dismax qf=bi_fld pf=bi_fld}"a b" c d</str>
 <str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:"a b"~5)~0.01) 
DisjunctionMaxQuery((bi_fld:c)~0.01) 
DisjunctionMaxQuery((bi_fld:d)~0.01))~3) DisjunctionMaxQuery((bi_fld:"a b c 
d"~4)~0.01))/no_coord</str>
 <str name="parsedquery_toString">+(((bi_fld:"a b"~5)~0.01 (bi_fld:c)~0.01 
(bi_fld:d)~0.01)~3) (bi_fld:"a b c d"~4)~0.01</str>



So is this an edismax bug?



FYI,   I am running Solr 4.4. I have fields defined like so:
<fieldtype name="text_cjk_bi" class="solr.TextField" 
positionIncrementGap="1" autoGeneratePhraseQueries="false">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory" />
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.ICUTransformFilterFactory" 
id="Traditional-Simplified"/>
    <filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true" 
katakana="true" hangul="true" outputUnigrams="false" />
  </analyzer>
</fieldtype>

The request handler uses edismax:

<requestHandler name="search" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="q.alt">*:*</str>
<str name="mm">6<-1 6<90%</str>
<int name="qs">1</int>
<int name="ps">0</int>