Re: Solr cloud and auto shard timeline

2013-03-22 Thread Otis Gospodnetic
Hi,

I think there is a mixup here.  SolrCloud has the same sharding
capabilities as ES at this point, I believe, other than manual moving of
shards Mark mentions.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Mar 21, 2013 at 7:08 PM, Jamie Johnson jej2...@gmail.com wrote:

 I've seen that Elastic Search has had auto sharding capabilities for some
 time, is there a timeline for when a similar capability is being targeted
 for Solr Cloud?



Re: Writing new indexes from index readers slow!

2013-03-22 Thread Otis Gospodnetic
Jed,

While this is something completely different, have you considered using
SolrEntityProcessor instead? (assuming all your fields are stored)
http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor
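
For reference, a minimal data-config for that approach might look roughly like
this (a sketch only; the url, query and rows values are illustrative):

  <dataConfig>
    <document>
      <entity name="sep" processor="SolrEntityProcessor"
              url="http://source-host:8983/solr/core0"
              query="*:*" rows="1000"/>
    </document>
  </dataConfig>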

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Mar 21, 2013 at 2:25 PM, Jed Glazner jglaz...@adobe.com wrote:

 Hey Hey Everybody!

 I'm not sure if I should have posted this to the developers list... if I'm
 totally barking up the wrong tree here, please let me know!

 Anywho, I've developed a command line utility based on the
 MultiPassIndexSplitter class from the Lucene library, but I'm finding that
 on our large index (350GB), it's taking WAY too long to write the newly
 split indexes! It took 20.5 hours for execution to finish. I should note
 that Solr is not running while I'm splitting the index. Because Solr can't
 be running while I run this tool, performance is critical, as our
 service will be down!

 I am aware that there is an API currently under development on trunk in
 SolrCloud (https://issues.apache.org/jira/browse/SOLR-3755) but I need
 something now, as our large index is wreaking havoc on our service.

 Here is some basic context info:

 The Index:
 ==
 Solr/Lucene 4.1
 Index Size: 350GB
 Documents: 185,194,528

 The Hardware (http://aws.amazon.com/ec2/instance-types/):
 ===
 AWS High-Memory X-Large (m2.xlarge) instance
 CPU: 8 cores (2 virtual cores with 3.25 EC2 Compute Units each)
 17.1 GB ram
 1.2TB ebs raid

 The Process (splitting 1 index into 8):
 ===
 I'm trying to split this index into 8 separate indexes using this tool.
 To do this I create 8 worker threads.  Each thread gets a new
 FakeDeleteIndexReader object, loops over every document, and uses a
 hash algorithm to decide whether it should keep or delete the document.  Note
 that the documents are not actually deleted at this point because (as I
 understand it) the FakeDeleteIndexReader emulates deletes without actually
 modifying the underlying index.

 After each worker has determined which documents it should keep, I create a
 new Directory object, instantiate a new IndexWriter, and pass the
 FakeDeleteIndexReader object to the addIndexes method. (This is the part
 that takes forever!)

 It only takes about an hour for all of the threads to hash/delete the
 documents they don't want. However it takes 19+ hours to write all of the
 new indexes!  Watching iowait, the disk doesn't look to be overworked
 (about 85% idle), so I'm baffled as to why it would take that long!  I've
 tried running the write operations inside the worker threads, and serially,
 with no real difference!
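
 For concreteness, the hash/delete pass is along these lines (a sketch, not
 the actual code; the exact FakeDeleteIndexReader API may differ, and "id"
 is the assumed unique key field):

 // Mark every document that hashes to another shard as (fake-)deleted;
 // the on-disk index is not modified.
 for (int docId = 0; docId < reader.maxDoc(); docId++) {
     String key = reader.document(docId).get("id");
     int target = (key.hashCode() & Integer.MAX_VALUE) % 8;
     if (target != myShard) {
         reader.deleteDocument(docId);
     }
 }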

 Here is the relevant code that I'm using to write the indexes:

 /**
  * Creates/merges a new index with a FakeDeleteIndexReader. The reader
  * should have marked/deleted all of the documents that should not be
  * included in this new index. When the index is written/committed
  * these documents will be removed.
  *
  * @param directory The directory object of the new index
  * @param version   The lucene version of the index
  * @param reader    A FakeDeleteIndexReader that contains lots of
  *                  uncommitted deletes.
  * @throws IOException
  */
 private void writeToDisk(Directory directory, Version version,
         FakeDeleteIndexReader reader) throws IOException
 {
     IndexWriterConfig cfg = new IndexWriterConfig(version,
             new WhitespaceAnalyzer(version));
     cfg.setOpenMode(OpenMode.CREATE);

     IndexWriter w = new IndexWriter(directory, cfg);
     w.addIndexes(reader);
     w.commit();
     w.close();
     reader.close();
 }

 Any Ideas??  I'm happy to share more snippets of source code if that is
 helpful..
 --

 

 Jed Glazner
 Sr. Software Engineer
 Adobe Social

 385.221.1072 (tel)
 801.360.0181 (cell)
 jglaz...@adobe.com

 550 East Timpanogus Circle
 Orem, UT 84097-6215, USA
 www.adobe.com




Re: Writing new indexes from index readers slow!

2013-03-22 Thread Jed Glazner
Thanks Otis,

I had not considered that approach; however, not all of our fields are stored, so 
that's not going to work for me.

I'm wondering if it's slow because there is just the one reader getting passed 
to the index writer... I noticed today that the addIndexes method can take an 
array of readers.  Maybe if I can send in an array of readers for the 
individual segments in the index...
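
Roughly this, as an untested sketch (the per-segment readers would come from
the composite reader's leaves; wrapWithFakeDeletes is a hypothetical helper
that applies the same delete marks per segment):

// One fake-delete reader per segment, passed to addIndexes in a single call.
List<AtomicReaderContext> leaves = originalReader.leaves();
IndexReader[] perSegment = new IndexReader[leaves.size()];
for (int i = 0; i < leaves.size(); i++) {
    AtomicReader leaf = leaves.get(i).reader();
    perSegment[i] = wrapWithFakeDeletes(leaf); // hypothetical helper
}
w.addIndexes(perSegment);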

I'll try that tomorrow.

Jed




Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-22 Thread Mark Miller
The other odd thing here is that this should not stop replication at all. When 
the slave is ahead, it will still have its index replaced.

- Mark

On Mar 22, 2013, at 1:26 AM, Mark Miller markrmil...@gmail.com wrote:

 I'm working on testing to try and catch what you are seeing here: 
 https://issues.apache.org/jira/browse/SOLR-4629
 
 - Mark
 
 On Mar 22, 2013, at 12:23 AM, Mark Miller markrmil...@gmail.com wrote:
 
 Let me know if there is anything else you can add.
 
 A test with your setup that indexes n docs randomly, commits, randomly updates 
 a conf file or not, and then replicates and repeats x times does not seem to 
 fail, even with very high values for n and x. On every replication, the 
 versions are compared.
 
 Is there anything else you are putting into this mix?
 
 - Mark
 
 On Mar 21, 2013, at 11:28 PM, Uomesh uom...@gmail.com wrote:
 
 Thank you!!,
 
 Attached is my master solrconfig.xml. I have a few custom handlers which you
 might need to remove. In the custom handlers I don't have much code, just
 adding some custom data for the UI.
 
 Thanks,
 Umesh
 
 On Thu, Mar 21, 2013 at 9:59 PM, Mark Miller-3 [via Lucene]
 ml-node+s472066n4049933...@n3.nabble.com wrote:
 
 Could you attach the master as well?
 
 - Mark
 
 On Mar 21, 2013, at 4:36 PM, Uomesh [hidden email] wrote:
 
 Hi Mark,
 
 Attached is my solrconfig_slave.xml. My replication interval is 1
 minute (default).
 
 Please let me know if you need any more config details
 
 Thanks,
 umesh
 
 On Thu, Mar 21, 2013 at 3:19 PM, Mark Miller-3 [via Lucene] [hidden email] wrote:
 
 Can you give more details about your configuration and setup?
 
 Our best bet is to try and recreate this with a unit test.
 
 - Mark
 
 On Mar 21, 2013, at 4:08 PM, Uomesh [hidden email] wrote:
 
 Hi,
 
 I am seeing an issue after upgrading from Solr 3.6.2 to Solr 4.2. My
 Slave stops replicating after some time. And it seems the issue is that my
 Slave index version is higher than the master's. How could the Slave index
 version possibly be higher than the master's? Please help me. Is there
 anything I need to remove from my slave solrconfig.xml?
 
 Index Version Gen Size
 Master: 1363893820575 93 8.75 MB
 Slave: 1363896006624 94 8.75 MB
 
 Thanks,
 Umesh
 
 
 
 
 
 
 
 
 
 
 solrconfig_slave.xml (67K) 
 http://lucene.472066.n3.nabble.com/attachment/4049840/0/solrconfig_slave.xml
 
 
 
 
 
 
 
 
 
 
 
 solrconfig.xml (74K) 
 http://lucene.472066.n3.nabble.com/attachment/4049934/0/solrconfig.xml
 
 
 
 
 
 



Solr 4.1 replication whole index files from leader

2013-03-22 Thread Brad Hill
Hi,
 I use SolrCloud 4.1.
 I started up two Solr nodes A and B and then created a new collection using
CoreAdmin on A with one shard, so node A is the leader.
 Then I indexed some docs to it. Then I created the same collection using
CoreAdmin on B to become a replica. I found that Solr synced all index files
from A to B.
 Under B's data dir I have: the index.20130318083415358 folder, which has all the
synced index files, index.properties, replication.properties and a tlog
folder (empty inside).
 Then I removed the collection from node B using CoreAdmin UNLOAD. I kept all
files in B's data dir; I didn't delete them.
 Then I created the same collection on B again. I found that Solr synced the
files from A to B AGAIN!!!
 And another folder, index.20130318084514166, was created under B's
data folder.
 Actually I didn't index any docs to A after I UNLOADed the collection on B.
 So I wonder how to let Solr know that B already has the correct index files
and not do the sync again?

Re: DocValues and field requirements

2013-03-22 Thread Marcin Rzewucki
Hi Shawn,

Thank you for your response. Yes, that's strange. By enabling DocValues the
information about missing fields is lost, which changes the way sorting works
as well. Adding a default value to the fields can change the logic of the
application dramatically (I can't set the default value to 0 for all
Trie* fields, because it could impact the results displayed to the
end user, which is not good). It's a pity that using DocValues is so
limited.

Regards.

On 21 March 2013 22:29, Shawn Heisey s...@elyograg.org wrote:

 On 3/21/2013 3:07 PM, Shawn Heisey wrote:

 This might be a requirement of the lower-level Lucene API, or it might
 be a requirement that was instituted at the Solr level because a problem
 was found when docs did not contain the field.  Google seems reluctant
 to tell me, and I haven't figured out the right way to ask.


 Some poking around the Lucene API has turned up an interesting notation on
 all the different types on DocValues:

 http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/index/DocValues.Type.html#FIXED_INTS_16

 They all say that if a value isn't present, zero (or an empty string) is
 assumed, and that there is no way to distinguish this from the same value
 that is intentionally indexed.

 So it appears that Solr *could* use docValues without the required or
 default value restriction, but there is a strong possibility that the
 behavior will not be what the user expects.  When docValues is not turned
 on, there is a clear difference between a default value and a missing
 field.  The sort mechanism without docValues can sort documents with the
 field missing either before or after the other values.  That would be
 impossible with docValues.

 By including the restriction, the dev team has made it less likely that
 the Solr admin will be surprised by the new behavior, because they have to
 change the field definition to make docValues work.
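
 (In schema.xml terms, enabling it looks something like this; the field name
 and type here are illustrative:

   <field name="price" type="tfloat" indexed="true" stored="true"
          docValues="true" default="0"/>
 )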

 Thanks,
 Shawn




RE: Solr 4.2 - Slave Index version is higher than Master

2013-03-22 Thread John, Phil (CSS)
To add to the discussion.
 
We're running classic master/slave replication (not solrcloud) with 1 master 
and 2 slaves and I noticed the slave having a higher version number than the 
master the other day as well.
 
In our case, knock on wood, it hasn't stopped replication.
 
If you'd like a copy of our config I can provide off-list.
 
Regards,
 
Phil.



From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Fri 22/03/2013 06:32
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.2 - Slave Index version is higher than Master




Re: Solr cloud and auto shard timeline

2013-03-22 Thread Jamie Johnson
I am sorry for the confusion. I had assumed that there was a way to issue
commands to ES to have it change its current shard layout (i.e. go from 2
to 4 for instance), but on further reading of their documentation I do not
see that.  That being said, is there a timeline for being able to add shards
to SolrCloud by splitting an existing shard (or set of shards), and does
anyone have a good writeup of the different capabilities between the two at
this point?




Re: Don't cache filter queries

2013-03-22 Thread Dotan Cohen
On Thu, Mar 21, 2013 at 6:22 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : Just add {!cache=false} to the filter in your query
 : (http://wiki.apache.org/solr/SolrCaching#filterCache).
 ...
 :  I need to use the filter query feature to filter my results, but I
 :  don't want the results cached, as documents are added to the index
 :  several times per second and the results will be stale immediately. Is
 :  there any way to disable filter query caching?

 Or remove the filterCache config option from your solrconfig.xml if you
 really don't want any caching of any filter queries.

 Frankly though: that's throwing the baby out with the bath water -- just
 because you are updating your index super-fast-like doesn't mean you
 aren't getting benefits from the caches, particularly from commonly
 reused filters which are applied to many queries which might get
 executed concurrently -- not to mention that a single filter might be
 reused multiple times within a single request to solr.

 disabling cache *warming* can make a lot of sense in NRT cases, but
 eliminating caching altogether rarely does.


Thanks. The problem is that the queries with filter queries are taking
much longer to run (~60-80 ms) than the queries without (~1-4 ms). I
figured that the problem may have been with the caching.

In fact, running a query with a filter query and caching disabled is
running in the range of 16-30 ms, which is quite an improvement.
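
For reference, the query ended up along these lines (host, field names and
values illustrative):

  http://localhost:8983/solr/select?q=text:foo&fq={!cache=false}category:books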

Thanks.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-22 Thread Bernd Fehling
That issue was already present in Solr 4.1.
http://lucene.472066.n3.nabble.com/replication-problems-with-solr4-1-td4039647.html

Nice to know that it is still there in 4.2.

With some luck it will make it to 4.2.1 ;-)

Regards
Bernd

On 21.03.2013 21:08, Uomesh wrote:
 Hi,
 
  I am seeing an issue after upgrading from Solr 3.6.2 to Solr 4.2. My Slave
  stops replicating after some time. And it seems the issue is that my Slave
  index version is higher than the master's. How could the Slave index version
  possibly be higher than the master's? Please help me. Is there anything I need to
  remove from my slave solrconfig.xml?
 
 Index Version Gen Size
 Master:   1363893820575   93  8.75 MB
 Slave:1363896006624   94  8.75 MB
 
 Thanks,
 Umesh
 
 
 
 


Using Solr For a Real Search Engine

2013-03-22 Thread Furkan KAMACI
If I want to use Solr in a web search engine, what kind of strategies should
I follow for running Solr? I mean, should I run it via embedded Jetty or
use the war and deploy it to a container? You should consider that I will have
a heavy workload on my Solr.


RE: Logging inside a custom analyzer

2013-03-22 Thread Gian Maria Ricci
Thanks a lot, it was exactly what I needed; sorry for not being so clear with
my question :).

Gian Maria.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, March 19, 2013 3:04 PM
To: solr-user@lucene.apache.org; alkamp...@nablasoft.com
Subject: Re: Logging inside a custom analyzer

what do you mean "log information into Solr from a custom analyzer"? Have
info go from your custom analyzer into the Solr log? In which case, just do
something like:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

private static final Logger log =
    LoggerFactory.getLogger(YourPrivateClass.class.getName());

and then in your code something like

log.info("your message here");

Best
Erick


On Tue, Mar 19, 2013 at 1:32 AM, Gian Maria Ricci
alkamp...@nablasoft.comwrote:

 Hi to everyone,



 What is the best way to log information into Solr from a custom analyzer?
 Is there any way to integrate log4j, or is it better to use some Solr
 logging method?



 Thanks again for your invaluable help



 Gian Maria








Re: SOLR - Documents with large number of fields ~ 450

2013-03-22 Thread Marcin Rzewucki
Hi,

I have a collection with more than 4K fields, mostly Trie*Field types.
It is used for faceting, sorting, searching and the StatsComponent. It works
pretty fine on Amazon 4 x m1.large (7.5GB RAM) EC2 boxes. I'm using
SolrCloud, a multi-AZ setup and ephemeral storage. The index is managed by
mmap, 4GB for the Java heap, CMS for GC. Currently there are 800K records,
but there will be about 2M. Query response is much longer (a couple to a
dozen seconds) during bulk loading, but this is rather typical, I think.
Indexing takes much, much longer than for records with fewer fields. I'm
sending updates in 5MB batches. No OOM issues.

Regarding DocValues: I believe they are a great improvement for faceting, but
they are annoying because of their limitations: as far as I checked, a field
has to be required or have a default value, which is not possible in my
case (I can't set some figures to 0 by default as it may impact other
results displayed to the end user, which is not good). I wish that could
change.

Regards.

On 21 March 2013 07:56, kobe.free.wo...@gmail.com wrote:

 Hello All,

 Scenario:

 My data model consists of approx. 450 fields with different types of data.
 We want to include each field for indexing; as a result it will create a single
 SOLR document with *450 fields*. The total number of records in the data
 set is *755K*. We will be using features like faceting and sorting on
 approx. 50 fields.

 We are planning to use SOLR 4.1. Following is the hardware configuration of
 the web server that we plan to install SOLR on:-

 CPU: 2 x Dual Core (4 cores) | RAM: 12GB | Storage: 212 GB

 Questions :

 1) What's the best approach when dealing with documents with a large number of
 fields? What's the drawback of having a single document with a very large
 number of fields? Does SOLR support documents with a large number of fields, as
 in my case?

 2) Will there be any performance issue if I define all of the 450 fields for
 indexing? Also if faceting is done on 50 fields with documents having a large
 number of fields and a huge number of records?

 3) The names of the fields in the data set are quite lengthy, around 60
 characters. Will it be a problem defining fields with such huge names in
 the schema file? Is there any best practice to be followed related to naming
 conventions? Will big field names create problems during querying?

 Thanks!






Re: Solr cloud and auto shard timeline

2013-03-22 Thread Jamie Johnson
Yes Anshum, exactly what I was looking for.  Is this being targeted at a
particular Solr release?  I see that some of the related issues are
targeted for 4.3; is that the goal for this as well?


On Fri, Mar 22, 2013 at 8:07 AM, Anshum Gupta ans...@anshumgupta.net wrote:

 Hi Jamie,

 There's progress on the Shard splitting JIRA that I believe you are talking
 about.
 You may have a look at this for more details:
 https://issues.apache.org/jira/browse/SOLR-3755 .





 --

 Anshum Gupta
 http://www.anshumgupta.net



Re: Slow queries for common terms

2013-03-22 Thread Jan Høydahl
Hi

There might not be a final cure with more RAM if you are CPU bound. Scoring 90M 
docs is some work. Can you check what's going on during those 15 seconds? Is 
your CPU at 100%? Try a (foo OR bar OR baz) search which generates 100M 
hits and see if that is slow too, even if you don't use frequent words.

I'm sure you can find other frequent terms in your corpus which display similar 
behaviour, words which are even more frequent than "book". Are you using AND 
as the default operator? You will benefit from limiting the number of results as 
much as possible.

The real solution is to shard across N number of servers, until you reach the 
desired performance for the desired indexing/querying load.
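
(For the commongrams approach mentioned earlier in the thread, the schema.xml
side is roughly the following filter in the index analyzer chain, with
solr.CommonGramsQueryFilterFactory as its query-time counterpart; the words
file name is illustrative:

  <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt"
          ignoreCase="true"/>
)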

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

22. mars 2013 kl. 02:52 skrev David Parks davidpark...@yahoo.com:

 I figured I was trying to pull a coup here, but this is a temporary
 configuration while we only run a few users through an early beta. The
 performance is perfectly good for most terms, it's just this "book" term. I'm
 curious how adding RAM will solve that. I can see how deploying SolrCloud
 and sharding should affect it, but would simply giving Solr 16GB of RAM
 improve query time with this one term that is common to 90M of the 300M
 documents?
 
 In due time I do plan to implement solr cloud and run the whole thing
 through proper load testing. Right now I'm just trying to get it to work
 for a few users. If you could elaborate a bit on your thinking I'd be quite
 grateful.
 
 David
 
 
 -Original Message-
 From: Jan Høydahl [mailto:jan@cominvent.com] 
 Sent: Thursday, March 21, 2013 8:01 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Slow queries for common terms
 
 Hi,
 
 If you say that you try to index 300M docs in ONE single Solr server, with
 a few gigs of RAM, then that's the reason for some bad performance right
 there. You should benchmark to find the sweet-spot of how many documents you
 want to fit per node/shard and still have acceptable indexing/query
 performance.
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com
 
 21. mars 2013 kl. 12:43 skrev David Parks davidpark...@yahoo.com:
 
 We have 300M documents, each about a paragraph of text on average. The 
 index is 140GB in size. I'm not sure how to find the IDF score, was 
 that in the debug query below?
 
 It seems that any query with the word "book" in it triggers a 15 sec
 response time (unless it's the 2nd time we run the same query).
 Looking at terms, 'book' is the 2nd highest term, with 90M documents in the
 index.
 
 Calling 'book' a stop word doesn't seem reasonable, and while that
 article on bigrams and common grams is fascinating, I wonder if it
 addresses this situation, in which we aren't really likely to manage a
 bi-gram phrase match between the search "book sales improvement" and the
 terms in the document "category book marketing and sales today the real
 guide to improving", right?
 I think this is what's happening here: everything with the common phrase
 "category book" is getting included, which seems logical and correct.
 
 
 
 -Original Message-
 From: Jan Høydahl [mailto:jan@cominvent.com]
 Sent: Thursday, March 21, 2013 5:43 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Slow queries for common terms
 
 Hi,
 
  I think you can start by reading this blog
  http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
  and try out the approach using a dictionary of the most
  common words in your index.
 
  You don't say how many documents, avg. doc size, the IDF value of
  "book", how much RAM, whether you utilize disk caching well enough, and
  many other things which could affect this situation. But the pure fact
  that only a few common search words trigger such a delay would suggest
  commongrams as a possible way forward.
 
  --
  Jan Høydahl, search solution architect
  Cominvent AS - www.cominvent.com
  Solr Training - www.solrtraining.com
 
 21. mars 2013 kl. 11:09 skrev David Parks davidpark...@yahoo.com:
 
  I've got a query that takes 15 seconds to return whenever I have the
  term "book" in a query that isn't cached. That's a pretty common term
  in our search index. We're indexing about 120 GB of text data. We
  only store terms and IDs, no document data, and the disk is virtually
  unused; it's all CPU time.
 
 
 
 I haven't done much yet to optimizing and scale solr, as we're only 
 trying to support a small number of users in a private beta. I 
 currently only have a couple of gigs of ram dedicated to Solr (we've 
 ordered more hardware for it, but it's not in yet).
 
 
 
  I wonder if there's something I can do in the short term to alleviate
  the problem. Many searches work great, but the ones that take 15+
  sec are a black eye. I'd be happy with a short-term fix followed in
  the near future by a more proper 
Solr 4.2 replication whole index files mechanism.

2013-03-22 Thread bradhill99
Hi,
 I use SolrCloud 4.1.
 I started up two Solr nodes A and B and then created a new collection using
CoreAdmin on A with one shard, so node A is the leader.
 Then I indexed some docs to it. Then I created the same collection using
CoreAdmin on B to become a replica. I found that Solr synced all index
files from A to B.
 Under B's data dir I have: the index.20130318083415358 folder, which has all
the synced index files, index.properties, replication.properties and a tlog
folder (empty inside).
 Then I removed the collection from node B using CoreAdmin UNLOAD. I kept
all files in B's data dir; I didn't delete them.
 Then I created the same collection on B again. I found that Solr synced the
files from A to B AGAIN!!!
 And another folder, index.20130318084514166, was created under B's
data folder.
 Actually I didn't index any docs to A after I UNLOADed the collection on B.
 So I wonder how to let Solr know that B already has the correct index files
and not do the sync again?





Urgent:Solr cloud issue

2013-03-22 Thread anuj vats
Hi Shawn,


I have seen your post on SolrCloud master-master configuration on two 
servers. I have to use the same Solr structure, but for a long time I have not 
been able to configure it to communicate between two servers; on a single 
server it works fine. Can you please help me with the required config changes, 
so that Solr can communicate between the two servers?


http://grokbase.com/t/lucene/solr-user/132pb1pe34/solrcloud-master-master


Regards
Anuj Vats

PatternReplaceFilterFactory -- what does this regex do?

2013-03-22 Thread Eric Wilson
I'm using the Solr Suggester for autocompletion with the WFSTLookup suggest
component, and a text file with phrases and weights.
(http://wiki.apache.org/solr/Suggester)

I found that the following filter made it impossible to match on
ampersands, so I removed it. But I'm sure it was there for a reason. What
was it supposed to do?

<filter class="solr.PatternReplaceFilterFactory"
        pattern="([^\p{L}\p{M}\p{N}\p{Cs}]*[\p{L}\p{M}\p{N}\p{Cs}\_]+:)|([^\p{L}\p{M}\p{N}\p{Cs}])+"
        replacement=" " replace="all"/>

Thanks,

Eric Wilson


Re: Sort-field for ALL docs in FieldCache for sort queries - OOM on lots of docs

2013-03-22 Thread Per Steffensen

On 3/21/13 10:50 PM, Shawn Heisey wrote:

On 3/21/2013 4:05 AM, Per Steffensen wrote:

Can anyone else elaborate? How to activate it? How to make sure, for
sorting, that sort-field-value for all docs are not read into memory for
sorting - leading to OOM when you have a lot of docs? Can this feature
be activated on top of an existing 4.0 index, or do you have to re-index
everything?


There is one requirement that may not be obvious - every document must 
have a value in the field, so you must either make the field 
required or give it a default value in the schema.  Solr 4.2 will 
refuse to start the core if this requirement is not met.

That is not a problem for us. The field exists on every document.
The example schema hints that the value might need to be 
single-valued.  I have not tested this.  Sorting is already 
problematic on multi-valued fields, so I assume that this won't be the 
case for you.

That is not a problem for us either. The field is single-valued.


To use docValues, add docValues=true and then either set 
required=true or default=somevalue on the field definition in 
schema.xml, restart Solr or reload the core, and reindex.  Your index 
will get bigger.

So the answer to "...or do you have to re-index everything?" is yes!?


If the touted behavior of handling the sort mechanism in OS disk cache 
memory (or just reading the disk if there's not enough memory) rather 
than heap is correct, then it should solve your issues.  I hope it does!
Me too. I will find out soon, I hope! But re-indexing is kind of a 
problem for us; we will figure it out.
Is there a guide to re-indexing all your stuff anywhere, so I do it the easiest 
way? I guess maybe there are some nice tricks about streaming data directly 
from one Solr running the old index into a new Solr running the new 
index, and then discarding the old index afterwards?


Thanks,
Shawn



Thanks a lot, Shawn!

Regards, Per Steffensen


Re: Sort-field for ALL docs in FieldCache for sort queries - OOM on lots of docs

2013-03-22 Thread Shawn Heisey
On 3/22/2013 8:54 AM, Per Steffensen wrote:
 Me too. I will find out soon - I hope! But re-indexing is kinda a
 problem for us, but we will figure out.
 Any guide to re-index all you stuff anywhere, so I do it the easiest
 way? Guess maybe there are some nice tricks about steaming data directly
 from one Solr running the old index into a new Solr running the new
 index, and then discard the old index afterwards?

There is no guide to reindexing, because there are so many ways to
index.  The basic procedure is to repeat whatever you did the first
time, possibly deleting the entire index first.  Because Lucene and Solr
indexes often require changes to deal with changing requirements, the
full index procedure should be automated and repeatable.

The dataimport handler has a SolrEntityProcessor that can index from
another Solr instance.  All fields must be stored for this to work,
because it just retrieves documents and ignores the search index.  Many
people (including myself) do not store all fields, in an attempt to keep
the index size down.

Thanks,
Shawn



Solr 4.2, reindexing, transaction logs, high memory usage

2013-03-22 Thread Raghav Karol
Dear List,

We are using solr-4.2 to build an index of 5M docs, each limited to 6K
in size. Conceptually we are modelling a stack of documents. Here is an
excerpt from our schema.xml:

   <dynamicField name="publicationBody_*" type="string"
       indexed="false" stored="true" multiValued="false" termVectors="false"/>
   <copyField source="publicationBody_*" dest="publicationBodies"/>

We have publicationBody_1: ..., publicationBody_2: ..., up to a maximum of 30,
with a max of 10K of data in each.

We run this index sharded across 8 Solr cores on a single host,
an m2.4xlarge EC2 instance. We do not use ZooKeeper (because of
operational issues on our live indexes) and manage the sharding
ourselves.

For this index we run with -Xmx30G and observe (in jconsole) that
Solr runs with approximately 25G.
Autocommit kills Solr: it sends heap memory usage to the max. The
reason appears to be committing to all cores in parallel.
Disabling autoCommit and running a loop like

while true; do
  for i in $(seq 0 7); do
    curl -s "http://localhost:8085/solr/core${i}/update?commit=true&wt=json"
  done
done

produces:

{"responseHeader":{"status":0,"QTime":8297}}
{"responseHeader":{"status":0,"QTime":8358}}
{"responseHeader":{"status":0,"QTime":9552}}
{"responseHeader":{"status":0,"QTime":8368}}
{"responseHeader":{"status":0,"QTime":9296}}
{"responseHeader":{"status":0,"QTime":8527}}
{"responseHeader":{"status":0,"QTime":9458}}
{"responseHeader":{"status":0,"QTime":8929}}

8 seconds to process a commit with no changes to the index!?!

Transaction Logs

55M  /mnt/solr-stack/solr.data.0/tlog
45M  /mnt/solr-stack/solr.data.1/tlog
28M  /mnt/solr-stack/solr.data.2/tlog
17M  /mnt/solr-stack/solr.data.3/tlog
118M /mnt/solr-stack/solr.data.4/tlog
123M /mnt/solr-stack/solr.data.5/tlog
68M  /mnt/solr-stack/solr.data.6/tlog
63M  /mnt/solr-stack/solr.data.7/tlog

Index
---
2.8G /mnt/solr-stack/solr.data.0/index
2.7G /mnt/solr-stack/solr.data.1/index
3.2G /mnt/solr-stack/solr.data.2/index
2.7G /mnt/solr-stack/solr.data.3/index
3.1G /mnt/solr-stack/solr.data.4/index
2.7G /mnt/solr-stack/solr.data.5/index
2.9G /mnt/solr-stack/solr.data.6/index
3.0G /mnt/solr-stack/solr.data.7/index

Why does Solr need such a large heap for this index (it dies
with 10G and 20G and is constant at 28G in jconsole)?
Why does running commits in parallel, via autoCommit or the command
above, exhaust the memory?
Are we using dynamic fields incorrectly?

We have also tried to run the same index on an SSD-backed
hi1.4xlarge Amazon instance. There, autoCommit every 30 seconds works,
rotating transaction log files correctly.
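
(The 30-second autoCommit here would be the standard solrconfig.xml block,
along these lines; openSearcher=false is an assumption:

  <autoCommit>
    <maxTime>30000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
)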

--
Raghav
Senior backend developer - www.issuu.com


Re: Solr 4.2 replication whole index files mechanism.

2013-03-22 Thread Mark Miller
There are a few things going on here that caused this, all resolved in 4.2 as 
far as I know. 

- Mark



Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-22 Thread Mark Miller
Are you replicating configuration files as well?

- Mark


Re: SOLR - Documents with large number of fields ~ 450

2013-03-22 Thread John Nielsen
"with the on disk option."

Could you elaborate on that?
On 22/03/2013 05.25, Mark Miller markrmil...@gmail.com wrote:

 You might try using docValues with the on-disk option and let the
 OS manage all the memory needed for the faceting/sorting. This would
 require Solr 4.2.
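
 (Presumably this refers to the per-field docValues format in 4.2; in
 schema.xml terms something like the following, a guess at the intent
 rather than a confirmed recipe, which also needs the schema-aware
 codecFactory enabled in solrconfig.xml:

   <fieldType name="string_dv" class="solr.StrField"
              docValues="true" docValuesFormat="Disk"/>
 )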

 - Mark





Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-22 Thread Uomesh
Hi Mark,

I am replicating the config files below, but not solrconfig.xml.

confFiles: schema.xml, elevate.xml, stopwords.txt, mapping-FoldToASCII.txt,
mapping-ISOLatin1Accent.txt, protwords.txt, spellings.txt, synonyms.txt


Also, strangely, I am seeing a big Gen difference between master and slave. My
master's Gen is 2 while the slave's is 56. If I do a full import, the master's
Gen gets higher than the slave's and it replicates. I have more than 30
cores on my Solr instance and all are scheduled to replicate at the same time.

        Index Version  Gen  Size
Master: 1363903243590  2    94 bytes
Slave:  1363967579193  56   94 bytes

Thanks,
Umesh


On Fri, Mar 22, 2013 at 10:42 AM, Mark Miller-3 [via Lucene] 
ml-node+s472066n4050075...@n3.nabble.com wrote:

 Are you replicating configuration files as well?

 - Mark

 On Mar 22, 2013, at 6:38 AM, John, Phil (CSS) [hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4050075i=0
 wrote:

  To add to the discussion.
 
  We're running classic master/slave replication (not solrcloud) with 1
 master and 2 slaves and I noticed the slave having a higher version number
 than the master the other day as well.
 
  In our case, knock on wood, it hasn't stopped replication.
 
  If you'd like a copy of our config I can provide off-list.
 
  Regards,
 
  Phil.
 
  
 
  From: Mark Miller [mailto:[hidden 
  email]http://user/SendEmail.jtp?type=nodenode=4050075i=1]

  Sent: Fri 22/03/2013 06:32
  To: [hidden email]http://user/SendEmail.jtp?type=nodenode=4050075i=2
  Subject: Re: Solr 4.2 - Slave Index version is higher than Master
 
 
 
  The other odd thing here is that this should not stop replication at
 all. When the slave is ahead, it will still have it's index replaced.
 
  - Mark
 
  On Mar 22, 2013, at 1:26 AM, Mark Miller [hidden 
  email]http://user/SendEmail.jtp?type=nodenode=4050075i=3
 wrote:
 
  I'm working on testing to try and catch what you are seeing here:
 https://issues.apache.org/jira/browse/SOLR-4629
 
  - Mark
 
  On Mar 22, 2013, at 12:23 AM, Mark Miller [hidden 
  email]http://user/SendEmail.jtp?type=nodenode=4050075i=4
 wrote:
 
  Let me know if there is anything else you can add.
 
  A test with your setup that index n docs randomly, commits, randomly
 updates a conf file or not, and then replicates and repeats x times does
 not seem to fail, even with very high values for n and x. On every
 replication, the versions are compared.
 
  Is there anything else you are putting into this mix?
 
  - Mark
 
  On Mar 21, 2013, at 11:28 PM, Uomesh [hidden 
  email]http://user/SendEmail.jtp?type=nodenode=4050075i=5
 wrote:
 
  Thank you!!,
 
  Attached is my master solrconfig.xml. I have few custom handlers
 which you
  might need to remove. In custom handler i have not much code just
 adding
  some custom data for UI.
 
  Thanks,
  Umesh
 
  On Thu, Mar 21, 2013 at 9:59 PM, Mark Miller-3 [via Lucene]
  [hidden email]http://user/SendEmail.jtp?type=nodenode=4050075i=6
  mible.com [hidden 
  email]http://user/SendEmail.jtp?type=nodenode=4050075i=7
 wrote:
 
  Could you attach the master as well?
 
  - Mark
 
  On Mar 21, 2013, at 4:36 PM, Uomesh [hidden email]
 http://user/SendEmail.jtp?type=nodenode=4049933i=0
  wrote:
 
  Hi Mark,
 
  Attached is my solrconfig_slave.xml. My replication interval is 1
  minute(default).
 
  Please let me know if you need any more config details
 
  Thanks,
  umesh
 
  On Thu, Mar 21, 2013 at 3:19 PM, Mark Miller-3 [via Lucene] 
  [hidden email] 
 http://user/SendEmail.jtp?type=nodenode=4049933i=1
  wrote:
 
  Can you give more details about your configuration and setup?
 
  Our best bet is to try and recreate this with a unit test.
 
  - Mark
 
  On Mar 21, 2013, at 4:08 PM, Uomesh [hidden email]
  http://user/SendEmail.jtp?type=nodenode=4049832i=0
  wrote:
 
  Hi,
 
   I am seeing an issue after upgrading from Solr 3.6.2 to Solr 4.2. My
   slave stops replicating after some time, and it seems the issue is that my
   slave index version is higher than the master's. How is it possible for the
   slave index version to be higher than the master's? Please help me. Is there
   anything I need to remove from my slave solrconfig.xml?
 
  Index Version Gen Size
  Master: 1363893820575 93 8.75 MB
  Slave: 1363896006624 94 8.75 MB
 
  Thanks,
  Umesh
 
 
 
 

Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-22 Thread Uomesh
Also, I am replicating only on commit and startup.

Thanks,
Umesh


NoSuchMethodError updateDocument

2013-03-22 Thread Furkan KAMACI
I use Solr 4.1.0 and Nutch 2.1, Java 1.7.0_17, Tomcat 7.0, and IntelliJ IDEA
12, with CentOS 6.4 on my 64-bit computer.

I ran this command successfully:

bin/nutch solrindex http://localhost:8080/solr -index

However, when I run this command:

bin/nutch solrindex http://localhost:8080/solr -reindex

I get this error:

Mar 22, 2013 6:48:27 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchMethodError:
org.apache.lucene.index.IndexWriter.updateDocument(Lorg/apache/lucene/index/Term;Lorg/apache/lucene/index/IndexDocument;Lorg/apache/lucene/analysis/Analyzer;)V
at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:653)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:366)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NoSuchMethodError:
org.apache.lucene.index.IndexWriter.updateDocument(Lorg/apache/lucene/index/Term;Lorg/apache/lucene/index/IndexDocument;Lorg/apache/lucene/analysis/Analyzer;)V
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:201)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:451)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:587)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
... 16 more


Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-22 Thread Mark Miller
And you're also on 4.2?

- Mark


Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-22 Thread alxsss
Hello,


Further investigation shows the following pattern for both the direct and
wordbreak spellcheckers.

Assume that in all cases there are spellchecker results when distrib=false

In distributed mode (distrib=true):

  case when matches = 0:
    1. group=true: no spellcheck results
    2. group=false: there are spellcheck results

  case when matches > 0:
    1. group=true: there are spellcheck results
    2. group=false: there are spellcheck results


Do these constitute a failing test case?

Thanks.
Alex.

 

 

-Original Message-
From: alxsss alx...@aim.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Mar 21, 2013 6:50 pm
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud



Hello,

I am debugging the SpellCheckComponent#finishStage. 
 
From the responses I see that not only wordbreak, but also directSpellchecker 
does not return some results in distributed mode. 
The request handler I was using had 

<str name="group">true</str>


So, I decided to turn off grouping, and I see spellcheck results in distributed
mode.


curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'
has no spellcheck results, but

curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&group=false'
returns results.

So, the conclusion is that grouping causes the distributed spellchecker to fail.

Could you please point me to the class that may be responsible for this issue?

Thanks.
Alex.
 




-Original Message-
From: Dyer, James james.d...@ingramcontent.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Mar 21, 2013 11:23 am
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


The shard responses get combined in SpellCheckComponent#finishStage.  I highly
recommend you file a JIRA bug report for this at
https://issues.apache.org/jira/browse/SOLR.  If you write a failing unit test,
it would make it much more likely that others would help you with a fix.  Of
course, if you solve the issue entirely, a patch would be much appreciated.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: alx...@aim.com [mailto:alx...@aim.com]
Sent: Thursday, March 21, 2013 12:45 PM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

Hello,

We need this feature fixed ASAP. So, please let me know which class is
responsible for combining spellcheck results from all shards. I will try to 
debug the code.

Thanks in advance.
Alex.







-Original Message-
From: alxsss alx...@aim.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Mar 19, 2013 11:34 am
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud


-- distributed environment.  But to nail it down, we probably need to see both
-- the applicable requestHandler /

Not sure what this is?

I have

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

   <str name="queryAnalyzerFieldType">spell</str>

   <!-- Multiple Spell Checkers can be declared and used by this
        component
     -->

   <!-- a spellchecker built from a field of the main index -->
   <lst name="spellchecker">
     <str name="name">direct</str>
     <str name="field">spell</str>
     <str name="classname">solr.DirectSolrSpellChecker</str>
     <!-- the spellcheck distance measure used, the default is the internal
          levenshtein -->
     <str name="distanceMeasure">internal</str>
     <!-- minimum accuracy needed to be considered a valid spellcheck
          suggestion -->
     <float name="accuracy">0.5</float>
     <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
     <int name="maxEdits">2</int>
     <!-- the minimum shared prefix when enumerating terms -->
     <int name="minPrefix">1</int>
     <!-- maximum number of inspections per result -->
     <int name="maxInspections">5</int>
     <!-- minimum length of a query term to be considered for correction -->
     <int name="minQueryLength">4</int>
     <!-- maximum threshold of documents a query term can appear in to be
          considered for correction -->
     <float name="maxQueryFrequency">0.01</float>
     <!-- uncomment this to require suggestions to occur in 1% of the documents
       <float name="thresholdTokenFrequency">.01</float>
       -->
   </lst>

   <!-- a spellchecker that can break or combine words.  See "/spell" handler
        below for usage -->
   <lst name="spellchecker">
     <str name="name">wordbreak</str>
     <str name="classname">solr.WordBreakSolrSpellChecker</str>
     <str name="field">spell</str>
     <str name="combineWords">true</str>
     <str name="breakWords">true</str>
     <int name="maxChanges">10</int>
   </lst>

   <!-- a spellchecker that uses a different distance measure -->
   <!--
     <lst name="spellchecker">
       <str name="name">jarowinkler</str>
       <str name="field">spell</str>
       <str name="classname">solr.DirectSolrSpellChecker</str>
       <str name="distanceMeasure">
         org.apache.lucene.search.spell.JaroWinklerDistance
       </str>
     </lst>
   -->
 

Re: Solr 4.2, reindexing, transaction logs, high memory usage

2013-03-22 Thread Shawn Heisey

On 3/22/2013 9:24 AM, Raghav Karol wrote:

We run this index sharded into 8 Solr cores on a single host,
an m2.4xlarge EC2 instance. We do not use ZooKeeper (because of
operational issues on our live indexes) and manage the sharding
ourselves.

For this index we run with -Xmx30G and observe in jconsole that
Solr runs with approximately 25G.
Autocommit kills Solr: it sends heap memory usage to the max and kills
Solr. The reason appears to be committing to all cores in parallel.
Disabling autoCommit and running a loop like
 while true; do for i in $(seq 0 7); do curl -s
 "http://localhost:8085/solr/core${i}/update?commit=true&wt=json"; done; done

produces:

{"responseHeader":{"status":0,"QTime":8297}}
{"responseHeader":{"status":0,"QTime":8358}}
{"responseHeader":{"status":0,"QTime":9552}}
{"responseHeader":{"status":0,"QTime":8368}}
{"responseHeader":{"status":0,"QTime":9296}}
{"responseHeader":{"status":0,"QTime":8527}}
{"responseHeader":{"status":0,"QTime":9458}}
{"responseHeader":{"status":0,"QTime":8929}}

8 seconds to process a commit with no changes to the index!?!


If this index is actively processing queries, then what you are 
experiencing here is probably cache warming - Solr looks at the entries 
in each of its caches and uses those entries to run queries against the 
new index to pre-populate the new caches.  The number of entries that 
are used for warming queries will be controlled by the autoWarmCount 
value on the cache definition.



Why does solr need such a large heap space for this index (it dies
with 10G and 20G and is constant at 28G in jconsole)?
Why does running commits in parallel via autoCommit or the command
exhaust the memory?
Are we using dynamic fields incorrectly?


When you run a commit, Solr fires up a new index searcher object, 
complete with caches, which will then be autowarmed from the old caches 
as described above.  Until the new object is fully warmed, the old 
searcher will exist and will continue to serve queries.  If you issue 
another commit while a new searcher is already warming, then *another* 
searcher is likely to get fired up as well, depending on the value of 
maxWarmingSearchers in your solrconfig.xml file.


The amount of memory required by a searcher can be very high, due in 
part to caches, especially the FieldCache, which is used internally by 
Lucene and is not configurable like the others.  If you have 8 cores and 
you run commits on them in parallel that take several seconds, then for 
several seconds you will have at least sixteen searchers running.  If 
your maxWarmingSearchers value is higher than 1, you might end up with 
even more searchers running at the same time.  This is likely where your 
memory is going.


By lowering the autoWarmCount values on your caches, you can reduce the 
amount of time it takes to do a commit.  You should also keep track of 
whether anything has actually changed on each core and don't issue a 
commit when nothing has changed.  Also, it would be a good idea to 
stagger the commits so that all your cores are not committing at the 
same time.
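
As a rough illustration of that advice, here is a minimal SolrJ sketch that
commits the cores serially, skips clean cores, and pauses between commits so
only one searcher warms at a time. The core URLs, the dirty-core bookkeeping,
and the five-second pause are assumptions for illustration, not anything from
the original posts:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class StaggeredCommits {
      // Hypothetical flags the indexing code would set when a core gets updates.
      static boolean[] coreIsDirty = new boolean[8];

      public static void commitAll() throws Exception {
        for (int i = 0; i < 8; i++) {
          if (!coreIsDirty[i]) {
            continue; // nothing changed on this core, so skip the commit entirely
          }
          SolrServer core = new HttpSolrServer("http://localhost:8085/solr/core" + i);
          core.commit();       // only one new searcher warms at a time
          coreIsDirty[i] = false;
          Thread.sleep(5000);  // stagger commits so warming windows don't overlap
        }
      }
    }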


Thanks,
Shawn



Re: Slow queries for common terms

2013-03-22 Thread Tom Burton-West
Hi David and Jan,

I wrote the blog post, and David, you are right, the problem we had was
with phrase queries because our positions lists are so huge.  Boolean
queries don't need to read the positions lists.   I think you need to
determine whether you are CPU bound or I/O bound.It is possible that
you are I/O bound and reading the term frequency postings for 90 million
docs is taking a long time.  In that case, More memory in the machine (but
not dedicated to Solr) might help because Solr relies on OS disk caching
for caching the postings lists.  You would still need to do some cache
warming with your most common terms.

On the other hand as Jan pointed out, you may be cpu bound because Solr
doesn't have early termination and has to rank all 90 million docs in order
to show the top 10 or 25.

Did you try the OR search to see if your CPU is at 100%?

Tom

On Fri, Mar 22, 2013 at 10:14 AM, Jan Høydahl jan@cominvent.com wrote:

 Hi

 There might not be a final cure with more RAM if you are CPU bound.
 Scoring 90M docs is some work. Can you check what's going on during those
 15 seconds? Is your CPU at 100%? Try a (foo OR bar OR baz) search which
 generates 100 million hits and see if that is slow too, even if you don't use
 frequent words.

 I'm sure you can find other frequent terms in your corpus which display
 similar behaviour, words which are even more frequent than "book". Are you
 using AND as default operator? You will benefit from limiting the number
 of results as much as possible.

 The real solution is to shard across N servers, until you reach
 the desired performance for the desired indexing/querying load.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com




Re: DocValues and field requirements

2013-03-22 Thread Chris Hostetter

: Thank you for your response. Yes, that's strange. By enabling DocValues the
: information about missing fields is lost, which changes the way of sorting
: as well. Adding default value to the fields can change a logic of
: application dramatically (I can't set default value to 0 for all
: Trie*Fields fields, because it could impact the results displayed to the
: end user, which is not good). It's a pity that using DocValues is so
: limited.

I'm not really up on docvalues, but I asked rmuir about this a bit on IRC.

the crux of the issue is that there are two different docvalue impls, one 
that uses a fixed amount of space per doc (ie: exactly one value per doc) 
and one that allows an ordered set of values per doc (ie: multivalued).

the multivalued docvals impl was wired into solr for multivalued fields, 
and the single valued docvals impl was wired in for the single valued case 
-- but since the single valued docvals impl *has* to have a value 
for every doc, the schema error you encountered was added if you try to 
use it on a field that isn't required or doesn't have a default value -- 
to force you to be explicit about which default you want, instead of the 
low level lucene 0 default coming into play w/o you knowing about it. 
(as Shawn mentioned)

the multivalued docvals impl could conceivably be used instead for these 
types of single valued fields (ie: to support 0 or 1 values) but there is 
no sorting support for multivalued docvals, so it would cause other 
problems.

One possible workaround for people who want to take advantage of sort 
missing first/last type sorting on a docvals type field would be to manage 
the missing information yourself in a distinct field which you also 
leverage in any filtering or sorting on the docvals field.

ie, have a docvalues field myfield which is single valued, with some 
configured default value, and then have a myfield_exists boolean field 
which is single valued and required.  when indexing docs, if myfield 
does/doesn't have a value, set myfield_exists accordingly (this would 
be fairly trivial in an update processor) and then instead of sorting 
just on myfield desc you would sort on myfield_exists (asc|desc), 
myfield desc (where you pick the asc or desc depending on whether you want 
docs w/o values first or last).  you would likewise need to filter on 
myfield_exists:true anytime you did queries against the myfield field.


(perhaps someone could work on a patch to inject a synthetic field like this 
automatically for fields that are docValues=true multiValued=false 
required=false w/o a defaultValue?)
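
A minimal sketch of the update processor idea above, assuming the myfield /
myfield_exists names from the example; the factory class and the
updateRequestProcessorChain wiring are omitted:

    import java.io.IOException;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    public class ExistsFlagUpdateProcessor extends UpdateRequestProcessor {
      public ExistsFlagUpdateProcessor(UpdateRequestProcessor next) {
        super(next);
      }

      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // Record whether the docvalues field really had a value; the schema's
        // configured default fills in "myfield" itself when it is absent.
        doc.setField("myfield_exists", doc.getFieldValue("myfield") != null);
        super.processAdd(cmd);
      }
    }

Sorting would then look like sort=myfield_exists asc,myfield desc, plus an
fq=myfield_exists:true filter on any query that matches against myfield.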


-Hoss


Re: Can we manipulate termfreq to count as 1 for multiple matches?

2013-03-22 Thread Chris Hostetter

: parameter *omitTermFreqAndPositions*

the key thing to remember being: if you use this, then by omitting 
positions you can no longer do phrase queries.

: or you can use a custom similarity class that overrides the term freq and
: return one for only that field.
: http://wiki.apache.org/solr/SchemaXml#Similarity

There is actually a Similarity class already written that is designed to target 
this specific problem of keyword spamming in text fields...

:  Document_1
:  Name = Blue Jeans
:  Description = This jeans is very soft.  Jeans is pretty nice.
: 
:  Now, If I Search for Jeans then Jeans is found in 2 places in
:  Description field.

...first off, it's important to remember that 'tf' doesn't affect things in 
isolation -- usually there is also a lengthNorm factor that would 
penalize the score of that document compared to another one that had a 
short description that only included the word Jeans once (ie: These are 
Red Jeans)

Using the SweetSpotSimilarity, you can specify target values identifying 
what ideal values (ie: sweet spot) you anticipate in a typical document 
for both the tf and lengthNorm ... 

https://lucene.apache.org/solr/4_2_0/solr-core/org/apache/solr/search/similarities/SweetSpotSimilarityFactory.html
https://lucene.apache.org/core/4_2_0/misc/org/apache/lucene/misc/SweetSpotSimilarity.html

...so if you want to say that 1 to 4 instances of the term are equally 
good, and only above that docs start to get rewarded more, you could configure 
the tf function to do that.

(If you really want the same tf() scoring factor for all docs, regardless 
of how many times the term is mentioned -- then you would need to write 
your own Similarity subclass at the moment)
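
A sketch of such a subclass, assuming Lucene 4.x's DefaultSimilarity as the
base; the class name is made up:

    import org.apache.lucene.search.similarities.DefaultSimilarity;

    public class BinaryTfSimilarity extends DefaultSimilarity {
      @Override
      public float tf(float freq) {
        // every doc containing the term gets the same tf factor,
        // no matter how many times the term occurs
        return freq > 0 ? 1.0f : 0.0f;
      }
    }

It would then be referenced from the relevant field type in schema.xml via a
similarity element.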

-Hoss


Re: transientCacheSize not working

2013-03-22 Thread didier deshommes
I've created an issue and patch here that makes it possible to specify
transient and loadOnStartup on core creation:
https://issues.apache.org/jira/browse/SOLR-4631


On Wed, Mar 20, 2013 at 10:14 AM, didier deshommes dfdes...@gmail.com wrote:

 Thanks. Is there a way to pass loadOnStartup and/or transient as
 parameters to the core admin http api? This doesn't seem to work: curl
  http://localhost:8983/solr/admin/cores?action=CREATE&transient=true&name=c1


 On Tue, Mar 19, 2013 at 7:29 PM, Mark Miller markrmil...@gmail.comwrote:

 I don't think SolrCloud works with the transient stuff.

 - Mark

 On Mar 19, 2013, at 8:04 PM, didier deshommes dfdes...@gmail.com wrote:

  Hi,
  I cannot get Solrcloud to respect transientCacheSize when creating
 multiple
  cores via the web API. I'm running Solr 4.2 like this:
 
  java -Dbootstrap_confdir=./solr/collection1/conf
  -Dcollection.configName=conf1 -DzkRun -DnumShards=1 -jar start.jar
 
  I'm creating multiple cores via the core admin http api:
  curl http://localhost:8983/solr/admin/cores?action=CREATE&name=tmp1
  curl http://localhost:8983/solr/admin/cores?action=CREATE&name=tmp2
  curl http://localhost:8983/solr/admin/cores?action=CREATE&name=tmp3
 
  My solr.xml looks like:
 
  <?xml version="1.0" encoding="UTF-8" ?>
  <solr persistent="true">
    <cores transientCacheSize="2" adminPath="/admin/cores" shareSchema="true"
           zkClientTimeout="${zkClientTimeout:15000}" hostPort="8983"
           hostContext="solr">
    </cores>
  </solr>
 
  When I list all cores currently loaded, via curl
  http://localhost:8983/solr/admin/cores?action=status , I notice that
 all 3
  cores are still running, even though transientCacheSize is 2. Can anyone
  tell me why that is?
 
  Also, is there a way to pass loadOnStartup and transient to the core
 admin
  http api? Specifying these when creating a core doesn't seem to work:
 curl
  http://localhost:8983/solr/admin/cores?action=CREATE&transient=true
 
  Thanks,
  didier





Re: how to get term vector information of specific word/position in field

2013-03-22 Thread Chris Hostetter

: is there any way I can get term vector information for a specific word
: only, like I can pass the word, and it will just return term position and
: frequency for that word only?
: 
: and also if I can pass the position, e.g. startPosition=5 and endPosition=10,
: then it will return terms, positions and frequency of words which
: occurred in between the start and end position.

I don't think either of these is available out of the box, but you could 
probably modify the code in TermVectorComponent that iterates over terms 
to filter what it adds to the response based on explicitly passed-in 
term, startPos, and endPos params.

It would not only cut down on the total data being returned, but since you 
can do a seek on a TermsEnum, limiting that way should speed up the 
processing as well.  I don't think you can seek on term positions, 
however, so you'd still have to iterate over all the positions until you 
found the startPos, but bailing out once you reach the endPos may save 
some time as well.
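
A rough sketch of that approach against the Lucene 4.x term vector APIs; the
method and parameter names are invented for illustration:

    import org.apache.lucene.index.DocsAndPositionsEnum;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Terms;
    import org.apache.lucene.index.TermsEnum;
    import org.apache.lucene.util.BytesRef;

    public class TermVectorSlice {
      public static void printPositions(IndexReader reader, int docId,
          String field, String word, int startPos, int endPos) throws Exception {
        Terms vector = reader.getTermVector(docId, field);
        if (vector == null) return;                   // no term vector stored
        TermsEnum termsEnum = vector.iterator(null);
        if (!termsEnum.seekExact(new BytesRef(word), false)) return; // term absent
        DocsAndPositionsEnum positions = termsEnum.docsAndPositions(null, null);
        positions.nextDoc();                          // a term vector holds one doc
        int freq = positions.freq();
        for (int i = 0; i < freq; i++) {
          int pos = positions.nextPosition();
          if (pos > endPos) break;                    // bail out once past the window
          if (pos >= startPos) {
            System.out.println(word + " at position " + pos);
          }
        }
      }
    }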

If you do go this route, by all means please submit a patch in jira, it 
could be handy for other TVC users...

https://wiki.apache.org/solr/HowToContribute
https://issues.apache.org/jira/browse/SOLR


-Hoss


Re: Solr 4.2 - Slave Index version is higher than Master

2013-03-22 Thread Mark Miller
That was to you Phil.

So it seems this is a problem with the configuration replication case I would 
guess - I didn't really look at that path in the 4.2 fixes I worked on.

I did add it to the new testing I'm doing since I've suspected it (it will 
prompt a core reload that doesn't happen when configs don't replicate). I'll 
see what I can do to try and get a test to catch it.

- mark


RE: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-22 Thread Dyer, James
Alex,

I added your comments to SOLR-3758
(https://issues.apache.org/jira/browse/SOLR-3758), which seems to me to be the
very same issue.

If you need this to work now and you cannot devise a fix yourself, then perhaps 
a workaround is: if the query returns 0 results, re-issue the query with 
rows=0&group=false (you would omit all other optional components also). 
This will give you back just a spellcheck result.  I realize this is not 
optimal because it requires the overhead of issuing 2 queries, but if you do it 
only in instances where the user gets nothing (or very little) back, maybe it 
would be tolerable?  Then once a viable fix is devised you can remove the extra 
code from your application.
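
In SolrJ terms, that client-side fallback might look roughly like the sketch
below; the handler name and query string are borrowed from the earlier curl
examples, and treating an empty result list as the trigger is an assumption:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class SpellcheckFallback {
      public static QueryResponse queryWithFallback(SolrServer server)
          throws Exception {
        SolrQuery q = new SolrQuery("paulusoles");
        q.setRequestHandler("/testhandler");
        q.set("shards.qt", "testhandler");
        QueryResponse rsp = server.query(q);
        // A grouped response may come back with no spellcheck block; if
        // nothing useful returned, re-ask with grouping off for suggestions.
        if (rsp.getResults() == null || rsp.getResults().getNumFound() == 0) {
          q.set("group", false);
          q.setRows(0);
          rsp = server.query(q);  // rsp.getSpellCheckResponse() is now populated
        }
        return rsp;
      }
    }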

James Dyer
Ingram Content Group
(615) 213-4311



Re: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-22 Thread alxsss
Thanks.

I can fix this, but going over the code it is not easy to figure out where 
the whole request and response come from.

I followed up on SpellCheckComponent#finishStage and found out that
SearchHandler#handleRequestBody calls this function. However, which part calls
handleRequestBody and how its arguments are constructed is not clear.


Thanks.
Alex.

 


Re: Did something change with Payloads?

2013-03-22 Thread jimtronic
OK, this is very bizarre.

If I insert more than one document at a time using the update handler like
so:

[{"id":"1","foo_ap":"bar|50"},{"id":"2","foo_ap":"bar|75"}]

It actually stores the same payload value 50 for both docs.

That seems like a bug, no?

There was a core change in 4.1 to how payloads were stored. I'm wondering if
Solr is not handling them properly?
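
For anyone trying to reproduce, an equivalent SolrJ sketch of the same
two-document update; the foo_ap field name and delimited-payload values come
from the JSON above, while the server URL is an assumption:

    import java.util.Arrays;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class PayloadRepro {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument d1 = new SolrInputDocument();
        d1.addField("id", "1");
        d1.addField("foo_ap", "bar|50");
        SolrInputDocument d2 = new SolrInputDocument();
        d2.addField("id", "2");
        d2.addField("foo_ap", "bar|75");
        // Both documents go in a single update request, which is the case
        // where the payloads reportedly come back identical.
        server.add(Arrays.asList(d1, d2));
        server.commit();
        server.shutdown();
      }
    }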

Jim





Re: overseer queue clogged

2013-03-22 Thread Gary Yngve
Thanks, Mark!

The core node names in solr.xml in Solr 4.2 are great!  Maybe in 4.3 they
can be supported via an API?

Also I am glad you mentioned in another post the option to namespace
ZooKeeper by adding a path to the end of the comma-delimited zk hosts.  That
works out really well in our situation for having zk serve multiple Amazon
environments that go up and down independently of each other -- no issues
w/ shared clusterstate.json or overseers.

Regarding our original problem, we were able to restart all our shards but
one, which wasn't getting past
Mar 20, 2013 5:12:54 PM org.apache.solr.common.cloud.ZkStateReader$2 process
INFO: A cluster state change has occurred - updating...
Mar 20, 2013 5:12:54 PM org.apache.zookeeper.ClientCnxn$EventThread
processEvent
SEVERE: Error while calling watcher
java.lang.NullPointerException
at
org.apache.solr.common.cloud.ZkStateReader$2.process(ZkStateReader.java:201)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)

We ended up upgrading to Solr 4.2 and rebuilding the whole index from our
datastore.

-Gary


On Sat, Mar 16, 2013 at 9:51 AM, Mark Miller markrmil...@gmail.com wrote:

 Yeah, I don't know that I've ever tried with 4.0, but I've done this with
 4.1 and 4.2.

 - Mark

 On Mar 16, 2013, at 12:19 PM, Gary Yngve gary.yn...@gmail.com wrote:

  Cool, I'll need to try this.  I could have sworn that it didn't work that
  way in 4.0, but maybe my test was bunk.
 
  -g
 
 
  On Fri, Mar 15, 2013 at 9:41 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  You can do this - just modify your starting Solr example to have no
 cores
  in solr.xml. You won't be able to make use of the admin UI until you
 create
  at least one core, but the core and collection apis will both work fine.




Re: overseer queue clogged

2013-03-22 Thread Mark Miller

On Mar 22, 2013, at 5:54 PM, Gary Yngve gary.yn...@gmail.com wrote:

 Thanks, Mark!
 
 The core node names in the solr.xml in solr4.2 is great!  Maybe in 4.3 it
 can be supported via API?

It is with the core admin api - do you mean the collections api? Please make a 
JIRA for any feature requests so they don't get lost!

 
 Also I am glad you mentioned in other post the chance to namespace
 zookeeper by adding a path to the end of the comma-delim zk hosts.  That
 works out really well in our situation for having zk serve multiple amazon
 environments that go up and down independently of each other -- no issues
 w/ shared clusterstate.json or overseers.
 
 Regarding our original problem, we were able to restart all our shards but
 one, which wasn't getting past
 Mar 20, 2013 5:12:54 PM org.apache.solr.common.cloud.ZkStateReader$2 process
 INFO: A cluster state change has occurred - updating...
 Mar 20, 2013 5:12:54 PM org.apache.zookeeper.ClientCnxn$EventThread
 processEvent
 SEVERE: Error while calling watcher
 java.lang.NullPointerException
at
 org.apache.solr.common.cloud.ZkStateReader$2.process(ZkStateReader.java:201)
at
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
 
 We ended up upgrading to solr4.2 and rebuilding the whole index from our
 datastore.

Hmm…hopefully this issue has been addressed. Thanks for the stack trace, I'll 
use it to do some inspection.

- Mark

 
 -Gary
 
 
 On Sat, Mar 16, 2013 at 9:51 AM, Mark Miller markrmil...@gmail.com wrote:
 
 Yeah, I don't know that I've ever tried with 4.0, but I've done this with
 4.1 and 4.2.
 
 - Mark
 
 On Mar 16, 2013, at 12:19 PM, Gary Yngve gary.yn...@gmail.com wrote:
 
 Cool, I'll need to try this.  I could have sworn that it didn't work that
 way in 4.0, but maybe my test was bunk.
 
 -g
 
 
 On Fri, Mar 15, 2013 at 9:41 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
 You can do this - just modify your starting Solr example to have no
 cores
 in solr.xml. You won't be able to make use of the admin UI until you
 create
 at least one core, but the core and collection apis will both work fine.
 
 



RE: strange behaviour of wordbreak spellchecker in solr cloud

2013-03-22 Thread Dyer, James
Alex,

You may want to move over to the dev list now that you're working on 
code.  Or if you would rather not subscribe to the dev list, add yourself as a 
watcher to SOLR-3758 and comment further there.  This will help us keep track 
of progress on the issue.

The short answer is that in a distributed set-up SpellCheckComponent (and 
others) work in 2 phases.  In the first phase, each shard is sent the request 
almost as if it were a complete (non-distributed) index unto itself. 
The difference is that an additional parameter is added to the request 
indicating that this is the first phase of a distributed request.  In 
SpellCheckComponent, it uses this knowledge to include additional information 
in the response that normally wouldn't go out to an end client.  The first 
phase calls the Component's process() method, just as would be done if this was 
a non-distributed call.

In the second phase, the initiating shard collects the response from all of the 
shards' process() methods and combines them.  This is where finishStage() is 
called.  So while process() runs in parallel on all of the shards, 
finishStage() runs only on the initiating shard, after the various shards have 
returned their responses.

The code you found in SearchHandler is what coordinates all of these 
activities.  It is very complicated code, but honestly you probably will not 
need to understand it to fix this.

What you probably will find is that each shard's process() returns the correct 
result, just as you get with your hand-done testing.  But somehow finishStage() 
does not properly combine the responses when grouping is involved.  It might be 
that the responses come back just a little differently and finishStage() cannot 
cope, or something along those lines.
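
For orientation, a bare skeleton of where those two phases live in the Solr
4.x SearchComponent API; this is only a sketch, not the actual
SpellCheckComponent code:

    import java.io.IOException;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class TwoPhaseSketchComponent extends SearchComponent {
      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
        // runs before process() for each request
      }

      @Override
      public void process(ResponseBuilder rb) throws IOException {
        // phase 1: runs on every shard against its local index
      }

      @Override
      public void finishStage(ResponseBuilder rb) {
        // phase 2: runs on the initiating shard once the shards respond;
        // the per-shard responses to merge are reachable via rb.finished
      }

      @Override
      public String getDescription() { return "two-phase sketch"; }

      @Override
      public String getSource() { return ""; }
    }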

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: alx...@aim.com [mailto:alx...@aim.com]
Sent: Friday, March 22, 2013 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

Thanks.

I can fix this, but going over code it seems it is not easy to figure out where 
the whole request and response come from.

I followed up  SpellCheckComponent#finishStage


 and found out that SearchHandler#handleRequestBody calls this function. 
However, which part calls handleRequestBody and how its arguments are 
constructed is not clear.


Thanks.
Alex.



-Original Message-
From: Dyer, James james.d...@ingramcontent.com
To: solr-user solr-user@lucene.apache.org
Sent: Fri, Mar 22, 2013 2:08 pm
Subject: RE: strange behaviour of wordbreak spellchecker in solr cloud


Alex,

I added your comments to SOLR-3758 
(https://issues.apache.org/jira/browse/SOLR-3758)
, which seems to me to be the very same issue.

If you need this to work now and if you cannot devise a fix yourself, then
perhaps a workaround is if the query returns with 0 results, re-issue the query
with rows=0group=false (you would omit all other optional components also).
This will give you back just a spell check result.  I realize this is not
optimal because it requires the overhead of issuing 2 queries but if you do it
only in instances the user gets nothing (or very little) back maybe it would be
tolerable?  Then once a viable fix is devised you can remove the extra code from
your application.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: alx...@aim.com [mailto:alx...@aim.com]
Sent: Friday, March 22, 2013 12:53 PM
To: solr-user@lucene.apache.org
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud

Hello,


Further investigation shows the following pattern, for both DirectIndex and
wordbreak spellchekers.

Assume that in all cases there are spellchecker results when distrib=false

In distributed mode (distrib=true)
  case when matches=0
1. group=true,  no spellcheck results

2. group=false , there are spellcheck results

  case when matches0
1. group=true, there are spellcheck results
2. group =false, there are spellcheck results


Do these constitute a failing test case?

Thanks.
Alex.





-Original Message-
From: alxsss alx...@aim.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Mar 21, 2013 6:50 pm
Subject: Re: strange behaviour of wordbreak spellchecker in solr cloud



Hello,

I am debugging the SpellCheckComponent#finishStage.

From the responses I see that not only wordbreak, but also directSpellchecker
does not return some results in distributed mode.
The request handler I was using had

str name=grouptrue/str


So, I desided to turn of grouping and I see spellcheck results in distributed
mode.


curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler'
has no spellcheck results,
but

curl 
'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10&shards.qt=testhandler&group=false'
returns results.

So, the conclusion is that grouping causes the distributed spellchecker to 
return no results.

Re: Did something change with Payloads?

2013-03-22 Thread Mark Miller

On Mar 22, 2013, at 5:54 PM, jimtronic jimtro...@gmail.com wrote:

 Ok, this is very bizarre.
 
 If I insert more than one document at a time using the update handler like
 so:
 
 [{"id":"1","foo_ap":"bar|50"},{"id":"2","foo_ap":"bar|75"}]
 
 It actually stores the same payload value 50 for both docs.
 
 That seems like a bug, no?
 
 There was a core change in 4.1 to how payloads were stored. I'm wondering if
 solr is not handling them properly?

This could be - if you have compiled a lot of evidence (sorry, I have not had 
time to follow up on this myself), please create a JIRA issue for more 
prominence.

- Mark

 
 Jim
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Did-something-change-with-Payloads-tp4049561p4050599.html
 Sent from the Solr - User mailing list archive at Nabble.com.
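One way to confirm what actually got indexed for each doc, bypassing Solr entirely, is to read the payloads back with the Lucene 4.x positions API. A sketch (the index path is a placeholder, and decoding as a float assumes the usual delimited-payload float encoding on the foo_ap field):

import java.io.File;

import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class PayloadDump {
  public static void main(String[] args) throws Exception {
    DirectoryReader reader = DirectoryReader.open(
        FSDirectory.open(new File("/path/to/index")));  // placeholder path
    for (AtomicReaderContext ctx : reader.leaves()) {
      Terms terms = ctx.reader().terms("foo_ap");
      if (terms == null) continue;
      TermsEnum te = terms.iterator(null);
      if (!te.seekExact(new BytesRef("bar"), false)) continue;
      DocsAndPositionsEnum dpe = te.docsAndPositions(null, null);
      int doc;
      while (dpe != null
          && (doc = dpe.nextDoc()) != DocsAndPositionsEnum.NO_MORE_DOCS) {
        dpe.nextPosition();
        BytesRef payload = dpe.getPayload();
        if (payload != null) {
          // If the bug is real, both docs print 50.0 here instead of 50/75.
          System.out.println("doc=" + (ctx.docBase + doc) + " payload="
              + PayloadHelper.decodeFloat(payload.bytes, payload.offset));
        }
      }
    }
    reader.close();
  }
}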



doc cache issues... query-time way to bypass cache?

2013-03-22 Thread Gary Yngve
I have a situation we just discovered in Solr 4.2: when there are previously
cached results from a limited field list and we then query for the whole
field list, the response differs depending on which shard gets the query
(no extra replicas).  It returns the document with either the limited field
list or the full field list.

We're releasing tonight, so is there a query param to selectively bypass
the cache, which I can use as a temp fix?

Thanks,
Gary


Boost query parameter with Lucid parser and using query FunctionQuery

2013-03-22 Thread Miller, Will Jr
I have been playing around with the bq/bf/boost query parameters available in 
dismax/edismax. I am using the Lucid parser as my default parser for the query. 
The Lucid parser is an extension of the DisMax parser and should contain 
everything that is available in that parser. My goal is to boost items that have 
the word treatment in the title field. I started with the bq parameter and this 
works, but it is an additive boost. I would prefer a multiplicative boost, so I 
started to look at using boost, which is part of edismax.

This is my full query:
/lucid?q=cancer&sort=score+desc&fl=title,score&wt=xml&indent=true&debugQuery=true&boost=product(10,query({!dismax qf=title v=treatment},0))
What I see in the debug data:

  <str name="parsedquery">BoostedQuery(boost((abstract:blood | author:blood | origtitle:blood | substance:blood | text_all:blood | title:blood^5.0)~0.01,product(const(10),query(+(title:treatment) (abstract:treatment | author:treatment | substance:treatment | title:treatment^5.0 | text_all:treatment | origtitle:treatment),def=0.0</str>
  <str name="parsedquery_toString">boost((abstract:blood | author:blood | origtitle:blood | substance:blood | text_all:blood | title:blood^5.0)~0.01,product(const(10),query(+(title:treatment) (abstract:treatment | author:treatment | substance:treatment | title:treatment^5.0 | text_all:treatment | origtitle:treatment),def=0.0)))</str>

In the boost query I am specifying the field as title but it is expanding to 
look in all of the fields.

How do I restrict the boost query to just look in the title field?

Thanks,
Will
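For comparison, the additive vs. multiplicative forms side by side, expressed with SolrJ against the stock edismax parser (a sketch; the Lucid parser itself is not covered here):

import org.apache.solr.client.solrj.SolrQuery;

public class BoostForms {
  public static void main(String[] args) {
    // Additive: bq adds the boost query's score onto the main score.
    SolrQuery additive = new SolrQuery("cancer");
    additive.set("defType", "edismax");
    additive.set("bq", "title:treatment^10");

    // Multiplicative: boost multiplies the main score by a function.
    SolrQuery multiplicative = new SolrQuery("cancer");
    multiplicative.set("defType", "edismax");
    multiplicative.set("boost",
        "product(10,query({!dismax qf=title v=treatment},0))");

    System.out.println(additive);
    System.out.println(multiplicative);
  }
}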


Re: NoSuchMethodError updateDocument

2013-03-22 Thread Furkan KAMACI
I just set this JVM parameter:

-Dsolr.solr.home=/home/projects/lucene-solr/solr/solr_home

solr_home is where my config files etc. live. My solr.xml has these lines:

<cores adminPath="/admin/cores" defaultCoreName="collection1"
       host="${host:}" hostPort="${jetty.port:}" hostContext="${hostContext:}"
       zkClientTimeout="${zkClientTimeout:15000}">
  <core name="collection1" instanceDir="collection1"/>
</cores>

On the other hand, I run it from Tomcat, without using the example's embedded
Jetty start.jar.

Any ideas?

2013/3/22 Furkan KAMACI furkankam...@gmail.com

 I use Solr 4.1.0 and Nutch 2.1, Java 1.7.0_17, Tomcat 7.0, and IntelliJ IDEA
 12, with CentOS 6.4 on my 64-bit computer.

 I ran this command successfully:

 bin/nutch solrindex http://localhost:8080/solr -index

 However, when I run this command:

 bin/nutch solrindex http://localhost:8080/solr -reindex

 I get this error:

 Mar 22, 2013 6:48:27 PM org.apache.solr.common.SolrException log
 SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchMethodError:
 org.apache.lucene.index.IndexWriter.updateDocument(Lorg/apache/lucene/index/Term;Lorg/apache/lucene/index/IndexDocument;Lorg/apache/lucene/analysis/Analyzer;)V
 at
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:653)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:366)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
 at
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
 at
 org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
 at
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
 at
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: java.lang.NoSuchMethodError:
 org.apache.lucene.index.IndexWriter.updateDocument(Lorg/apache/lucene/index/Term;Lorg/apache/lucene/index/IndexDocument;Lorg/apache/lucene/analysis/Analyzer;)V
 at
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:201)
 at
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
 at
 org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:451)
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:587)
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346)
 at
 org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
 at
 org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
 at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
 at
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
 at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
 ... 16 more



Re: Boost query parameter with Lucid parser and using query FunctionQuery

2013-03-22 Thread Jack Krupansky
You'll have to contact Lucid's support for questions about their code. (I've 
been away from that code too long to recall much about it.)


-- Jack Krupansky

-Original Message- 
From: Miller, Will Jr

Sent: Friday, March 22, 2013 7:07 PM
To: solr-user@lucene.apache.org
Subject: Boost query parameter with Lucid parser and using query 
FunctionQuery


I have been playing around with the bq/bf/boost query parameters available 
in dismax/edismax. I am using the Lucid parser as my default parser for the 
query. The Lucid parser is an extension of the DisMax parser and should 
contain everything that is available in that parser. My goal is to boost items 
that have the word treatment in the title field. I started with the bq 
parameter and this works, but it is an additive boost. I would prefer a 
multiplicative boost, so I started to look at using boost, which is part of 
edismax.


This is my full query:
/lucid?q=cancer&sort=score+desc&fl=title,score&wt=xml&indent=true&debugQuery=true&boost=product(10,query({!dismax qf=title v=treatment},0))

What I see in the debug data:

 <str name="parsedquery">BoostedQuery(boost((abstract:blood | author:blood | origtitle:blood | substance:blood | text_all:blood | title:blood^5.0)~0.01,product(const(10),query(+(title:treatment) (abstract:treatment | author:treatment | substance:treatment | title:treatment^5.0 | text_all:treatment | origtitle:treatment),def=0.0</str>
 <str name="parsedquery_toString">boost((abstract:blood | author:blood | origtitle:blood | substance:blood | text_all:blood | title:blood^5.0)~0.01,product(const(10),query(+(title:treatment) (abstract:treatment | author:treatment | substance:treatment | title:treatment^5.0 | text_all:treatment | origtitle:treatment),def=0.0)))</str>


In the boost query I am specifying the field as title but it is expanding to 
look in all of the fields.


How do I restrict the boost query to just look in the title field?

Thanks,
Will 



Re: Boost query parameter with Lucid parser and using query FunctionQuery

2013-03-22 Thread Jan Høydahl
Why would you use dismax for the query() when you want to match a simple term 
to one field?

If you share echoParams=all the answer may lie somewhere therein?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 23 March 2013 at 00:07, Miller, Will Jr will.mil...@wolterskluwer.com wrote:

 I have been playing around with the bq/bf/boost query parameters available in 
 dismax/edismax. I am using the Lucid parser as my default parser for the 
 query. The Lucid parser is an extension of the DisMax parser and should 
 contain everything that is available in that parser. My goal is to boost items 
 that have the word treatment in the title field. I started with the bq 
 parameter and this works, but it is an additive boost. I would prefer a 
 multiplicative boost, so I started to look at using boost, which is part of 
 edismax.
 
 This is my full query:
 /lucid?q=cancer&sort=score+desc&fl=title,score&wt=xml&indent=true&debugQuery=true&boost=product(10,query({!dismax qf=title v=treatment},0))
 What I see in the debug data:
 
  <str name="parsedquery">BoostedQuery(boost((abstract:blood | author:blood | origtitle:blood | substance:blood | text_all:blood | title:blood^5.0)~0.01,product(const(10),query(+(title:treatment) (abstract:treatment | author:treatment | substance:treatment | title:treatment^5.0 | text_all:treatment | origtitle:treatment),def=0.0</str>
  <str name="parsedquery_toString">boost((abstract:blood | author:blood | origtitle:blood | substance:blood | text_all:blood | title:blood^5.0)~0.01,product(const(10),query(+(title:treatment) (abstract:treatment | author:treatment | substance:treatment | title:treatment^5.0 | text_all:treatment | origtitle:treatment),def=0.0)))</str>
 
 In the boost query I am specifying the field as title but it is expanding to 
 look in all of the fields.
 
 How do I restrict the boost query to just look in the title field?
 
 Thanks,
 Will



Re: NoSuchMethodError updateDocument

2013-03-22 Thread Jan Høydahl
Are you 100% sure you use the exact jars for 4.1.0 *everywhere*, and that 
you're not blending older versions from the Nutch distro in your classpath here?

 Any ideas?
BTW: What was your question here regarding Jetty vs Tomcat?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
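One quick way to answer that is to ask the JVM which jar actually supplied the class at runtime (a sketch; it only tells you something useful when run on the same classpath as the failing webapp):

import org.apache.lucene.index.IndexWriter;

public class WhichJar {
  public static void main(String[] args) {
    // Prints the jar (or directory) that IndexWriter was loaded from.
    System.out.println(IndexWriter.class.getProtectionDomain()
        .getCodeSource().getLocation());
  }
}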

On 23 March 2013 at 00:50, Furkan KAMACI furkankam...@gmail.com wrote:

 I just set this JVM parameter:
 
 -Dsolr.solr.home=/home/projects/lucene-solr/solr/solr_home
 
 solr_home is where my config files etc. live. My solr.xml has these lines:
 
 <cores adminPath="/admin/cores" defaultCoreName="collection1"
        host="${host:}" hostPort="${jetty.port:}" hostContext="${hostContext:}"
        zkClientTimeout="${zkClientTimeout:15000}">
   <core name="collection1" instanceDir="collection1"/>
 </cores>
 
 On the other hand, I run it from Tomcat, without using the example's embedded
 Jetty start.jar.
 
 Any ideas?
 
 2013/3/22 Furkan KAMACI furkankam...@gmail.com
 
  I use Solr 4.1.0 and Nutch 2.1, Java 1.7.0_17, Tomcat 7.0, and IntelliJ IDEA
  12, with CentOS 6.4 on my 64-bit computer.
  
  I ran this command successfully:
  
  bin/nutch solrindex http://localhost:8080/solr -index
  
  However, when I run this command:
  
  bin/nutch solrindex http://localhost:8080/solr -reindex
  
  I get this error:
 
 Mar 22, 2013 6:48:27 PM org.apache.solr.common.SolrException log
 SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchMethodError:
 org.apache.lucene.index.IndexWriter.updateDocument(Lorg/apache/lucene/index/Term;Lorg/apache/lucene/index/IndexDocument;Lorg/apache/lucene/analysis/Analyzer;)V
at
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:653)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:366)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at
 org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
at
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
 Caused by: java.lang.NoSuchMethodError:
 org.apache.lucene.index.IndexWriter.updateDocument(Lorg/apache/lucene/index/Term;Lorg/apache/lucene/index/IndexDocument;Lorg/apache/lucene/analysis/Analyzer;)V
at
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:201)
at
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at
 org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at
 org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:451)
at
 org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:587)
at
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346)
at
 org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at
 org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
... 16 more
 



RE: Boost query parameter with Lucid parser and using query FunctionQuery

2013-03-22 Thread Miller, Will Jr
These are the echoed params... It looks like it ignores the qf in the FunctionQuery 
and instead takes the qf of the main query.

<lst name="params">
  <str name="spellcheck">true</str>
  <str name="facet">true</str>
  <str name="sort">score desc</str>
  <str name="facet.limit">11</str>
  <str name="q.alt">*:*</str>
  <str name="showFindSimilarLinks">true</str>
  <str name="f.body.hl.alternateField">body</str>
  <str name="hl">true</str>
  <str name="stopwords.enabled">true</str>
  <str name="feedback">false</str>
  <str name="echoParams">all</str>
  <str name="fl">title,score</str>
  <str name="f.body.hl.maxAlternateFieldLength">250</str>
  <arr name="role">
    <str>DEFAULT</str>
    <str>DEFAULT</str>
  </arr>
  <arr name="facet.field">
    <str>author_display</str>
    <str>data_source_name</str>
    <str>keywords_display</str>
    <str>mimeType</str>
  </arr>
  <str name="synonyms.fields">abstract,body,comments,country,description,diseaseconcept,genesymbol,grant,institution,investigator,investigatoraffiliation,keywordheading,nlmjournalname,origtitle,otherabstract,personname,primaryauthor,protocolconcept,spaceflight,substance,text_all,title</str>
  <str name="auto-complete">true</str>
  <str name="likeDoc.fl">author,title</str>
  <str name="facet.mincount">1</str>
  <str name="feedback.emphasis">relevancy</str>
  <str name="qf">abstract author origtitle substance text_all title^5.0</str>
  <str name="hl.fl">abstract,author,authorfullname,authorlast,body,comments,country,diseaseconcept,genesymbol,grant,institution,investigator,investigatoraffiliation,keywordheading,nlmjournalname,origtitle,otherabstract,personname,primaryauthor,protocolconcept,substance,title</str>
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.onlyMorePopular">true</str>
  <str name="defType">lucid</str>
  <str name="pf">abstract substance author title^5.0 text_all origtitle</str>
  <str name="stopwords.fields">abstract,body,comments,country,description,diseaseconcept,genesymbol,grant,institution,investigator,investigatoraffiliation,keywordheading,keywords,nlmjournalname,origtitle,otherabstract,personname,primaryauthor,protocolconcept,spaceflight,substance,title</str>
  <str name="boost">product(10,query({!dismax qf=title v=treatment},0))</str>
  <str name="synonyms.enabled">true</str>
  <str name="debugQuery">true</str>
  <str name="indent">true</str>
  <str name="q">cancer</str>
  <str name="wt">xml</str>
</lst>

-Original Message-
From: Jan Høydahl [mailto:jan@cominvent.com] 
Sent: Friday, March 22, 2013 8:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Boost query parameter with Lucid parser and using query 
FunctionQuery

Why would you use dismax for the query() when you want to match a simple term 
to one field?

If you share echoParams=all the answer may lie somewhere therein?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 23 March 2013 at 00:07, Miller, Will Jr will.mil...@wolterskluwer.com wrote:

 I have been playing around with the bq/bf/boost query parameters available in 
 dismax/edismax. I am using the Lucid parser as my default parser for the 
 query. The Lucid parser is an extension of the DisMax parser and should 
 contain everything that is available in that parser. My goal is to boost items 
 that have the word treatment in the title field. I started with the bq 
 parameter and this works, but it is an additive boost. I would prefer a 
 multiplicative boost, so I started to look at using boost, which is part of 
 edismax.
 
 This is my full query:
 /lucid?q=cancer&sort=score+desc&fl=title,score&wt=xml&indent=true&debugQuery=true&boost=product(10,query({!dismax qf=title v=treatment},0))
 What I see in the debug data:
 
  <str name="parsedquery">BoostedQuery(boost((abstract:blood | author:blood | origtitle:blood | substance:blood | text_all:blood | title:blood^5.0)~0.01,product(const(10),query(+(title:treatment) (abstract:treatment | author:treatment | substance:treatment | title:treatment^5.0 | text_all:treatment | origtitle:treatment),def=0.0</str>
  <str name="parsedquery_toString">boost((abstract:blood | author:blood | origtitle:blood | substance:blood | text_all:blood | title:blood^5.0)~0.01,product(const(10),query(+(title:treatment) (abstract:treatment | author:treatment | substance:treatment | title:treatment^5.0 | text_all:treatment | origtitle:treatment),def=0.0)))</str>
 
 In the boost query I am specifying the field as title but it is expanding to 
 look in all of the fields.
 
 How do I restrict the boost query to just look in the title field?
 
 Thanks,
 Will



Question on highlighting of external fields

2013-03-22 Thread Jamie Johnson
Some time ago I worked with a fellow developer to put together an addon
to the (then) current Solr highlighter to support fetching fields from an
external source (like a database, for instance).  The general mechanics seem
to work properly, but I am now seeing issues where the highlights do not
match up with the values in the query (i.e. the user enters dragon and the
em tags end up around the 10 characters after that word).  A simple test I
put together does not exhibit this, so I am at a bit of an impasse as to how
exactly to track the issue down.  Are there any general things that I should
be aware of when attempting to do this?  Is there any encoding/analysis that
I need to consider (i.e. is it sufficient to store the text as it came in,
or should it be the text after an analyzer has done something to it)?  Any
thoughts on this would be greatly appreciated.