Re: Hi

2013-01-24 Thread Dmitry Kan
(start-off-topic): Alexandre, nice ideas. The last one in the *) list is a bit
far-fetched, but still good. I would add one more: how to have exact matches
and inexact matches in the same analyzed field. (end-off-topic)

On Wed, Jan 23, 2013 at 2:40 PM, Alexandre Rafalovitch
arafa...@gmail.comwrote:

 We need a "Make your own adventure" (TM) Solr troubleshooting guide. :-)

 *) You are staring at the Solr installation full of twisty little passages
 and nuances. Would you like to:
*) Build your first index?
*) Make your first query?
*) Spread your documents in the cloud?
*) Build your own UpdateProcessor to integrate reverse Geocoding web
 service into your NLP disambiguation UIMA module to drive your More Like
 This suggestions?

 Well, maybe somebody with more imagination can figure out a better way to
 phrase it. Then, we make a mobile app for doing this and retire
 millionaires. :-) Though that last one could make for an awesome Solr demo.
 :-)

 Seriously though.

 Thendral,
 You do need to say at least how far you got before you emailed us. Have you
 gone through the tutorial and understood it, but your own custom schema is
 giving you trouble? Have you tried indexing a Solr Update XML document
 containing the data you believe you have?
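 For example, a quick smoke test from SolrJ (just a sketch - the URL, field
 names and values here are made up, adjust them to your own core and schema):

 // assumes solr-solrj is on the classpath
 SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
 SolrInputDocument doc = new SolrInputDocument();
 doc.addField("id", "smoke-test-1");          // hypothetical uniqueKey field
 doc.addField("name", "one tiny document");   // hypothetical field
 server.add(doc);
 server.commit();
 // if this works but your real feed gives Bad Request, the problem is in
 // your data or schema, not in Solr itself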

 You need to be able to take a big problem, split it in half, and see
 which half works and which one does not. That is a bit hard to do from your
 description.

 Regards,
Alex.


 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - "Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working." (Anonymous - via GTD book)


 On Wed, Jan 23, 2013 at 7:00 AM, Upayavira u...@odoko.co.uk wrote:

  You are going to have to give more information than this. If you get bad
  request, look in the logs for the Solr server and you will probably find
  an exception there that tells you what was wrong with your document.
 
  Upayavira
 
  On Wed, Jan 23, 2013, at 08:58 AM, Thendral Thiruvengadam wrote:
   Hi,
  
   We are trying to use Solr for indexing our application data.
  
   When we try to add a new object into Solr, we are getting a Bad Request error.
  
   Please help us with this.
  
   Thanks,
   Thendral
  
   
  
   http://www.mindtree.com/email/disclaimer.html
 



RE: Issues with docFreq/docCount on SolrCloud

2013-01-24 Thread Markus Jelsma
Alright, so my suggestion of overriding HttpShardHandler to route users to the 
same replica instead of shuffling the replica URLs is doable? What about the 
comment in HttpShardHandler then?

  //
  // Shuffle the list instead of use round-robin by default.
  // This prevents accidental synchronization where multiple shards could get in sync
  // and query the same replica at the same time.
  //
  if (urls.size() > 1)
    Collections.shuffle(urls, httpShardHandlerFactory.r);
  shardToURLs.put(shard, urls);

Instead of shuffling, I would then hash the user to the correct replica if 
possible.
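Something like this is what I have in mind - just a sketch, not the actual
HttpShardHandler code, and how the user id reaches this point is exactly the
open question:

  // pick a replica deterministically per user instead of
  // Collections.shuffle(urls, httpShardHandlerFactory.r)
  String pickReplica(List<String> urls, String userId) {
    if (urls.size() == 1) return urls.get(0);
    int idx = (userId.hashCode() & 0x7fffffff) % urls.size();
    return urls.get(idx);
  }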

Thanks,
Markus
 
-Original message-
 From:Mark Miller markrmil...@gmail.com
 Sent: Thu 24-Jan-2013 00:33
 To: solr-user@lucene.apache.org
 Subject: Re: Issues with docFreq/docCount on SolrCloud
 
 
 On Jan 23, 2013, at 6:21 PM, Yonik Seeley yo...@lucidworks.com wrote:
 
  A solr request could request a token that when resubmitted with a
  follow-up request would result in hitting the same replicas if
  possible.
 
 Yeah, this would be good. It's also useful for not catching eventual 
 consistency effects between queries.
 
 - Mark


RE: problem in qf parameter - no results

2013-01-24 Thread Markus Jelsma
Hi,

I think it's your mm-parameter and that the terms are not matched in the 
'setctor' field.
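You can check that quickly from SolrJ - a sketch, with the boosts copied from
your mail and a made-up server URL; relaxing mm to 1 should bring the results
back if mm is the culprit:

  SolrQuery query = new SolrQuery("bibbia di gerusalemme");
  query.set("defType", "edismax");
  query.set("qf", "title^1 author^0.75 publisher^0.25 sector^0.25");
  query.set("mm", "1"); // only one clause has to match
  QueryResponse rsp = new HttpSolrServer("http://localhost:8983/solr").query(query);
  System.out.println(rsp.getResults().getNumFound());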

Cheers, 
 
-Original message-
 From:Gastone Penzo gastone.pe...@gmail.com
 Sent: Thu 24-Jan-2013 10:11
 To: solr-user@lucene.apache.org
 Subject: problem in qf parameter - no results
 
 Hi,
 i have a problem with qf parameter:
 
 
 38 results:
 localhost:8983/solr/select/?defType=edismax&qf=title^1 author^0.75
 publisher^0.25&q=bibbia di gerusalemme
 
 0 results:
 localhost:8983/solr/select/?defType=edismax&qf=title^1 author^0.75
 publisher^0.25 setctor^0.25&q=bibbia di gerusalemme
 
 
 the difference is only the field sector, which is:
 
 <field name="sector" type="string" indexed="true" stored="true"
 required="false" multiValued="true"/>
 
 why does adding the sector field to the qf parameter make Solr return 0 products?
 
 thank you
 
 -- 
 *Gastone Penzo*
 *
 *
 


Re: Confused by queries

2013-01-24 Thread Anders Melchiorsen

Hello.

That is indeed an excellent article, thanks for pointing me at it. With
a title like that, it is no wonder that I was unable to google it on my
own.

It is probably the exception in this rule that has been confusing me:

If a BooleanQuery contains no MUST BooleanClauses, then a
document is only considered a match against the BooleanQuery
if one or more of the SHOULD BooleanClauses is a match.

So "+group:id +keyword:text" and "(+group:id) +keyword:text" mean
completely different things.

I have mostly been using the reference at
http://lucene.apache.org/core/3_6_0/queryparsersyntax.html and it does
not mention this distinction. Quite the contrary, actually, as it says
that grouping can be used to eliminate confusion, thereby suggesting
that the usual rules of Boolean algebra apply.
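For my own notes, here is a small sketch of the two structures with the plain
Lucene BooleanQuery API, as I now understand them (field names are just
illustrative):

  BooleanQuery flat = new BooleanQuery();
  flat.add(new TermQuery(new Term("group", "id")), BooleanClause.Occur.MUST);
  flat.add(new TermQuery(new Term("keyword", "text")), BooleanClause.Occur.MUST);
  // "+group:id +keyword:text": both clauses are required

  BooleanQuery inner = new BooleanQuery();
  inner.add(new TermQuery(new Term("group", "id")), BooleanClause.Occur.MUST);
  BooleanQuery nested = new BooleanQuery();
  nested.add(inner, BooleanClause.Occur.SHOULD);  // "(+group:id)" is only SHOULD here
  nested.add(new TermQuery(new Term("keyword", "text")), BooleanClause.Occur.MUST);
  // "(+group:id) +keyword:text": only keyword:text is required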


Thanks again,
Anders.


On 23.01.2013 02:20, Erick Erickson wrote:

Solr/Lucene does not implement strict boolean logic. Here's an
excellent blog discussing this:

http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/

Best
Erick

On Tue, Jan 22, 2013 at 7:25 PM, Otis Gospodnetic
otis.gospodne...@gmail.com wrote:

Well, depends on what you indexed.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 22, 2013 5:48 PM, Anders Melchiorsen m...@spoon.kalibalik.dk wrote:


Thanks, though I am still confused.

How about this one:

manu:apple = 1 hit
+name:video = 2 hits

manu:apple +name:video = 2 hits

Solr ignores the manu:apple part completely?


Cheers,
Anders.


On 22/01/13 23.16, Jack Krupansky wrote:


The first query:

   name:ipod OR -name:ipod = 0 hits

The OR and - are actually at the same level of the BooleanQuery, so
the - overrides the OR, so it's equivalent to:

   name:ipod -name:ipod = 0 hits

For the second query:

   (name:ipod) OR (-name:ipod) = 3 hits

Pure negative queries are supported only at the top level, so the
(-name:ipod) matches nothing, so the query is equivalent to:

   (name:ipod) = 3 hits

You can simply insert a *:* to assure that it is not a pure negative
query inside the parentheses:

   (name:ipod) OR (*:* -name:ipod)

-- Jack Krupansky

-Original Message- From: Anders Melchiorsen
Sent: Tuesday, January 22, 2013 4:59 PM
To: solr-user@lucene.apache.org
Subject: Confused by queries

Hello!

With the example server of Solr 4.0.0 (with *.xml indexed), I get these
results:

*:* = 32 hits
name:ipod = 3 hits
-name:ipod = 29 hits

That is all fine, but for these next queries, I would expect to get 32
hits (i.e. everything), or at least the same number of hits for both
queries:

name:ipod OR -name:ipod = 0 hits
(name:ipod) OR (-name:ipod) = 3 hits

As my expectations are not met, I must be missing something?


Thanks,
Anders.








Re: setting up master and slave in same machine with diff ip's and same port

2013-01-24 Thread Upayavira
You could configure your servlet container (jetty/tomcat) to have
specific webapps/contexts listen on specific IP/port combinations; that
would get you some way. But what you are asking is more about networking
and servlet container configuration than about Solr.

Upayavira

On Wed, Jan 23, 2013, at 10:48 PM, epnRui wrote:
 Hi everyone 
 
 It's my first post here, so I hope I'm doing it in the right place. 
 
 I'm a software developer and I'm setting up a DEV environment in Ubuntu with
 the same configuration as in PROD. (Apparently this IT department doesn't
 know the difference between a developer and a sys admin.) 
 
 In PROD we have a Solr master and a Solr slave, on two different IPs. Let's
 say: 
 Master 192.10.1.1 
 Slave 192.10.1.2 
 
 In DEV I have only one server: 
 10.1.1.1 
 
 All of them are Ubuntu servers. 
 
 Can I put master and slave, without touching any configuration in Solr, no
 IP change, no port change, on 10.1.1.1 (DEV), and still make it work? 
 
 Basically, what I'm looking for is what Ubuntu server configuration I'd have
 to do to make this work. 
 
 Thanks a lot
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/setting-up-master-and-slave-in-same-machine-with-diff-ip-s-and-same-port-tp4035795.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: zookeeper config

2013-01-24 Thread J Mohamed Zahoor
Cool. Thanks.


On 24-Jan-2013, at 1:28 PM, Per Steffensen st...@designware.dk wrote:

 This is supported. You just need to adjust your ZK connection-string: 
 host1:port1/solr,host2:port2/solr,...,hostN:portN/solr
 
 Regards, Per Steffensen
 
 On 1/24/13 7:57 AM, J Mohamed Zahoor wrote:
 Hi
 
 I am using Solr 4.0.
 I see the Solr data in zookeeper is placed on the root znode itself.
 This becomes a pain if the zookeeper instance is used for multiple projects 
 like HBase and the like.
 
 I am thinking of raising a Jira for putting them under a znode /solr or 
 something like that?
 
 ./Zahoor
 
 
 



solr running with multi cores

2013-01-24 Thread real_junlin
Hi,
Our company wants to use Solr to index our reports' data, so we are getting to 
know Solr.
Solr supports multiple cores. In our system, the number of cores will increase 
dynamically, and I am afraid that with more cores the performance will degrade 
dramatically. Our system will have over one hundred cores.


What I want to know is:
How many cores does Solr support, and up to what level does Solr run well?
How does Solr allocate system resources (memory, disk space, CPU, ...) among the 
multiple cores?
Is there a performance experiment about Solr running with many cores?




Thanks
 junlin.

Re: solr running with multi cores

2013-01-24 Thread Otis Gospodnetic
Hi,

Please search the mailing list archives - this has been discussed a few
times in the last few months.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 24, 2013 6:33 AM, real_junlin real_jun...@163.com wrote:

 Hi,
 Our company want to use solr to index our reports'data ,so we are going to
 understand solr.
  Solr support the multi cores ,in our system, the cores'num will dynamic
 increase, I afraid with more cores,the performance is decresing
 dramatically.Our system's cores'num will by over one hundred.


 What I want to know is:
 How many cores is supported of solr , under which level can solr running
 perfectly?
 How solr allocate the system's resource(memory,disk space, cpu...) of the
 multi cores?
 Is there a performance experment about solr running with many cores ?




 Thanks
  junlin.


Re: zookeeper config

2013-01-24 Thread Shawn Heisey

On 1/24/2013 12:58 AM, Per Steffensen wrote:
 This is supported. You just need to adjust your ZK connection-string:
 host1:port1/solr,host2:port2/solr,...,hostN:portN/solr

My experience has been that you put the chroot at the very end, not on 
every host entry.  For a standalone zookeeper ensemble with three nodes:


server1:2181,server2:2181,server3:2181/mysolr1

This is used for the zkHost parameter both on Solr startup and with the 
CloudSolrServer object from SolrJ.  The string is used without 
modification in constructing the actual ZooKeeper object down in the 
SolrCloud internals.  Here's the documentation for that object:


http://zookeeper.apache.org/doc/r3.4.5/api/org/apache/zookeeper/ZooKeeper.html#ZooKeeper%28java.lang.String,%20int,%20org.apache.zookeeper.Watcher%29
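For example, with SolrJ (a sketch - the ensemble is the one above and the
collection name is hypothetical):

  // zkHost string with the chroot at the end, exactly as passed to -DzkHost;
  // the constructor declares MalformedURLException
  CloudSolrServer server =
      new CloudSolrServer("server1:2181,server2:2181,server3:2181/mysolr1");
  server.setDefaultCollection("collection1");
  server.connect();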

Thanks,
Shawn



Solr autocomplete feature

2013-01-24 Thread ilay
Hi

 I want to change the autocomplete implementation for our search. Currently I have
a suggest field whose definition in schema.xml is as below:

   <field name="suggest" type="edgytext" indexed="true" stored="true"
          required="true" omitNorms="false"/>

   <fieldType name="edgytext" class="solr.TextField"
              positionIncrementGap="0">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="0" splitOnCaseChange="0" splitOnNumerics="0"
               catenateWords="1" catenateNumbers="1" catenateAll="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EdgeNGramFilterFactory"
               minGramSize="2" maxGramSize="10" />
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>


It works as follows. 
“shoes” will match “casual shoes”, “sports shoes”, “shoes” etc.


Whereas I want it to match only the values that start with the user query.
I.e., if the user types “shoes”, I want to suggest terms that start with “shoes”
or have the query string as a prefix in the “suggest” field in the index.

Please let me know how to do this.

Regards,
Ilay




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-autocomplete-feature-tp4035927.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: AW: AW: auto completion search with solr using NGrams in SOLR

2013-01-24 Thread AnnaVak
Thanks for your solution, it works for me too. I'm new to Solr, but how can I
additionally fetch other fields, not only the field that was used for
searching? For example, I have product title and image fields, and I want to
get the title but also the image related to that title. How can I do this?

Thanks in advance 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4035931.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 4.1.0 shardHandlerFactory Null Pointer Exception when setting up embedded solrj solr server for unit testing

2013-01-24 Thread Ted Merchant
We recently updated from Solr 4.0.0 to Solr 4.1.0.  Because of the change we 
were forced to upgrade a custom query parser.  While the code change itself was 
minimal, we found that our unit tests stopped working because of a 
NullPointerException on line 181 of handler.component.SearchHandler:
ShardHandler shardHandler1 = shardHandlerFactory.getShardHandler();
We determined that the cause of this exception was that shardHandlerFactory was 
never initialized in the solr container.  The reason for this seems to be that 
the shard handler is setup in core.CoreContainer::initShardHandler which is 
called from core.CoreContainer::load.
When setting up the core container we were using the  public 
CoreContainer(SolrResourceLoader loader) constructor.  This constructor never 
calls the load method, so initShardHandler is never called and the shardHandler 
is never initialized.

In Solr 4.0.0 the shardHandler was initialized on the calling of 
getShardHandlerFactory.  This code was modified and moved by revision 1422728: 
SOLR-4204: Make SolrCloud tests more friendly to FreeBSD blackhole 2 
environments.

We fixed our issue by using the public CoreContainer(String dir, File 
configFile) constructor which calls the load method.
I just wanted to make sure that people were aware of this issue and to 
determine if it really is an issue or if having the shardHandler be null was 
expected behavior unless someone called the load(String dir, File configFile ) 
method.
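For reference, the working setup now looks roughly like this (a sketch - the 
paths and core name are placeholders):

  // this constructor calls load(), so initShardHandler() runs
  CoreContainer container = new CoreContainer("/path/to/solr",
      new File("/path/to/solr/solr.xml"));
  EmbeddedSolrServer server = new EmbeddedSolrServer(container, "collection1");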

Thank you,

Ted



Stack trace of error:
org.apache.solr.client.solrj.SolrServerException: 
org.apache.solr.client.solrj.SolrServerException: java.lang.NullPointerException
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
at 
org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at 
com.cision.search.solr.ProximityQParserTest.testInit(ProximityQParserTest.java:72)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: org.apache.solr.client.solrj.SolrServerException: 
java.lang.NullPointerException
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:155)
... 27 more
Caused by: java.lang.NullPointerException
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at 

Re: Problem with migration from solr 3.5 with SOLR-2155 usage to solr 4.0

2013-01-24 Thread Viacheslav Davidovich
Hi David,

thank you for your answer.

After updating to this field type and changing the Solr query, I get the required 
behavior.

Also, could you update the wiki page, after the words "it needs to be in 
WEB-INF/lib in Solr's war file", to also add the Maven artifact code 
like this?

<dependency>
    <groupId>com.vividsolutions</groupId>
    <artifactId>jts</artifactId>
    <version>1.13</version>
</dependency>

I think this may help users who use Maven.

WBR Viacheslav.

On 23.01.2013, at 19:24, Smiley, David W. wrote:

 Viacheslav,
 
 
 SOLR-2155 is only compatible with Solr 3.  However the technology it is
 based on lives on in Lucene/Solr 4 in the
 SpatialRecursivePrefixTreeFieldType field type.  In the example schema
 it's registered under the name location_rpt.  For more information on
 how to use this field type, see: SpatialRecursivePrefixTreeFieldType
 
 ~ David Smiley
 
 On 1/23/13 11:11 AM, Viacheslav Davidovich
 viacheslav.davidov...@objectstyle.com wrote:
 
 Hi, 
 
 With Solr 3.5 I use SOLR-2155 plugin to filter the documents by distance
 as described in 
 http://wiki.apache.org/solr/SpatialSearch#Advanced_Spatial_Search and
 this solution perfectly filters the multiValued data defined in schema.xml
 like
 
 <fieldType name="geohash" class="solr2155.solr.schema.GeoHashField"
 length="12" />
 
 <field name="location_data" type="geohash" indexed="true" stored="true"
 multiValued="true"/>
 
 the query looks like this with Solr 3.5:
 q=*:*&fq={!geofilt}&sfield=location_data&pt=45.15,-93.85&d=50&sort=geodist() asc
 
 As the SOLR-2155 plugin is not compatible with Solr 4.0, I tried to change the
 field definition to the following:
 
 <fieldType name="location" class="solr.LatLonType"
 subFieldSuffix="_coordinate" />
 
 <field name="location_data" type="location" indexed="true" stored="true"
 multiValued="true"/>
 
 <dynamicField name="*_coordinate" type="tdouble" indexed="true"
 stored="false" />
 
 But in this case, after the geofilt on location_data is executed, the correct
 values are returned only if the field has one value; if more than one value is
 stored in the index, the required documents are returned only when all the
 location points match.
 
 Does anybody have experience or any ideas on how to get the same behavior in
 Solr 4.0 as in Solr 3.5 with the SOLR-2155 plugin?
 
 Is this possible at all, or do I need to refactor the document structure and
 field definition to store only one location value per document?
 
 WBR Viacheslav.
 
 
 



Submit schema definition using curl via SOLR

2013-01-24 Thread Fadi Mohsen
Hi, We would like to use Solr to index statistics from any Java module in
our production environment.

Applications have to be able to create collections and index data on demand, so my
initial thought is to use different HTTP methods to create a collection
in the cluster and then right away start HTTP POSTing documents, but the issue
here is the schema.xml.
Is it possible to HTTP POST the schema via Solr to Zookeeper?

Or do I have to know about another service host/IP than Solr, such as
ZooKeeper? (I wanted to understand whether there is a way to avoid knowing
about ZooKeeper in production.)

This must be a duplicate of another question, excuse me in advance.

Regards
Fadi


Re: AW: AW: auto completion search with solr using NGrams in SOLR

2013-01-24 Thread Naresh
Hi,
You can fetch all the stored fields by passing them as part of
the *fl* parameter. Go through
http://wiki.apache.org/solr/CommonQueryParameters#fl
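For example, with SolrJ (a sketch - the server URL is made up and the field
names are just the ones from your mail):

  HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
  SolrQuery query = new SolrQuery("shoes");
  query.setFields("title", "image");   // equivalent to fl=title,image
  QueryResponse rsp = server.query(query);
  for (SolrDocument doc : rsp.getResults()) {
      System.out.println(doc.getFieldValue("title") + " -> " + doc.getFieldValue("image"));
  }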


On Thu, Jan 24, 2013 at 8:56 PM, AnnaVak anna.vakulc...@gmail.com wrote:

 Thanks for your solution it works for me too, I'm new with Solr but how I
 can
 additionally fetch another fields not only field that was used for
 searching? For example I have product title and image fields and I want to
 get the title but also related to this title image ? How can I do this?

 Thanks in advance



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4035931.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards
Naresh


Does solr 4.1 support field compression?

2013-01-24 Thread Ken Prows
Hi everyone,

I didn't see any mention of field compression in the release notes for
Solr 4.1. Did the ability to automatically compress fields end up
getting added to this release?

Thanks!,
Ken


Re: Does solr 4.1 support field compression?

2013-01-24 Thread Rafał Kuć
Hello!

It should be turned on by default, because the stored fields
compression is the behavior of the default Lucene 4.1 codec.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi everyone,

 I didn't see any mention of field compression in the release notes for
 Solr 4.1. Did the ability to automatically compress fields end up
 getting added to this release?

 Thanks!,
 Ken



AW: Does solr 4.1 support field compression?

2013-01-24 Thread André Widhani
This is what is listed under the Highlights on the Apache page announcing the 
Solr 4.1 release:

  The default codec incorporates an efficient compressed stored fields 
implementation that compresses chunks of documents together with LZ4. (see 
http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene)

André


Von: Rafał Kuć [r@solr.pl]
Gesendet: Donnerstag, 24. Januar 2013 16:45
An: solr-user@lucene.apache.org
Betreff: Re: Does solr 4.1 support field compression?

Hello!

It should be turned on by default, because the stored fields
compression is the behavior of the default Lucene 4.1 codec.

--
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi everyone,

 I didn't see any mention of field compression in the release notes for
 Solr 4.1. Did the ability to automatically compress fields end up
 getting added to this release?

 Thanks!,
 Ken



Re: Submit schema definition using curl via SOLR

2013-01-24 Thread Per Steffensen
Basically uploading a Solr config (including schema.xml, 
solrconfig.xml etc.) is an operation different from creating 
collections. When creating a collection (e.g. using the Collection API) 
you reference the (already existing) Solr config it needs to use. 
Collections can share Solr configs. I know of at least two ways to 
load a Solr config into ZK using Solr-tools.


1) You can use ZkCLI tool (of course ZK needs to be started) - something 
like this

mkdir -p ${SOLR_INSTALL}/example/webapps/temp
cp ${SOLR_INSTALL}/example/webapps/solr.war 
${SOLR_INSTALL}/example/webapps/temp

cd ${SOLR_INSTALL}/example/webapps/temp
jar -xf solr.war
java  -classpath ${SOLR_INSTALL}/example/webapps/temp/WEB-INF/lib/* 
org.apache.solr.cloud.ZkCLI -cmd upconfig -confdir 
path_to_solr_config_dir -confname logical_solr_config_name --zkhost 
zk_connection_str

rm -rf ${SOLR_INSTALL}/example/webapps/temp
Believe there is also a zkcli.sh tool

2) or You can have an Solr node (server) load a Solr config into ZK 
during startup by adding collection.configName and bootstrap_confdir VM 
params - something like this
java -DzkHost=zk_connection_str -Dcollection.configName=edr_sms_conf 
-Dbootstrap_confdir=path_to_solr_config_dir -jar start.jar


I prefer 1) for several reasons.

Regards, Per Steffensen

On 1/24/13 4:02 PM, Fadi Mohsen wrote:

Hi, We would like to use Solr to index statistics from any Java module in
our production environment.

Applications have to can create collections and index data on demand, so my
initial thought is to use different HTTP methods to accomplish a collection
in cluster and then right away start HTTP POST documents, but the issue
here is the schema.xml.
Is it possible to HTTP POST the schema via Solr to Zookeeper?

Or do I have to know about other service host/IP than SOLR, such as
ZooKeeper (wanted to understand whether there is a way to avoid knowing
about zookeeper in production.)?

This must be a duplicate of another question, excuse me in advance.

Regards
Fadi





Re: Does solr 4.1 support field compression?

2013-01-24 Thread Ken Prows
Doh!, I went straight for the release notes. Thanks, this is the
feature I was waiting for :)

Ken

On Thu, Jan 24, 2013 at 10:49 AM, André Widhani
andre.widh...@digicol.de wrote:
 This is what it listed under the Highlights on the Apache page announcing 
 the Solr 4.1 release:

   The default codec incorporates an efficient compressed stored fields 
 implementation that compresses chunks of documents together with LZ4. (see 
 http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene)

 André

 
 Von: Rafał Kuć [r@solr.pl]
 Gesendet: Donnerstag, 24. Januar 2013 16:45
 An: solr-user@lucene.apache.org
 Betreff: Re: Does solr 4.1 support field compression?

 Hello!

 It should be turned on by default, because the stored fields
 compression is the behavior of the default Lucene 4.1 codec.

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi everyone,

 I didn't see any mention of field compression in the release notes for
 Solr 4.1. Did the ability to automatically compress fields end up
 getting added to this release?

 Thanks!,
 Ken



Re: Does solr 4.1 support field compression?

2013-01-24 Thread Shawn Heisey

On 1/24/2013 8:42 AM, Ken Prows wrote:

I didn't see any mention of field compression in the release notes for
Solr 4.1. Did the ability to automatically compress fields end up
getting added to this release?


The concept of compressed fields (an option in schema.xml) that existed 
in the 1.x versions of Solr (based on Lucene 2.9) was removed in Lucene 
3.0.  Because Lucene and Solr development were combined, the Solr 
version after 1.4.1 is 3.1.0, there is no 1.5 or 2.x version of Solr.


Solr/Lucene 4.1 compresses all stored field data by default.  I don't 
think there's a way to turn it off at the moment, which is causing 
performance problems for a small subset of Solr users.  When it comes 
out, Solr 4.2 will also have compressed term vectors.


The release note contains this text:

Stored fields are compressed. (See 
http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene)


It looks like the solr CHANGES.txt file fails to specifically mention 
LUCENE-4226 https://issues.apache.org/jira/browse/LUCENE-4226 which 
implemented compressed stored fields.


Thanks,
Shawn



Re: Submit schema definition using curl via SOLR

2013-01-24 Thread Per Steffensen

On 1/24/13 4:51 PM, Per Steffensen wrote:


2) or You can have an Solr node (server) load a Solr config into ZK 
during startup by adding collection.configName and bootstrap_confdir 
VM params - something like this
java -DzkHost=zk_connection_str 
-Dcollection.configName=logical_solr_config_name 
-Dbootstrap_confdir=path_to_solr_config_dir -jar start.jar

Well logical_solr_config_name instead of edr_sms_conf, of course



Re: Starting instances with multiple collections

2013-01-24 Thread Per Steffensen
Each node needs a -Dsolr.solr.home pointing to a solr.xml, but the 
configuration-subfolder does not need to be there. It only needs to be 
there for the node you start with -Dbootstrap_confdir (to have it load 
the config into ZK). The next time you start this Solr you do not need 
to provide -Dbootstrap_confdir, since config is already loaded into ZK 
(well unless you run your ZK embedded in the Solr - in this case I 
believe all ZK state is removed when you close the Solr, but that is 
also just for playing)
In general, IMHO, using a Solr node to load a configuration during 
startup is only for playing. You ought to load configs into ZK as a 
separate operation from starting Solrs (and creating collections for 
that matter). Also see the recent mailing-list thread "Submit schema definition 
using curl via SOLR".


Regards, Per Steffensen

On 1/23/13 11:12 PM, Walter Underwood wrote:

I can get one Solr 4.1 instance up with the config bootstrapped into Zookeeper. 
In zk I see two configs, two collections, and I can run the DIH on the first 
node.

I can get the other two nodes to start and sync if I give them a 
-Dsolr.solr.home pointing to a directory with a solr.xml and subdirectories 
with configuration for each collection. If I don't do that, they look for 
solr/solr.xml, then fail. But what is the point of putting configs in Zookeeper 
if each host needs a copy anyway?

The wiki does not have an example of how to start a cluster with multiple 
collections.

Am I missing something here?

wunder
--
Walter Underwood
wun...@wunderwood.org








Re: Solr SQL Express Integrated Security - Unable to execute query

2013-01-24 Thread O. Olson
Shawn Heisey-4 wrote
 There will be a lot more detail to this error.  This detail may have a 
 clue about what happened.  Can you include the entire stacktrace?
 
 Thanks,
Shawn

Thank you Shawn. The following is the entire stacktrace. I hope this helps:


INFO: Creating a connection for entity Product with URL:
jdbc:sqlserver://localhost;instanceName=SQLEXPRESS;databaseName=Amazon;integratedSecurity=true;
Jan 23, 2013 3:26:05 PM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/dataimport params={command=status} status=0
QTime=1 
Jan 23, 2013 3:26:31 PM org.apache.solr.common.SolrException log
SEVERE: Exception while processing: Product document :
SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to execute query: SELECT [ProdID],[Descr] FROM
[Amazon].[dbo].[Table_Temp] Processing Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:252)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:209)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:472)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The server
SQLEXPRESS is not configured to listen with TCP/IP.
at
com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:171)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.getInstancePort(SQLServerConnection.java:3188)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.primaryPermissionCheck(SQLServerConnection.java:937)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:800)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:700)
at
com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:842)
at
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:160)
at
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:362)
at
org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:239)
... 12 more

Jan 23, 2013 3:26:31 PM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [db] webapp=/solr path=/dataimport params={command=full-import}
status=0 QTime=13 {deleteByQuery=*:*} 0 13
Jan 23, 2013 3:26:31 PM org.apache.solr.common.SolrException log
SEVERE: Full Import failed:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT [ProdID],[Descr] FROM [Amazon].[dbo].[Table_Temp]
Processing Document # 1
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT [ProdID],[Descr] FROM [Amazon].[dbo].[Table_Temp]
Processing Document # 1
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:413)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to execute query: SELECT [ProdID],[Descr] FROM
[Amazon].[dbo].[Table_Temp] 

Mahout - Solr vs Mahout Lucene Question

2013-01-24 Thread vybe3142
Hi,
I hate to double post, but I'm not sure in which domain the answer to my
question lies, so here's the link to my question on the Mahout groups.

Basically, I'm getting different clustering results depending on whether I
index data with SOLR or Lucene. Please post any responses against the
original question.

Thanks

http://lucene.472066.n3.nabble.com/Clustering-using-Solr-Index-vs-Lucene-Index-Different-Results-td4036013.html



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Mahout-Solr-vs-Mahout-Lucene-Question-tp4036014.html
Sent from the Solr - User mailing list archive at Nabble.com.


Deletion from database

2013-01-24 Thread hassancrowdc
Hi,
I am trying to figure out a way so that if I delete anything from my
database, that item also gets deleted from my indexed data.
Is there any way I can make a new core with the same config as the existing core,
do a full index, swap the data with the existing core, and delete the new core?
So every time I delete anything from the database, it would create a new core, index
the data, swap it, and then delete the new core (that was made)?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Deletion-from-database-tp4036018.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Deletion from database

2013-01-24 Thread Walter Underwood
The general solution is to add a deleted column to your database, or even a 
deleted date column.

When you update Solr from the DB, issue a delete for each item deleted since 
the last successful update.

You can delete those rows after the Solr update or to be extra safe, delete 
them a few days later.

For this to work, you must not re-use IDs.
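In SolrJ terms it is roughly this (a sketch - fetchDeletedIdsSince() is a 
hypothetical helper that runs whatever query fits your deleted/deleted-date 
column):

  List<String> deletedIds = fetchDeletedIdsSince(lastSuccessfulUpdate);
  if (!deletedIds.isEmpty()) {
      solr.deleteById(deletedIds);   // SolrServer.deleteById(List<String>)
      solr.commit();
  }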

wunder

On Jan 24, 2013, at 10:05 AM, hassancrowdc wrote:

 Hi,
 I am trying to figure out a way so that if i delete anything from my
 database how will that item be deleted from my indexed data? 
 is there anyway i can make new core with same config as the existing core,
 do full index, swap the data with the existing core and delete the new core.
 So every time i delete anything from database, it creates a new core, index
 data, swap it and then delete the new core(that was made)?
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Deletion-from-database-tp4036018.html
 Sent from the Solr - User mailing list archive at Nabble.com.







Re: zookeeper config

2013-01-24 Thread Mark Miller

On Jan 24, 2013, at 7:05 AM, Shawn Heisey s...@elyograg.org wrote:

 My experience has been that you put the chroot at the very end, not on every 
 host entry

Yup - this came up on the mailing list not too long ago and it's currently 
correctly documented on the SolrCloud wiki.

- Mark

Re: Solr 4.1.0 shardHandlerFactory Null Pointer Exception when setting up embedded solrj solr server for unit testing

2013-01-24 Thread Mark Miller
This is my fault - I discovered this myself a few days ago. I've been meaning 
to file a jira ticket and have not gotten around to it yet.

You can also work around it like this:

CoreContainer container = new CoreContainer(loader) {
  // workaround since we don't call container#load
  {initShardHandler(null);}
};

- Mark

On Jan 24, 2013, at 9:22 AM, Ted Merchant ted.merch...@cision.com wrote:

 We recently updated from Solr 4.0.0 to Solr 4.1.0.  Because of the change we 
 were forced to upgrade a custom query parser.  While the code change itself 
 was minimal, we found that our unit tests stopped working because of a 
 NullPointerException on line 181 of handler.component.SearchHandler:
 ShardHandler shardHandler1 = shardHandlerFactory.getShardHandler();
 We determined that the cause of this exception was that shardHandlerFactory 
 was never initialized in the solr container.  The reason for this seems to be 
 that the shard handler is setup in core.CoreContainer::initShardHandler which 
 is called from core.CoreContainer::load. 
 When setting up the core container we were using the  public 
 CoreContainer(SolrResourceLoader loader) constructor.  This constructor never 
 calls the load method, so initShardHandler is never called and the 
 shardHandler is never initialized. 
 In Solr 4.0.0 the shardHandler was initialized on the calling of 
 getShardHandlerFactory.  This code was modified and moved by revision 
 1422728: SOLR-4204: Make SolrCloud tests more friendly to FreeBSD blackhole 2 
 environments.
  
 We fixed our issue by using the public CoreContainer(String dir, File 
 configFile) constructor which calls the load method.
 I just wanted to make sure that people were aware of this issue and to 
 determine if it really is an issue or if having the shardHandler be null was 
 expected behavior unless someone called the load(String dir, File configFile 
 ) method.
  
 Thank you,
  
 Ted
  
  
  
 Stack trace of error:
 org.apache.solr.client.solrj.SolrServerException: 
 org.apache.solr.client.solrj.SolrServerException: 
 java.lang.NullPointerException
 at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223)
 at 
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
 at 
 org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
 at 
 com.cision.search.solr.ProximityQParserTest.testInit(ProximityQParserTest.java:72)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
 Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
 at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
 at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
 at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
 at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
 at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
 at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
 at 
 org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
 at 
 org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
 at 
 org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
 at 
 org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
 at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
 at 
 org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
 at 
 org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
 Caused by: org.apache.solr.client.solrj.SolrServerException: 
 java.lang.NullPointerException
 at 
 

Re: Solr SQL Express Integrated Security - Unable to execute query

2013-01-24 Thread Michael Della Bitta
On Thu, Jan 24, 2013 at 11:34 AM, O. Olson olson_...@yahoo.it wrote:

 Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The server
 SQLEXPRESS is not configured to listen with TCP/IP.


That's probably your problem...


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


Re: Submit schema definition using curl via SOLR

2013-01-24 Thread Mark Miller

On Jan 24, 2013, at 10:02 AM, Fadi Mohsen fadi.moh...@gmail.com wrote:

 Hi, We would like to use Solr to index statistics from any Java module in
 our production environment.
 
 Applications have to can create collections and index data on demand, so my
 initial thought is to use different HTTP methods to accomplish a collection
 in cluster and then right away start HTTP POST documents, but the issue
 here is the schema.xml.
 Is it possible to HTTP POST the schema via Solr to Zookeeper?

I've done some work towards this at 
https://issues.apache.org/jira/browse/SOLR-4193

 
 Or do I have to know about other service host/IP than SOLR, such as
 ZooKeeper (wanted to understand whether there is a way to avoid knowing
 about zookeeper in production.)?

I wouldn't try to avoid it - it's probably simpler to deal with than you think.

It's also pretty easy to use 
http://wiki.apache.org/solr/SolrCloud#Command_Line_Util to upload a new 
schema.xml - then just use the Collections API reload command. Two lines in a script.

- Mark



Re: Deletion from database

2013-01-24 Thread hassancrowdc
OK, how can I issue a delete for each item deleted since the last successful
update? Do I write something like a delete query with the delta import query in
the data config? If so, what will I add in the data config for deletion? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Deletion-from-database-tp4036018p4036026.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Deletion from database

2013-01-24 Thread Dyer, James
This post on stackoverflow has a good run-down on your options:
http://stackoverflow.com/questions/1555610/solr-dih-how-to-handle-deleted-documents/1557604#1557604

If you're using DIH, you can get more information from: 
http://wiki.apache.org/solr/DataImportHandler

The easiest thing, if using a delta import, is to add deletedPkQuery on your 
entity like this:
<entity 
 name="..." 
 query="..." 
 deltaQuery="..." 
 deltaImportQuery="..."
 deletedPkQuery="SELECT ID FROM MY_TABLE WHERE DELETED='Y'"
/>

Another approach is to have a second top-level entity that uses the special 
command:
<entity 
 name="Deletes" 
 query="SELECT ID AS '$deleteDocById' FROM MY_TABLE WHERE DELETED='Y'" 
/>

This second approach works if you use DIH but do delta updates using the 
approach described here: 
http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: hassancrowdc [mailto:hassancrowdc...@gmail.com] 
Sent: Thursday, January 24, 2013 12:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Deletion from database

ok, how can i issue delete for each item deleted since the last successful
update? Do i write something like delete query with delta import query in
dataconfig? If so, what will i add in dataconfig for deletion? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Deletion-from-database-tp4036018p4036026.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Solr SQL Express Integrated Security - Unable to execute query

2013-01-24 Thread O. Olson
Michael Della Bitta-2 wrote
 On Thu, Jan 24, 2013 at 11:34 AM, O. Olson <olson_ord@...> wrote:

 Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The server
 SQLEXPRESS is not configured to listen with TCP/IP.
 
 
 That's probably your problem...
 
 
 Michael Della Bitta
 
 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271
 
 www.appinions.com
 
 Where Influence Isn’t a Game


Good call Michael. I did have to enable TCP
(http://msdn.microsoft.com/en-us/library/hh231672.aspx for others who have
the same problem), but I still did not get this to work. 

I then tested my driver, JDBC URL & SQL query in a plain old Java class.
This showed me that it was almost impossible to get integrated
authentication to work in Java. I finally went with specifying the username
and password literally. (I hope this is useful to others):


public static void main(String[] args) throws Exception {
    String url =
        "jdbc:sqlserver://localhost\\SQLEXPRESS;database=Amazon;user=solrusr;password=solrusr;";
    String driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver";
    Connection connection = null;
    try {
        System.out.println("Loading driver...");
        Class.forName(driver);
        System.out.println("Driver loaded! Attempting Connection ...");
        connection = DriverManager.getConnection(url);
        System.out.println("Connection succeeded!");
        ResultSet RS =
            connection.createStatement().executeQuery("SELECT ProdID, Descr FROM Table_Temp");
        try {
            while (RS.next() != false) {
                System.out.println(RS.getString(1) + " " + RS.getString(2));
            }
        } finally {
            RS.close();
        }
        // Success.
    } catch (SQLException e) {
    } finally {
        if (connection != null) try { connection.close(); } catch (SQLException ignore) {}
    }
}

Hence, I modified my db-data-config.xml to

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
      url="jdbc:sqlserver://localhost\SQLEXPRESS;databaseName=Amazon;user=solrusr;password=solrusr;"/>
  <document>
    <entity name="Product"
            query="SELECT ProdID,Descr FROM Table_Temp">
      <field column="ProdID" name="ProdID" />
      <field column="Descr" name="Descr" />
    </entity>
  </document>
</dataConfig>

This worked for me.

Thanks again Michael & Shawn.
O. O.










--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-SQL-Express-Integrated-Security-Unable-to-execute-query-tp4035758p4036056.html
Sent from the Solr - User mailing list archive at Nabble.com.


PK uniqueness aware Solr index merging?

2013-01-24 Thread Gregg Donovan
We have a Hadoop process that produces a set of Solr indexes from a cluster
of HBase documents. After the job runs, we pull the indexes from HDFS and
merge the them together locally. The issue we're running into is that
sometimes we'll have duplicate occurrences of a primary key across indexes
that we'll want merged out. For example, a set of directories with:

./dir00/
doc_id=0
PK=1

./dir01/
doc_id=0
PK=1

should merge into a Solr index containing a single document rather than one
with two Lucene documents each containing PK=1.

The Lucene-level merge code -- i.e. oal.index.SegmentMerger.merge()--
doesn't know about the Solr schema, so it will merge these two directories
into two duplicate documents. It doesn't appear that either Solr's
oas.handler.admin.CoreAdminHandler.handleMergeAction(SolrQueryRequest,
SolrQueryResponse) handles this either, as it ends up passing the list of
merge directories to oal.index.IndexWriter.addIndexes(IndexReader...) via
oas.update.DirectUpdateHandler2.mergeIndexes(MergeIndexesCommand).

So, if I want to merge multiple Solr directories in a way that respects
primary key uniqueness, is there any more efficient manner than re-adding
all of the documents in each directory to a new Solr index to avoid PK
duplicates?

Thanks.

--Gregg

Gregg Donovan
Senior Software Engineer, Etsy.com
gr...@etsy.com


indexVersion returns multiple results when called

2013-01-24 Thread davidq
Hi,

We have 5 core masters and 5 core slaves. The main core houses about 85,000
documents, so small, although the content of each document is quite large.
The second core holds the same number of docs but far less - and different -
data.

We reindex all cores every morning and the replication poll is 5 minutes.
The main core takes 15 minutes to reindex (optimize). At some point, an
incomplete index is picked up by the slave and our web site disappears until
the optimize takes place. I know we could increase the poll to 30 minutes
but that would be no guarantee.

Thought we'd solve it by writing a script to get the indexversion, kick off
reindexing and periodically check the current indexversion against the first
- if the same, sleep for 2 minutes and then check again. Once they're
different, do a fetchIndex from the slave.

Works on all the cores except the main one. We get a different indexversion
after two minutes, the slave gets populated with an almost empty index and
the site is out!

All the other cores exhibit the same indexversion. What have we
misunderstood or got wrong?

Regards,

David Q




--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexVersion-returns-multiple-results-when-called-tp4036046.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Submit schema definition using curl via SOLR

2013-01-24 Thread Fadi Mohsen
Thanks Per, would the first approach involve restarting Solr?

Thanks Mark, that's great. I'll try to check out and apply the patches from the
ticket to understand further.
The reasons we would like to avoid ZooKeeper are:
 * due to lack of knowledge.
 * the amount of work/scripting for developers per module and release
documentation.
 * the extra steps of patching ZK nodes for QA and operations.

ZkCLI is a nice tool, but then instead of interacting with one service over
HTTP, the application needs:
 * extra jar files
 * to know the ZK hostname/IP and port (different in each
dev/qa/systest/accept/production environment), which is, per module, one
configuration step too many.


On Thu, Jan 24, 2013 at 7:18 PM, Mark Miller markrmil...@gmail.com wrote:


 On Jan 24, 2013, at 10:02 AM, Fadi Mohsen fadi.moh...@gmail.com wrote:

  Hi, We would like to use Solr to index statistics from any Java module in
  our production environment.
 
  Applications have to can create collections and index data on demand, so
 my
  initial thought is to use different HTTP methods to accomplish a
 collection
  in cluster and then right away start HTTP POST documents, but the issue
  here is the schema.xml.
  Is it possible to HTTP POST the schema via Solr to Zookeeper?

 I've done some work towards this at
 https://issues.apache.org/jira/browse/SOLR-4193

 
  Or do I have to know about other service host/IP than SOLR, such as
  ZooKeeper (wanted to understand whether there is a way to avoid knowing
  about zookeeper in production.)?

 I wouldn't try to avoid it - it's probably simpler to deal with than you
 think.

 It's also pretty easy to use
 http://wiki.apache.org/solr/SolrCloud#Command_Line_Util to upload a new
 schema.xml - then just Collections API reload command. Two lines in a
 script.

 - Mark




AW: Does solr 4.1 support field compression?

2013-01-24 Thread André Widhani
These are the figures I got after indexing 4 and a half million documents with 
both Solr 3.6.1 and 4.1.0 (and optimizing the index at the end).

  $ du -h --max-depth=1
  67G   ./solr410
  80G   ./solr361

Main contributor to the reduced space consumption is (as expected I guess) the 
.fdt file:

  $ ls -lh solr361/*/*/*.fdt
  29G solr361/core-tex68bohyrh23qs192adaq-index361/index/_bab.fdt

  $ ls -lh solr410/*/*/*.fdt
  18G solr410/core-tex68bohyz1teef3xsjdaw-index410/index/_23uy.fdt

Depends of course on your individual ratio of stored versus indexed-only fields.

André


Von: Shawn Heisey [s...@elyograg.org]
Gesendet: Donnerstag, 24. Januar 2013 16:58
An: solr-user@lucene.apache.org
Betreff: Re: Does solr 4.1 support field compression?

On 1/24/2013 8:42 AM, Ken Prows wrote:
 I didn't see any mention of field compression in the release notes for
 Solr 4.1. Did the ability to automatically compress fields end up
 getting added to this release?

The concept of compressed fields (an option in schema.xml) that existed
in the 1.x versions of Solr (based on Lucene 2.9) was removed in Lucene
3.0.  Because Lucene and Solr development were combined, the Solr
version after 1.4.1 is 3.1.0; there is no 1.5 or 2.x version of Solr.

Solr/Lucene 4.1 compresses all stored field data by default.  I don't
think there's a way to turn it off at the moment, which is causing
performance problems for a small subset of Solr users.  When it comes
out, Solr 4.2 will also have compressed term vectors.

The release note contains this text:

Stored fields are compressed. (See
http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene)

It looks like the solr CHANGES.txt file fails to specifically mention
LUCENE-4226 https://issues.apache.org/jira/browse/LUCENE-4226 which
implemented compressed stored fields.

Thanks,
Shawn



RE: Sorting on Score Problem

2013-01-24 Thread Kuai, Ben
Hi Hoss

Thanks for the reply. 

Unfortunately we have other customized similarity classes, and I don't know how 
to disable them and still make the query work. 

I will try to attach more information once I work out how to simplify the issue.

Thanks
Ben

From: Chris Hostetter [hossman_luc...@fucit.org]
Sent: Thursday, January 24, 2013 12:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Sorting on Score Problem

: We met a weird problem in our project when sorting by score in Solr 4.0,
: the biggest-score document is not at the top; the debug explanation from
: solr looks like this,

that's weird ... can you post the full debugQuery output of an example
query showing the problem, using echoParams=all & fl=id,score (or
whatever unique key field you have)?

also: can you elaborate whether you are using a single-node setup or a
distributed (ie: SolrCloud) query?

: Then we thought it could be a float rounding problem then we implement
: our own similarity class to increse queryNorm by 10,000 and it changes
: the score scale but the rank is still wrong.

when you post the details requested above, please don't use your custom
similarity (just the out-of-the-box solr code) so there's one less
variable in the equation.


-Hoss
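
(For illustration, a request along the lines Hoss is asking for might look like this -- host, core name and query string are placeholders, and the default similarity should be in place while gathering the output:)

  $ curl "http://localhost:8983/solr/collection1/select?q=your+query&sort=score+desc&fl=id,score&rows=10&debugQuery=true&echoParams=all"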


Re: Submit schema definition using curl via SOLR

2013-01-24 Thread Mark Miller

On Jan 24, 2013, at 5:22 PM, Fadi Mohsen fadi.moh...@gmail.com wrote:
 
 The reason we would like to avoid Zookeeper are
 * due to lack of knowledge.
 * the amount of work/scripting for developers per module and release
 documentation.
 * the extra steps of patching ZK nodes for QA and operations.
 
 ZkCLI is a nice tool, but then instead of interacting with one service over
 HTTP, the application needs:
 * extra jar files

We should address this I think - it really shouldn't require any more than the 
SolrJ jars. Currently it also requires the core jars. Still not as minimal as 
just curl posting, I know.

Testing and reporting on the issue I posted, as well as discussion around 
expanding it, will likely help push those features forward.

- Mark



Re: solr parsed query dropping special chars

2013-01-24 Thread Chris Hostetter
: When I search for these characters in the admin query, I can only find the 
Greeks.
: debug shows the parsed query only has greek chars like omega, delta, sigma
: but does not contain others like degree, angle, cent, bullet, less_equal…

this is most likely because of the analyzer you are using for your text 
field, an assumption which can be verified using the Analysis tool in the 
admin UI to see how the various pieces of your query analyzer deal with the 
input.

My guess is you are using a tokenizer which ignores punctuation.

Don't forget to check your index analyzer as well -- you may not even be 
indexing these punctuation symbols either...

: the response dumps the document and  shows me the chars exist in the 
document..
: <str>angle (∠)</str>

...that's the stored value, the *indexed* text may not contain those 
terms.


-Hoss

Re: Solr load balancer

2013-01-24 Thread Chris Hostetter

: For example perhaps a load balancer that sends multiple queries 
: concurrently to all/some replicas and only keeps the first response 
: might be effective. Or maybe a load balancer which takes account of the 

I know of other distributed query systems that use this approach when 
query speed is more important to people than load, and people who use them 
seem to think it works well.

given that it synthetically multiplies the load of each end user request, 
it's probably not something we'd want to turn on by default, but a 
configurable option certainly seems like it might be handy.


-Hoss


Re: Search strategy - improving search quality for short search terms such as doll

2013-01-24 Thread Chris Hostetter

: My next target is searches on simple terms such as doll which, in google,
: would return documents about, well, toy dolls, because that's the most
: common usage of the simple term doll. But in my index it predominantly
: returns documents about CDs with the song Doll Face, and My baby doll in
: them.

if you have good metadata about your documents, then you might get 
satisfying results using something like the edismax parser with appropriate 
weights on various fields -- you could for example say that matching 
on the product_title field is important, but matching on a category_name 
is much more important, and thus use something like...

q=doll&qf=product_title^5+category_name^50

...but that only helps you if you have category_name values that match the 
words people are searching for, like Doll
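
For example, a full (hypothetical) request using those parameters -- the host, core and field names are only placeholders:

  $ curl "http://localhost:8983/solr/collection1/select?defType=edismax&q=doll&qf=product_title^5+category_name^50&fl=id,score&debugQuery=true"

The debugQuery output then shows how much each field contributed to the score, which makes it easier to tune the boosts.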

This type of approach doesn't help you in the case where you might have the 
inverse problem: document (category_name=doll, product_name=My baby) 
showing up first when a user searches for my baby doll but the user is 
really trying to find the document (category_name=cd, product_name=my 
baby doll)

it really all depends on your user base and the type of queries you 
expect.

An interesting solution to this problem that I've seen is to pre-process 
the query using a Bayesian classifier to suggest which categories to boost 
on.

Here's a blog on this where the classifier was trained based on the 
keywords & categories of the documents...

http://engineering.wayfair.com/better-lucenesolr-searches-with-a-boost-from-an-external-naive-bayes-classifier/

...but you could also train the classifier using query logs and data about 
what documents users ultimately clicked on (to help you learn that for 
your userbase, people who search for baby are typically looking for CDs 
not dolls -- or vice versa)


: 
:  
: 
: I'm not directly asking how to solve this as much as I'm asking what
: direction I should be looking in to learn what I need to know to tackle the
: general issue myself.
: 
:  
: 
: Left on my own I would start looking at categorizing the CD's into a facet
: called music, reasonably doable in my dataset. Then I need to reduce the
: boost-value of the entire facet/category of music unless certain pre-defined
: query terms exist, such as [music, cd, song, listen, dvd, analyze actual
: user queries to come up with a more exhaustive list, etc.]. 
: 
:  
: 
: I don't yet know how to do all of this, but after a couple more good books I
: should be dangerous.
: 
:  
: 
: So the question to this list:
: 
:  
: 
: -  Am I on the right track here?  If not, can you point me in a
: direction to go?
: 
:  
: 
:  
: 
: 

-Hoss


RE: solr parsed query dropping special chars

2013-01-24 Thread Tegelberg, Allan
Thanks for the education, Chris.
I pasted the chars into the Index and Query fields on the analyzer panel.

The Index and Query analyzers are almost the same.
On both, the non-Greek characters drop out after WordDelimiterFilter.
The index analyzer shows a grey background on the words that seem to make it through 
all the filters.

WhitespaceTokenizerFactory -  ∠ ψ Σ • ≤ ≠ • ≥ μ ω φ θ ¢ β √ Ω ° ± Δ #  
SynonymFilterFactory (query only) - ditto
StopFilterFactory- ditto
WordDelimiterFilterFactory  - ψ Σ μ ω φ θ β Ω Δ  now only greeks
LowerCaseFilterFactory  - ψ σ μ ω φ θ β ω δ  lower case Greeks only
SnowballPorterFilterFactory - ψ σ μ ω φ θ β ω δ

So I'm thinking I need to change the WordDelimiterFilter properties 
{catenateWords=0, catenateNumbers=0, splitOnCaseChange=1, catenateAll=0, 
generateNumberParts=1, generateWordParts=1, splitOnNumerics=0}

or copy these strings into a different field name/type without the word delimiter, 
so that I don't affect the way existing text is being searched. 
Sound right?
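
Something along these lines, perhaps -- the field and type names are made up, and the exact filter chain depends on what else the field needs to support:

  <fieldType name="text_symbols" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- WhitespaceTokenizer keeps symbols such as ∠ ° ¢ • ≤ inside the tokens -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="symbol_text" type="text_symbols" indexed="true" stored="true"/>
  <copyField source="existing_text_field" dest="symbol_text"/>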

Allan Tegelberg





-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Thursday, January 24, 2013 3:46 PM
To: solr-user@lucene.apache.org
Subject: Re: solr parsed query dropping special chars

: When I search for these characters in the admin query, I can only find the 
Greeks.
: debug shows the parsed query only has greek chars like omega, delta, sigma
: but does not contain others like degree, angle, cent, bullet, less_equal…

this is most likely because of the analyzer you are using for your text field, 
an assumption which can be verified using the Analysis tool in the admin UI to 
see how the various pieces of your query analyzer deal with the input.

My guess is you are using a tokenizer which ignores punctuation.

Don't forget to check your index analyzer as well -- you may not even be 
indexing these punctuation symbols either...

: the response dumps the document and  shows me the chars exist in the 
document..
: <str>angle (∠)</str>

...that's the stored value, the *indexed* text may not contain those terms.


-Hoss


JSON query syntax

2013-01-24 Thread Yonik Seeley
Although lucene syntax tends to be quite concise, nice looking, and
easy to build by hand (the web browser is a major debugging tool for
me), some people prefer to use a more structured query language
that's easier to build up programmatically.  XML fits the bill, but
people tend to prefer JSON these days.

Hence my first quick prototype: https://issues.apache.org/jira/browse/SOLR-4351

I'm pretty happy so far with how easily it's fit in with our QParser
framework, which should generally allow parsers to not care about the
underlying syntax of queries they need to deal with.
For example: the join qparser uses the query specified by v, but
doesn't care if it's in lucene syntax, or if it was part of the JSON.

{'join':{'from':'qqq_s', 'to':'www_s', 'v':'id:10'}}
{'join':{'from':'qqq_s', 'to':'www_s', 'v':{'term':{'id':'10'}}}}

Note: replace the single quotes with double quotes before trying it
out - these are just test strings that have the replacement done in
the test code so that they are easier to read.

There's a fair bit left to do of course... like how to deal with
boost, cache, cost, parameter dereferencing, etc.
Feedback welcome... and hopefully this will be good to go for 4.2

-Yonik
http://lucidworks.com


Re: JSON query syntax

2013-01-24 Thread Otis Gospodnetic
Nice, Yonik!
Here is one suggestion. OK, I'm begging you - please don't make
it as hard on the eyes as Local Params. :)  I thought it was just me who
could never get along with Local Params, but I've learned that a number of
people find Local Params very hard to grok.  Yes, this is JSON, so right
there it may be better, but for instance I see v here, which to a regular
human may not be as nice as value, if that is what v stands for.
Looking at examples from the JIRA issue

{'frange':{'v':'mul(foo_i,2)', 'l':20, 'u':24}}


v is value?

mul is multiply?

what's l? left? No, low(er)?

what's u? Aha, upper?


I'd rather use a few extra characters and be clear, easily memorable, and
user friendly.  People love ES's JSON API and I have never ever heard
anyone say it's too verbose.

Thanks,
Otis





On Thu, Jan 24, 2013 at 8:44 PM, Yonik Seeley yo...@lucidworks.com wrote:

 Although lucene syntax tends to be quite concise, nice looking, and
 easy to build by hand (the web browser is a major debugging tool for
 me), some people prefer to use a more structured query language
 that's easier to build up programmatically.  XML fits the bill, but
 people tend to prefer JSON these days.

 Hence my first quick prototype:
 https://issues.apache.org/jira/browse/SOLR-4351

 I'm pretty happy so far with how easily it's fit in with our QParser
 framework, which should generally allow parsers to not care about the
 underlying syntax of queries they need to deal with.
 For example: the join qparser uses the query specified by v, but
 doesn't care if it's in lucene syntax, or if it was part of the JSON.

 {'join':{'from':'qqq_s', 'to':'www_s', 'v':'id:10'}}
 {'join':{'from':'qqq_s', 'to':'www_s', 'v':{'term':{'id':'10'}}}}

 Note: replace the single quotes with double quotes before trying it
 out - these are just test strings that have the replacement done in
 the test code so that they are easier to read.

 There's a fair bit left to do of course... like how to deal with
 boost, cache, cost, parameter dereferencing, etc.
 Feedback welcome... and hopefully this will be good to go for 4.2

 -Yonik
 http://lucidworks.com



Re: JSON query syntax

2013-01-24 Thread Yonik Seeley
On Thu, Jan 24, 2013 at 8:55 PM, Otis Gospodnetic
otis.gospodne...@gmail.com wrote:
 Yes, this is JSON, so right
 there it may be better, but for instance I see v here which to a regular
 human may not be as nice as value if that is what v stands for.

One goal was to reuse the parsers/parameter names.  A completely
disjoint set would certainly lead to confusion.
Concise *common* abbreviations are fine I think - for example we
quickly get used to (and prefer) f(x) over function(variable1)

We could add some aliases though.

-Yonik
http://lucidworks.com


Re: Get tokenized words in Solr Response

2013-01-24 Thread Romita Saha
Hi Mikhail,

Thanks for your guidance. I found the required information in 
debugQuery=on.

Thanks and regards,
Romita 
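
For anyone hitting the same question: the field analysis handler can also return the tokenized terms directly, assuming the default /analysis/field handler from the example solrconfig.xml is enabled and the field type name below is adjusted to your schema:

  $ curl "http://localhost:8983/solr/collection1/analysis/field?analysis.fieldtype=text_general&analysis.fieldvalue=Search+this+document+named+XYZ-123&wt=json&indent=true"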


From:   Mikhail Khludnev mkhlud...@griddynamics.com
To: solr-user solr-user@lucene.apache.org, 
Date:   01/24/2013 03:19 PM
Subject:Re: Get tokenized words in Solr Response



Romita,

IIRC you've already asked this, and I replied that everything you need
is in the debugQuery=on output. That format is a little bit verbose, and I
suppose you may have some difficulty finding the necessary info
there. Please provide the debugQuery=on output and I can try to highlight the
necessary info for you.


On Thu, Jan 24, 2013 at 6:11 AM, Romita Saha
romita.s...@sg.panasonic.comwrote:

 Hi,

 I want the tokenized keywords to be displayed in the solr response. For
 example, my solr search could be Search this document named XYZ-123, and
 the tokenizer in schema.xml tokenizes the query as follows:
 search document xyz 123. I want to get these tokenized words in the
 Solr response. Is it possible?

 Thanks and regards,
 Romita




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com



RE: SOLR 4 getting stuck during restart

2013-01-24 Thread vijeshnair
Thanks James for the heads up, and apologies for the delayed response. Here are the
full details about this issue. Mine is an e-commerce app, so the index contains
the product catalog comprising roughly 13 million products. At this point I
thought of using the index-based dictionary as the best option for the Did
you Mean functionality. I am not sure if everyone is facing this issue, but
here is what I am observing as far as the dictionary is concerned. 

Index based dictionary

- I was building the dictionary using the following URL, once I had completed
the full indexing. For the time being I have intentionally set the buildOnCommit and
buildOnOptimize options to false, as I didn't want them to slow
down the full indexing (a sketch of the matching searchComponent follows this list).

http://localhost:8090/solr/select?rows=0&spellcheck=true&spellcheck.build=true&spellcheck.dictionary=jarowinkler
 

- Once I had created the dictionary and tried to restart my Tomcat, I ran into
the issue I stated before (I waited for around 20 minutes and
the restart didn't happen).
- When I removed the dictionary from the data folder, the server restart
started working. 
- I have tried spellcheck.collation=false as you suggested, but it
didn't help.
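
For reference, the searchComponent this corresponds to in solrconfig.xml looks roughly like the sketch below; only the dictionary name comes from the URL above, and the field and index-directory values are assumptions:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">jarowinkler</str>
      <str name="field">spell</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
      <str name="buildOnCommit">false</str>
      <str name="buildOnOptimize">false</str>
    </lst>
  </searchComponent>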

Direct Spell Checker

I have experimented with the new DirectSolrSpellChecker, which does not
create a separate dictionary folder but rather builds the spellchecker on the
main index itself. The results were exactly the same as before: I was getting
stuck during the restarts. I think the traditional spellchecker would be
better in this case, as you can remove the dictionary, restart, and move it back
as and when required. In the case of DirectSolrSpellChecker, there is no
separate dictionary folder, so I am not sure what to remove from the
index so that the server can restart.

James, could you please validate this? It would be a great help
if you could point out any mistakes I am making here. If you think what I am
doing makes sense, I will go ahead and log this bug in JIRA.

Thanks
Vijesh K Nair





Re: Solr HTTP Replication Question

2013-01-24 Thread Amit Nithian
Okay, so after some debugging I found the problem. The replication
piece will download the index from the master server and move the files to
the index directory, but during the commit phase these older-generation
files are deleted and the index is essentially left intact.

I noticed that a full copy is needed if the index is stale (meaning that
files in common between the master and slave have different sizes), but
I think a full copy should also be needed if the slave's generation is higher
than the master's. In short, to me it's not sufficient to
simply say a full copy is needed if the slave's index version is >= the
master's index version. I'll create a patch and file a bug along with a
more thorough writeup of how I got into this state.

Thanks!
Amit



On Thu, Jan 24, 2013 at 2:33 PM, Amit Nithian anith...@gmail.com wrote:

 Does Solr's replication look at the generation difference between master
 and slave when determining whether or not to replicate?

 To be more clear:
 What happens if a slave's generation is higher than the master yet the
 slave's index version is less than the master's index version?

 I looked at the source and didn't seem to see any reason why the
 generation matters other than fetching the file list from the master for a
 given generation. It's too wordy to explain how this happened so I'll go
 into details on that if anyone cares.

 Thanks!
 Amit