RE: SOLR - Unable to execute query error - DIH
Thanks James. We have tried the following options (individually), including the one you suggested:

1. selectMethod=cursor
2. batchSize=-1
3. responseBuffering=adaptive

But the indexing process doesn't seem to be improving at all. When we index a set of 500 rows it works well and completes in 18 minutes. For 1000K rows it took 22 hours. But when we try to index the complete set of 750K rows, it shows no progress and just keeps executing.

Currently both the SQL Server machine and the Solr machine are running on 4 GB RAM. With this configuration, is the behavior above expected? If we upgrade the RAM, which machine should it be: the Solr machine or the SQL Server machine? Are there any other, more efficient methods to import/index data from SQL Server into Solr?

Thanks!
--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Unable-to-execute-query-error-DIH-tp4051028p4051981.html
Sent from the Solr - User mailing list archive at Nabble.com.
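For anyone else tuning DIH against SQL Server: the options above go into the dataSource element of data-config.xml. A minimal sketch follows; the driver class, URL, credentials, table and field names are illustrative placeholders, not taken from this thread.

```xml
<!-- Hypothetical data-config.xml; all connection details are placeholders. -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://dbhost;databaseName=mydb;selectMethod=cursor;responseBuffering=adaptive"
              user="solr" password="secret"
              batchSize="10000"/>
  <document>
    <entity name="item" query="SELECT id, title FROM item_table">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```

Note that batchSize="-1" is, as far as I know, a MySQL-specific streaming hint (it maps to a fetch size of Integer.MIN_VALUE); for SQL Server, a moderate positive batchSize together with selectMethod=cursor is usually the combination to try.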
solrj sample code for solrcloud
Does anyone have solrj indexing and searching sample code? I could not find it on the internet. Thanks.
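Not official sample code, but here is a minimal SolrJ 4.x sketch of indexing and searching against SolrCloud. The ZooKeeper address, collection name, and field names are assumptions, and it needs a running cluster with the SolrJ jars on the classpath.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class SolrCloudExample {
    public static void main(String[] args) throws Exception {
        // Connect through ZooKeeper; "localhost:2181" is an assumption.
        CloudSolrServer server = new CloudSolrServer("localhost:2181");
        server.setDefaultCollection("collection1");

        // Index one document.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "hello solrcloud");
        server.add(doc);
        server.commit();

        // Search for it and print the matching ids.
        QueryResponse rsp = server.query(new SolrQuery("title:solrcloud"));
        for (SolrDocument d : rsp.getResults()) {
            System.out.println(d.getFieldValue("id"));
        }
        server.shutdown();
    }
}
```

(In later SolrJ versions CloudSolrServer was replaced by CloudSolrClient, so adjust the class name accordingly.)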
Re: Querying a transitive closure?
Why don't you index all ancestor classes with the document, as a multivalued field? Then you could get it in one hit. Am I missing something?

Upayavira

On Thu, Mar 28, 2013, at 01:59 AM, Jack Park wrote:

Hi Otis, That's essentially the answer I was looking for: each shard (are we talking master + replicas?) has the plug-in custom query handler. I need to build it to find out. What I mean is that there is a taxonomy, say one with a single root for the sake of illustration, which grows all the classes, subclasses, and instances. If I have an object that is somewhere in that taxonomy, then it has a zigzag chain of parents up that tree (I've seen that called a transitive closure). If class B is way up that tree from M, there is no telling how many queries it will take to find it. Hmmm... recursive ascent, I suppose.

Many thanks
Jack

On Wed, Mar 27, 2013 at 6:52 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi Jack, I don't fully understand the exact taxonomy structure and your needs, but in terms of reducing the number of HTTP round trips, you can do it by writing a custom SearchComponent that, upon getting the initial request, does everything locally, meaning that it talks to the local/specified shard before returning to the caller. In a SolrCloud setup with N shards, each of these N shards could be queried in such a way in parallel, running query/queries on their local shards.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Mar 27, 2013 at 3:11 PM, Jack Park jackp...@topicquests.org wrote:

Hi Otis, I fully expect to grow to SolrCloud -- many shards. For now, it's solo. But my thinking relates to cloud. I look for ways to reduce the number of HTTP round trips through SolrJ. Maybe you have some ideas?

Thanks
Jack

On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi Jack, Is this really about HTTP and Solr vs. SolrCloud, or more whether Solr(Cloud) is the right tool for the job and, if so, how to structure the schema and queries to make such lookups efficient?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Mar 27, 2013 at 12:53 PM, Jack Park jackp...@topicquests.org wrote:

This is a question about isA? We want to know if M isA B: isA?(M,B). For some M, one might be able to look into M to see its type, or which class(es) it is a subclass of. We're talking taxonomic queries now. But for some M, one might need to ripple up the transitive closure, looking at all the superclasses, etc., recursively. It seems unreasonable to do that over HTTP; it seems more reasonable to grab a core and write a custom isA query handler. But how do you do that in a SolrCloud? Really curious...

Many thanks in advance for ideas.
Jack
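Upayavira's suggestion, precomputing the transitive closure at indexing time and storing it as a multivalued field, can be sketched in a few lines. The taxonomy contents and the field name ancestors_ss below are illustrative, not from the thread.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class Ancestors {
    // child -> direct parents; contents are illustrative
    static final Map<String, List<String>> TAXONOMY = new HashMap<>();
    static {
        TAXONOMY.put("M", Arrays.asList("C"));
        TAXONOMY.put("C", Arrays.asList("B"));
        TAXONOMY.put("B", Arrays.asList("A"));
    }

    // Walk every parent chain and collect the full set of ancestors.
    static Set<String> ancestors(String node) {
        Set<String> seen = new TreeSet<>();
        Deque<String> stack =
            new ArrayDeque<>(TAXONOMY.getOrDefault(node, Collections.<String>emptyList()));
        while (!stack.isEmpty()) {
            String parent = stack.pop();
            if (seen.add(parent)) {
                stack.addAll(TAXONOMY.getOrDefault(parent, Collections.<String>emptyList()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // At indexing time, store this in a multivalued field such as ancestors_ss;
        // isA?(M, B) then becomes the single filter query ancestors_ss:B
        System.out.println(ancestors("M")); // prints [A, B, C]
    }
}
```

With the field in place, the recursive ascent happens once at indexing time instead of at query time, and no HTTP round trips are needed to answer isA.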
Re: Setup solrcloud on tomcat
First of all, check your catalina.out log; it gives details about what is wrong.

Secondly, you can separate such JVM parameters from solr.xml and put them into a file setenv.sh (create it under the bin folder of Tomcat). Here is what it should contain:

    #!/bin/sh
    JAVA_OPTS="$JAVA_OPTS -Dbootstrap_confdir=/usr/share/solrhome/collection1/conf -Dcollection.configName=custom_conf -DnumShards=2 -DzkRun"
    export JAVA_OPTS

You should change /usr/share/solrhome to wherever your Solr home is. That should start up an embedded ZooKeeper. On the other hand, a client that will connect to the embedded ZooKeeper should have this setenv.sh:

    #!/bin/sh
    JAVA_OPTS="$JAVA_OPTS -DzkHost=**.**.***.**:2181"
    export JAVA_OPTS

I have masked the IP address; you should put in yours.

2013/3/28 하정대 jungdae...@ahnlab.com

Hi, all

I tried to set up SolrCloud on Tomcat, but I couldn't see the Cloud section in the Solr menu. I think the embedded ZooKeeper might not have been loaded. This is my solr.xml file that was supposed to run ZooKeeper:

    <solr persistent="true">
      <cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="8080" hostContext="${hostContext:}" numShards="2" zkRun="http://localhost:9081" zkClientTimeout="${zkClientTimeout:15000}">
        <core name="collection1" instanceDir="collection1" />
      </cores>
    </solr>

What am I missing? I need your help. Also, an example file or tutorial would be a great help for me. I am working through the SolrCloud wiki.

Thanks, all.

"The safest name in the world - AhnLab"
Jungdae Ha, Senior Researcher / ASD Division
Tel: 031-722-8338
e-mail: jungdae...@ahnlab.com
http://www.ahnlab.com
673 Sampyeong-dong, Bundang-gu, Seongnam-si, Gyeonggi-do 463-400, Korea
How to update synonyms.txt without restart?
Dear all,

I am investigating how to update synonyms.txt. Some people say a core RELOAD will reload synonyms.txt, but the Solr wiki says:

```
Starting with Solr4.0, the RELOAD command is implemented in a way that results a live reloads of the SolrCore, reusing the existing various objects such as the SolrIndexWriter. As a result, some configuration options can not be changed and made active with a simple RELOAD...
```

http://wiki.apache.org/solr/CoreAdmin#RELOAD

And https://issues.apache.org/jira/browse/SOLR-3592 is marked as unresolved.

The problem is: how can I update synonyms.txt in a production environment? A workaround is restarting the Solr process, but that does not look good to me. Will someone tell me what the best practice for updating synonyms.txt is?

Thanks in advance.
Re: multicore vs multi collection
Does that mean I can create multiple collections with different configurations? Can you please outline the basic steps to create multiple collections? I am not able to create them on Solr 4.0.
--
View this message in context: http://lucene.472066.n3.nabble.com/multicore-vs-multi-collection-tp4051352p4052002.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Querying a transitive closure?
Exactly. You should usually design your schema to fit your queries, and if you need to retrieve all ancestors then you should index all ancestors so you can query for them easily. If that doesn't work for you, then either Solr is not the right tool for the job, or you need to rethink your schema.

The description of doing lookups within a tree structure doesn't sound at all like what you would use a text retrieval engine for, so you might want to rethink why you want to use Solr for this. But if that transitive closure is something you can calculate at indexing time, then the correct solution is the one Upayavira provided.

If you want people to be able to help you, you need to actually describe your problem (i.e. what is my data, and what are my queries) instead of diving into technical details like reducing HTTP round trips. My guess is that if you need to reduce HTTP round trips, you're probably doing it wrong.

HTH,
Jens

On 03/28/2013 08:15 AM, Upayavira wrote:

Why don't you index all ancestor classes with the document, as a multivalued field? Then you could get it in one hit. Am I missing something?

Upayavira

On Thu, Mar 28, 2013, at 01:59 AM, Jack Park wrote:

Hi Otis, That's essentially the answer I was looking for: each shard (are we talking master + replicas?) has the plug-in custom query handler. I need to build it to find out. What I mean is that there is a taxonomy, say one with a single root for the sake of illustration, which grows all the classes, subclasses, and instances. If I have an object that is somewhere in that taxonomy, then it has a zigzag chain of parents up that tree (I've seen that called a transitive closure). If class B is way up that tree from M, there is no telling how many queries it will take to find it. Hmmm... recursive ascent, I suppose.

Many thanks
Jack

On Wed, Mar 27, 2013 at 6:52 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi Jack, I don't fully understand the exact taxonomy structure and your needs, but in terms of reducing the number of HTTP round trips, you can do it by writing a custom SearchComponent that, upon getting the initial request, does everything locally, meaning that it talks to the local/specified shard before returning to the caller. In a SolrCloud setup with N shards, each of these N shards could be queried in such a way in parallel, running query/queries on their local shards.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Mar 27, 2013 at 3:11 PM, Jack Park jackp...@topicquests.org wrote:

Hi Otis, I fully expect to grow to SolrCloud -- many shards. For now, it's solo. But my thinking relates to cloud. I look for ways to reduce the number of HTTP round trips through SolrJ. Maybe you have some ideas?

Thanks
Jack

On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi Jack, Is this really about HTTP and Solr vs. SolrCloud, or more whether Solr(Cloud) is the right tool for the job and, if so, how to structure the schema and queries to make such lookups efficient?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Mar 27, 2013 at 12:53 PM, Jack Park jackp...@topicquests.org wrote:

This is a question about isA? We want to know if M isA B: isA?(M,B). For some M, one might be able to look into M to see its type, or which class(es) it is a subclass of. We're talking taxonomic queries now. But for some M, one might need to ripple up the transitive closure, looking at all the superclasses, etc., recursively. It seems unreasonable to do that over HTTP; it seems more reasonable to grab a core and write a custom isA query handler. But how do you do that in a SolrCloud? Really curious...

Many thanks in advance for ideas.
Jack
Re: How to update synonyms.txt without restart?
You should be fine for synonym and other schema changes since they are unrelated to the IndexWriter. But... if you are using synonyms in your index analyzer, as opposed to in your query analyzer, then you need to do a full reindex anyway, which is best done by deleting the contents of the Solr data directory for the collection, restarting Solr, and resending all of the source documents.

-- Jack Krupansky

-----Original Message----- From: Kaneyama Genta Sent: Thursday, March 28, 2013 5:11 AM To: solr-user@lucene.apache.org Subject: How to update synonyms.txt without restart?

Dear all, I investigating how to update synonyms.txt. Some people says CORE RELOAD will reload synonyms.txt. But solr wiki says:

```
Starting with Solr4.0, the RELOAD command is implemented in a way that results a live reloads of the SolrCore, reusing the existing various objects such as the SolrIndexWriter. As a result, some configuration options can not be changed and made active with a simple RELOAD...
```

http://wiki.apache.org/solr/CoreAdmin#RELOAD

And https://issues.apache.org/jira/browse/SOLR-3592 is marked as unresolved. Problem is How can I update synonyms.txt in production environment? Workaround is restart Solr process. But it is not looks good for me. Will someone tell me what is the best practice of synonyms.txt updating? Thanks in advance.
Re: How to update synonyms.txt without restart?
https://issues.apache.org/jira/browse/SOLR-3587 (pointed to from SOLR-3592) indicates it is resolved. I just tried it on my local 4x branch checkout, using the analysis page (text_general analyzing "foo"): added a synonym, went to core admin, clicked reload, and saw the synonym appear afterwards.

Erik

On Mar 28, 2013, at 05:11, Kaneyama Genta wrote:

Dear all, I investigating how to update synonyms.txt. Some people says CORE RELOAD will reload synonyms.txt. But solr wiki says:

```
Starting with Solr4.0, the RELOAD command is implemented in a way that results a live reloads of the SolrCore, reusing the existing various objects such as the SolrIndexWriter. As a result, some configuration options can not be changed and made active with a simple RELOAD...
```

http://wiki.apache.org/solr/CoreAdmin#RELOAD

And https://issues.apache.org/jira/browse/SOLR-3592 is marked as unresolved. Problem is How can I update synonyms.txt in production environment? Workaround is restart Solr process. But it is not looks good for me. Will someone tell me what is the best practice of synonyms.txt updating? Thanks in advance.
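For those finding this thread later: the same reload can be triggered over HTTP instead of through the admin UI. A sketch, assuming the default port and core name:

```
# Edit synonyms.txt under the core's conf directory first, then:
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1"
```

The index-analyzer caveat still applies: synonyms applied at index time require a full reindex, not just a reload.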
Re: Too many fields to Sort in Solr
Hi,

I tested this config on Solr 4.2 this morning and it worked:

    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" docValuesFormat="Disk" positionIncrementGap="0"/>
    <field name="MMDDhh" type="long" indexed="true" stored="true" required="true" docValues="true" multiValued="false" />

I also loaded data, ran a sort, and looked at the heap with jvisualvm, and the longs were not loaded into the JVM's heap. The sort was also very fast, although only on 600,000 records.

Possibly you are not on Solr 4.2? Can you post both your fieldType definition and your field definition?

Joel

On Thu, Mar 28, 2013 at 12:57 AM, adityab aditya_ba...@yahoo.com wrote:

Hi Joel, you are correct, the boost function populates the field cache. I was not aware of docValues, so while trying the example you provided I see this error when I define the field type:

    Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
        at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:719)
        ... 13 more

My field definition:

    <fieldType name="dvLong" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0" docValuesFormat="Disk"/>

What am I missing here? thanks

-- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4051960.html Sent from the Solr - User mailing list archive at Nabble.com.

-- Joel Bernstein Professional Services LucidWorks
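One configuration detail that may account for the difference between the two setups: as far as I recall, using a per-field docValuesFormat in schema.xml on Solr 4.x also requires a schema-aware codec factory in solrconfig.xml; without it, the default codec produces exactly the "codec does not support it" error quoted above.

```xml
<!-- In solrconfig.xml: lets the codec honor per-field docValuesFormat settings. -->
<codecFactory class="solr.SchemaCodecFactory"/>
```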
Re: Urgent:Solr cloud issue
Waiting for your assistance to get config entries for a 3-server SolrCloud setup. Thanks in advance.

Anuj

From: anuj vats <vats_a...@rediffmail.com> Sent: Fri, 22 Mar 2013 17:32:10 To: solr-user@lucene.apache.org Cc: mayank...@gmail.com Subject: Urgent:Solr cloud issue

Hi Shawan,

I have seen your post on Solr cloud master-master configuration on two servers. I have to use the same Solr structure, but for a long time I have not been able to configure it to communicate between two servers; on a single server it works fine. Can you please help me out with the required config changes so that Solr can communicate between two servers?

http://grokbase.com/t/lucene/solr-user/132pb1pe34/solrcloud-master-master

Regards
Anuj Vats
Re: multicore vs multi collection
Unable? In what way?

Did you look at the Solr example?
Did you look at solr.xml? Did you see the core element? (Needs to be one per core/collection.)
Did you see the multicore directory in the example? Did you look at the solr.xml file in multicore?
Did you see how there are separate directories for each collection/core in multicore?
Did you see how there is a core element in solr.xml in multicore, one for each collection directory (instance)?
Did you try setting up your own test directory parallel to multicore in example?
Did you read the README.txt files in the Solr example directories?
Did you see the command to start Solr with a specific Solr home directory?

    java -Dsolr.solr.home=multicore -jar start.jar

Did you try that for your own test solr home directory created above?

So... what exactly was the problem you were encountering? Be specific. My guess is that you simply need to re-read the README.txt files more carefully in the Solr example directories. If you have questions about what the README.txt files say, please ask them, but please be specific.

-- Jack Krupansky

-----Original Message----- From: hupadhyay Sent: Thursday, March 28, 2013 5:35 AM To: solr-user@lucene.apache.org Subject: Re: multicore vs multi collection

Does that means i can create multiple collections with different configurations ? can you please outline basic steps to create multiple collections, cause i am not able to create them on solr 4.0

-- View this message in context: http://lucene.472066.n3.nabble.com/multicore-vs-multi-collection-tp4051352p4052002.html Sent from the Solr - User mailing list archive at Nabble.com.
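To make the steps above concrete, here is a minimal multicore solr.xml along the lines of the one shipped in the example directory; the core names are placeholders, and each instanceDir must contain its own conf directory with schema.xml and solrconfig.xml.

```xml
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>
```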
How to make Solr complex Join Query
Hi,

I need to do complex joins in a single core with multiple tables: inner, outer, left, right, and so on. I am working with Solr 4. Can I work with any of these join types in Solr 4? Is there any way to do so? Please give your suggestions; it's very important. Please help me.

Thanks in advance.
Ashim

-- View this message in context: http://lucene.472066.n3.nabble.com/How-to-male-Solr-complex-Join-Query-tp4052023.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to make Solr complex Join Query
Hi Ashim,

You are probably doing something the wrong way if you need such complex joins. Remember that Solr isn't a relational database. You should probably revisit your schema and flatten your data structure.

Regards,
Karol

On 28.03.2013 13:45, ashimbose wrote:

Hi, I need to do complex joins in a single core with multiple tables: inner, outer, left, right, and so on. I am working with Solr 4. Can I work with any of these join types in Solr 4? Is there any way to do so? Please give your suggestions; it's very important. Please help me. Thanks in advance. Ashim

-- View this message in context: http://lucene.472066.n3.nabble.com/How-to-male-Solr-complex-Join-Query-tp4052023.html Sent from the Solr - User mailing list archive at Nabble.com.
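For completeness: Solr 4 does ship a join query parser, but it behaves only like a restricted inner join within a single index, returning fields from the "to" side only. The field names below are from the Solr example schema, so treat this as a sketch:

```
q={!join from=manu_id_s to=id}ipod
```

Outer, left, and right joins have no equivalent, which is why flattening the data at indexing time is usually the better answer.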
Re: Urgent:Solr cloud issue
Could you give more details on what's not working? Have you followed the instructions here: http://wiki.apache.org/solr/SolrCloud#Getting_Started

Are you using an embedded ZooKeeper or an external server? How many of them? Are you using numShards=1? 2? What do you see in the Solr UI, in the cloud section?

Tomás

On Thu, Mar 28, 2013 at 8:44 AM, anuj vats vats_a...@rediffmail.com wrote:

Waiting for your assistance to get config entries for a 3-server SolrCloud setup. Thanks in advance.

Anuj

From: anuj vats <vats_a...@rediffmail.com> Sent: Fri, 22 Mar 2013 17:32:10 To: solr-user@lucene.apache.org Cc: mayank...@gmail.com Subject: Urgent:Solr cloud issue

Hi Shawan, I have seen your post on Solr cloud master-master configuration on two servers. I have to use the same Solr structure, but for a long time I have not been able to configure it to communicate between two servers; on a single server it works fine. Can you please help me out with the required config changes so that Solr can communicate between two servers?

http://grokbase.com/t/lucene/solr-user/132pb1pe34/solrcloud-master-master

Regards
Anuj Vats
Re: Too many fields to Sort in Solr
Here is the field type definition, same as what you posted yesterday, just with a different name:

    <fieldType name="dvLong" class="solr.TrieLongField" precisionStep="0" docValuesFormat="Disk" positionIncrementGap="0"/>

And the field definition:

    <field name="lcontNumOfDownloads" type="dvLong" indexed="true" stored="true" default="0" docValues="true"/>

As soon as I restart the server I see the exception below in the log. After removing docValuesFormat="Disk" from the field type, I don't see this exception.

    01:49:37,177 ERROR [org.apache.solr.core.CoreContainer] (coreLoadExecutor-3-thread-1) Unable to create core: collection1: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:806) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:619) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_09]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_09]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_09]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_09]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_09]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_09]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_09]
        at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_09]
    Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
        at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:719) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        ... 13 more
    01:49:37,202 ERROR [org.apache.solr.core.CoreContainer] (coreLoadExecutor-3-thread-1) null:org.apache.solr.common.SolrException: Unable to create core: collection1
        at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057)
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
    Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:806)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:619)
        at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
        ... 10 more
    Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
        at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:719)
        ... 13 more

-- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052036.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Too many fields to Sort in Solr
OK, you'll need to re-index. Shutdown, delete the data, re-index. On Thu, Mar 28, 2013 at 9:12 AM, adityab aditya_ba...@yahoo.com wrote: Here is the field type definition. same as what you posted yesterday just a different name. fieldType name=dvLong class=solr.TrieLongField precisionStep=0 docValuesFormat=Disk positionIncrementGap=0/ And Field Definition field name=lcontNumOfDownloads type=dvLong indexed=true stored=true default=0 docValues=true/ as soon as i restart the server i see the exception in log. removing the *docValuesFormat=Disk* from the field type i don't see this exception. 01:49:37,177 ERROR [org.apache.solr.core.CoreContainer] (coreLoadExecutor-3-thread-1) Unable to create core: collection1: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.init(SolrCore.java:806) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] at org.apache.solr.core.SolrCore.init(SolrCore.java:619) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_09] at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_09] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_09] at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_09] at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_09] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_09] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_09] at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_09] Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] at org.apache.solr.core.SolrCore.init(SolrCore.java:719) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] ... 13 more 01:49:37,202 ERROR [org.apache.solr.core.CoreContainer] (coreLoadExecutor-3-thread-1) null:org.apache.solr.common.SolrException: Unable to create core: collection1 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support 
it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.init(SolrCore.java:806) at org.apache.solr.core.SolrCore.init(SolrCore.java:619) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) ... 10 more Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854) at org.apache.solr.core.SolrCore.init(SolrCore.java:719) ... 13 more -- View
Re: [ANNOUNCE] Solr wiki editing change
On Mar 24, 2013, at 10:18 PM, Steve Rowe sar...@gmail.com wrote:

The wiki at http://wiki.apache.org/solr/ has come under attack by spammers more frequently of late, so the PMC has decided to lock it down in an attempt to reduce the work involved in tracking and removing spam. From now on, only people who appear on http://wiki.apache.org/solr/ContributorsGroup will be able to create/modify/delete wiki pages. Please request either on the solr-user@lucene.apache.org or on d...@lucene.apache.org to have your wiki username added to the ContributorsGroup page - this is a one-time step.

Please add my username, AndyLester, to the approved editors list. Thanks.

-- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
Re: [ANNOUNCE] Solr wiki editing change
On Mar 28, 2013, at 9:25 AM, Andy Lester a...@petdance.com wrote:

On Mar 24, 2013, at 10:18 PM, Steve Rowe sar...@gmail.com wrote: Please request either on the solr-user@lucene.apache.org or on d...@lucene.apache.org to have your wiki username added to the ContributorsGroup page - this is a one-time step.

Please add my username, AndyLester, to the approved editors list. Thanks.

Added to solr ContributorsGroup.
Is deltaQuery mandatory?
Is deltaQuery mandatory in data-config.xml? I did it like this:

    <entity name="residential"
            query="select * from tsunami.consumer_data_01 where state='MA' and rownum = 5000"
            deltaQuery="select LEMSMATCHCODE, STREETNAME from residential where last_modified '${dataimporter.last_index_time}'"/>

Then my manager came and said we don't need it; it is only for incremental imports. I took off the line that starts with deltaQuery. Now, in http://localhost:8983/solr/#/db/dataimport//dataimport, the entity is empty, and when I click the Execute button, nothing happens.

Thanks.
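For reference, deltaQuery is optional; it is only consulted by the delta-import command, so a full-import-only entity can drop it entirely. A sketch reusing the table from the message (whether this addresses the empty-entity symptom is not confirmed in the thread, and note that the archive appears to have stripped the XML angle brackets and possibly comparison operators from the original):

```xml
<!-- Minimal full-import-only entity; no deltaQuery required. -->
<document>
  <entity name="residential"
          query="select * from tsunami.consumer_data_01 where state='MA'"/>
</document>
```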
Re: Querying a transitive closure?
Thank you for this. I had thought about it but reasoned in a naive way: who would do such a thing? Doing so makes the query local: once the object has been retrieved, no further HTTP queries are required. Implementation perhaps entails one request to fetch the presumed parent in order to harvest its transitive closure. I need to think about that. Many thanks Jack On Thu, Mar 28, 2013 at 5:06 AM, Jens Grivolla j+...@grivolla.net wrote: Exactly, you should usually design your schema to fit your queries, and if you need to retrieve all ancestors then you should index all ancestors so you can query for them easily. If that doesn't work for you then either Solr is not the right tool for the job, or you need to rethink your schema. The description of doing lookups within a tree structure doesn't sound at all like what you would use a text retrieval engine for, so you might want to rethink why you want to use Solr for this. But if that transitive closure is something you can calculate at indexing time then the correct solution is the one Upayavira provided. If you want people to be able to help you you need to actually describe your problem (i.e. what is my data, and what are my queries) instead of diving into technical details like reducing HTTP roundtrips. My guess is that if you need to reduce HTTP roundtrips you're probably doing it wrong. HTH, Jens On 03/28/2013 08:15 AM, Upayavira wrote: Why don't you index all ancestor classes with the document, as a multivalued field, then you could get it in one hit. Am I missing something? Upayavira On Thu, Mar 28, 2013, at 01:59 AM, Jack Park wrote: Hi Otis, That's essentially the answer I was looking for: each shard (are we talking master + replicas?) has the plug-in custom query handler. I need to build it to find out. What I mean is that there is a taxonomy, say one with a single root for sake of illustration, which grows all the classes, subclasses, and instances. 
If I have an object that is somewhere in that taxonomy, then it has a zigzag chain of parents up that tree (I've seen that called a transitive closure). If class B is way up that tree from M, no telling how many queries it will take to find it. Hmmm... recursive ascent, I suppose. Many thanks Jack On Wed, Mar 27, 2013 at 6:52 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Jack, I don't fully understand the exact taxonomy structure and your needs, but in terms of reducing the number of HTTP round trips, you can do it by writing a custom SearchComponent that, upon getting the initial request, does everything locally, meaning that it talks to the local/specified shard before returning to the caller. In a SolrCloud setup with N shards, each of these N shards could be queried in such a way in parallel, running query/queries on their local shards. Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Mar 27, 2013 at 3:11 PM, Jack Park jackp...@topicquests.org wrote: Hi Otis, I fully expect to grow to SolrCloud -- many shards. For now, it's solo. But, my thinking relates to cloud. I look for ways to reduce the number of HTTP round trips through SolrJ. Maybe you have some ideas? Thanks Jack On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Jack, Is this really about HTTP and Solr vs. SolrCloud, or more whether Solr(Cloud) is the right tool for the job and, if so, how to structure the schema and queries to make such lookups efficient? Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Mar 27, 2013 at 12:53 PM, Jack Park jackp...@topicquests.org wrote: This is a question about isA: we want to know if M isA B, i.e., isA?(M,B). For some M, one might be able to look into M to see its type or which class(es) for which it is a subClass. We're talking taxonomic queries now. But, for some M, one might need to ripple up the transitive closure, looking at all the super classes, etc, recursively.
It seems unreasonable to do that over HTTP; it seems more reasonable to grab a core and write a custom isA query handler. But, how do you do that in a SolrCloud? Really curious... Many thanks in advance for ideas. Jack
RE: Is deltaQuery mandatory ?
No, it's not mandatory. You can't do delta imports without delta queries, though; you'd need to do a full-import. Per your query, you'd only ever do objects with rownum <= 5000. -Original Message- From: A. Lotfi [mailto:majidna...@yahoo.com] Sent: Thursday, March 28, 2013 10:07 AM To: gene...@lucene.apache.org; solr-user@lucene.apache.org Subject: Is deltaQuery mandatory ? Is deltaQuery mandatory in data-config.xml? I did it like this: <entity name="residential" query="select * from tsunami.consumer_data_01 where state='MA' and rownum <= 5000" deltaQuery="select LEMSMATCHCODE, STREETNAME from residential where last_modified > '${dataimporter.last_index_time}'"> Then my manager came and said we don't need it, that this is only for incremental imports. I took off the line that starts with deltaQuery. Now, at http://localhost:8983/solr/#/db/dataimport//dataimport, the entity is empty, and when I click the Execute button nothing happens. Thanks.
RE: SOLR - Unable to execute query error - DIH
You may want to run your jdbc driver in trace mode just to see if it is picking up these different options. I know from experience that the selectMethod parameter can sometimes be important to prevent SQLServer drivers from caching the entire resultset in memory. But something seems very wrong here and maybe driver tuning is really not what you need. 18 minutes to index 500 documents is extreme. Unless the documents were huge or you were doing something very unusual, I'd expect this to happen in seconds (1 second?). Are you indexing on a Raspberry Pi? Possibly you have a cartesian join somewhere in your sql, or some other little mistake? If you post your entire data-config.xml, possibly someone will see the error. Or, could you be extremely memory constrained because of bad JVM heap choices? Do your logs show the jvm constantly in GC cycles? Just a little note: batchSize goes on the <dataSource /> tag, not on <document />. I really don't think tweaking batchSize is going to fix this, though. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: kobe.free.wo...@gmail.com [mailto:kobe.free.wo...@gmail.com] Sent: Thursday, March 28, 2013 1:43 AM To: solr-user@lucene.apache.org Subject: RE: SOLR - Unable to execute query error - DIH Thanks James. We have tried the following options *(individually)* including the one you suggested: 1. selectMethod=cursor 2. batchSize=-1 3. responseBuffering=adaptive But the indexing process doesn't seem to be improving at all. When we try to index a set of 500 rows it works well and gets completed in 18 min. For 1000K rows it took 22 hours (long) for indexing. But, when we try to index the complete set of 750K rows it doesn't show any progress and keeps on executing. Currently both the SQL Server as well as the SOLR machine are running on 4 GB RAM. With this configuration, does the above scenario stand justified? If we think of upgrading the RAM, which machine should that be, the SOLR machine or the SQL Server machine?
Are there any other efficient methods to import/index data from SQL Server to SOLR? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Unable-to-execute-query-error-DIH-tp4051028p4051981.html Sent from the Solr - User mailing list archive at Nabble.com.
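[Editor's note: as James points out above, batchSize belongs on the dataSource tag. A sketch of that placement in data-config.xml for a SQL Server source; the driver class, URL, and credentials below are illustrative placeholders, not values taken from this thread:]

```xml
<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://dbhost;databaseName=mydb"
            user="user" password="pass"
            batchSize="-1" selectMethod="cursor" responseBuffering="adaptive"/>
```

With the attribute here rather than on document, the driver settings are actually passed through to the JDBC connection.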
RE: Is deltaQuery mandatory ?
You do not need deltaQuery unless you're doing delta (incremental) updates. To configure a full import, try starting with this example: http://wiki.apache.org/solr/DataImportHandler#A_shorter_data-config James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: A. Lotfi [mailto:majidna...@yahoo.com] Sent: Thursday, March 28, 2013 9:07 AM To: gene...@lucene.apache.org; solr-user@lucene.apache.org Subject: Is deltaQuery mandatory ? Is deltaQuery mandatory in data-config.xml? I did it like this: <entity name="residential" query="select * from tsunami.consumer_data_01 where state='MA' and rownum <= 5000" deltaQuery="select LEMSMATCHCODE, STREETNAME from residential where last_modified > '${dataimporter.last_index_time}'"> Then my manager came and said we don't need it, that this is only for incremental imports. I took off the line that starts with deltaQuery. Now, at http://localhost:8983/solr/#/db/dataimport//dataimport, the entity is empty, and when I click the Execute button nothing happens. Thanks.
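[Editor's note: following the "shorter data-config" example linked above, a minimal full-import-only config needs no deltaQuery at all. A sketch, where the JDBC driver, URL, and credentials are placeholder assumptions and the query is the one from the original message (note the escaped <= in the XML attribute):]

```xml
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/XE"
              user="user" password="pass"/>
  <document>
    <entity name="residential"
            query="select * from tsunami.consumer_data_01 where state='MA' and rownum &lt;= 5000"/>
  </document>
</dataConfig>
```

With this in place, the dataimport screen should list the entity and full-import should run.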
RE: SOLR - Unable to execute query error - DIH
What version of Solr4 are you running? We are on 3.6.2 so I can't be confident whether these settings still exist (they probably do...), but here is what we do to speed up full-indexing: In solrconfig.xml, increase your ramBufferSizeMB to 128. Increase mergeFactor to 20. Make sure autoCommit is disabled. Basically, you want to minimize how often Lucene/Solr flushes (as that is very time consuming). Merging is also very time consuming, so you want large segments and fewer merges (hence the merge factor increase). We use these settings when we are doing our initial full-indexing and then switch them over to saner defaults to do our regular/delta indexing. Roll-backs concern me; why did your query roll back? Did it give an error -- it should have. It should be in your solr log file. Was it because the connection timed out? It's important to find out. We prevented roll-backs by effectively splitting our data across entities and then indexing one entity at a time. This allowed us to make sure that if one sector failed, it didn't impact the entire process. (This can be done by using autoCommit, but that slows down indexing.) If you're getting OOM errors, be sure that your Xmx value is set high enough (and that you have enough memory). You may be able to increase ramBufferSizeMB depending on how much memory you have (we didn't have much). Hope this helps. Swati -Original Message- From: kobe.free.wo...@gmail.com [mailto:kobe.free.wo...@gmail.com] Sent: Thursday, March 28, 2013 2:43 AM To: solr-user@lucene.apache.org Subject: RE: SOLR - Unable to execute query error - DIH Thanks James. We have tried the following options *(individually)* including the one you suggested: 1. selectMethod=cursor 2. batchSize=-1 3. responseBuffering=adaptive But the indexing process doesn't seem to be improving at all. When we try to index a set of 500 rows it works well and gets completed in 18 min. For 1000K rows it took 22 hours (long) for indexing.
But, when we try to index the complete set of 750K rows it doesn't show any progress and keeps on executing. Currently both the SQL Server as well as the SOLR machine are running on 4 GB RAM. With this configuration, does the above scenario stand justified? If we think of upgrading the RAM, which machine should that be, the SOLR machine or the SQL Server machine? Are there any other efficient methods to import/index data from SQL Server to SOLR? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Unable-to-execute-query-error-DIH-tp4051028p4051981.html Sent from the Solr - User mailing list archive at Nabble.com.
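[Editor's note: the solrconfig.xml knobs Swati describes above would look roughly as below. This is a hedged sketch: on 3.x these elements lived under indexDefaults/mainIndex, on 4.x under indexConfig, and the values are the ones from her message, not universal recommendations:]

```xml
<indexConfig>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <mergeFactor>20</mergeFactor>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- autoCommit left out/commented during bulk full-indexing;
       restore it for regular delta indexing -->
</updateHandler>
```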
Re: Solr 4.2 - Slave Index version is higher than Master
Yes. The only thing is, on the master a delta import runs every half hour, but as there has been no data change in the last 24 hours I think the index version still remains the same. Another thing I notice is that after a full import the index Gen is bumped directly higher than the slave's. Can that mean the master is not increasing Version and Gen with delta-import correctly? See below. *Before Full Import* Master: 1364331607690 154 88.28 KB Slave: 1364395321127 241 98.75 KB *After Full Import* Master: 1364395566324 242 88.28 KB Slave: 1364395321127 241 98.75 KB On Tue, Mar 26, 2013 at 1:05 PM, Mark Miller-3 [via Lucene] ml-node+s472066n4051477...@n3.nabble.com wrote: That's pretty interesting. The slave should have no way of doing this without a commit… - Mark On Mar 26, 2013, at 11:07 AM, Uomesh [hidden email] http://user/SendEmail.jtp?type=nodenode=4051477i=0 wrote: Hi Mark, Further details: My master details have not changed in the last 24 hours but the slave index version and Gen have increased. If I do the full import the slave is replicated and Version and Gen are reset. Version Gen Size Master: 1364238678758 111 768.23 KB Slave: 1364299206396 155 768.02 KB On Fri, Mar 22, 2013 at 3:32 PM, Mark Miller-3 [via Lucene] [hidden email] http://user/SendEmail.jtp?type=nodenode=4051477i=1 wrote: That was to you, Phil. So it seems this is a problem with the configuration replication case, I would guess - I didn't really look at that path in the 4.2 fixes I worked on. I did add it to the new testing I'm doing since I've suspected it (it will prompt a core reload that doesn't happen when configs don't replicate). I'll see what I can do to try and get a test to catch it. - mark On Mar 22, 2013, at 1:49 PM, Mark Miller [hidden email] http://user/SendEmail.jtp?type=nodenode=4050577i=0 wrote: And you're also on 4.2? - Mark On Mar 22, 2013, at 12:41 PM, Uomesh [hidden email] http://user/SendEmail.jtp?type=nodenode=4050577i=1 wrote: Also, I am replicating only on commit and startup.
Thanks, Umesh On Fri, Mar 22, 2013 at 11:23 AM, Umesh Sharma [hidden email] http://user/SendEmail.jtp?type=nodenode=4050577i=2 wrote: Hi Mark, I am replicating the below config files but not solrconfig.xml. confFiles: schema.xml, elevate.xml, stopwords.txt, mapping-FoldToASCII.txt, mapping-ISOLatin1Accent.txt, protwords.txt, spellings.txt, synonyms.txt. Also, strangely, I am seeing a big Gen difference between master and slave. My master Gen is 2 while the slave's is 56. If I do the full import then the Gen gets higher than the slave's and it replicates. I have more than 30 cores on my solr instance and all are scheduled to replicate at the same time. Index Version Gen Size Master: 1363903243590 2 94 bytes Slave: 1363967579193 56 94 bytes Thanks, Umesh On Fri, Mar 22, 2013 at 10:42 AM, Mark Miller-3 [via Lucene] [hidden email] http://user/SendEmail.jtp?type=nodenode=4050577i=3 wrote: Are you replicating configuration files as well? - Mark On Mar 22, 2013, at 6:38 AM, John, Phil (CSS) [hidden email] http://user/SendEmail.jtp?type=nodenode=4050075i=0 wrote: To add to the discussion: we're running classic master/slave replication (not solrcloud) with 1 master and 2 slaves, and I noticed the slave having a higher version number than the master the other day as well. In our case, knock on wood, it hasn't stopped replication. If you'd like a copy of our config I can provide it off-list. Regards, Phil. From: Mark Miller [mailto:[hidden email] http://user/SendEmail.jtp?type=nodenode=4050075i=1] Sent: Fri 22/03/2013 06:32 To: [hidden email] http://user/SendEmail.jtp?type=nodenode=4050075i=2 Subject: Re: Solr 4.2 - Slave Index version is higher than Master The other odd thing here is that this should not stop replication at all. When the slave is ahead, it will still have its index replaced.
- Mark On Mar 22, 2013, at 1:26 AM, Mark Miller [hidden email] http://user/SendEmail.jtp?type=nodenode=4050075i=3 wrote: I'm working on testing to try and catch what you are seeing here: https://issues.apache.org/jira/browse/SOLR-4629 - Mark On Mar 22, 2013, at 12:23 AM, Mark Miller [hidden email] http://user/SendEmail.jtp?type=nodenode=4050075i=4 wrote: Let me know if there is anything else you can add. A test with your setup that indexes n docs randomly, commits, randomly updates a conf file or not, and then replicates and repeats x times does not seem to fail, even with very high values for n and x. On every replication, the versions are compared. Is there anything else you are putting into this mix? - Mark On Mar 21, 2013, at 11:28 PM, Uomesh [hidden email] http://user/SendEmail.jtp?type=nodenode=4050075i=5 wrote:
Re: Too many fields to Sort in Solr
Still no luck. Steps performed: 1. Stop the application server (JBoss) 2. Delete everything under data 3. Start the server 4. Observe exception in log (I have uploaded the file). On a side note, do I need to have any additional jar files in the solr home lib folder? Currently it's empty. docValueException.log http://lucene.472066.n3.nabble.com/file/n4052070/docValueException.log -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052070.html Sent from the Solr - User mailing list archive at Nabble.com.
Batch Search Query
Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? If not, what kinds of things could I do to implement such a feature (if feasible, of course)? Thanks, Mike
Re: Batch Search Query
Hi Mike, Interesting problem - here's some pointers on where to get started. For finding similar segments, check out Solr's More Like This support - it's built in to the query request processing so you just need to enable it with query params. There's nothing built in for doing batch queries from the client side. You might look into implementing a custom search component and register it as a first-component in your search handler (take a look at solrconfig.xml for how search handlers are configured, e.g. /browse). Cheers, Tim On Thu, Mar 28, 2013 at 9:43 AM, Mike Haas mikehaas...@gmail.com wrote: Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? If not, what kinds of things could I do to implement such a feature (if feasible, of course)? Thanks, Mike
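[Editor's note: pending a custom search component, one client-side stopgap is to fold many segment lookups into far fewer HTTP requests by OR-ing groups of segments into a single boolean query and attributing hits back to their segments afterwards. A rough, Solr-agnostic sketch of the batching logic; the chunk size and phrase quoting are illustrative assumptions, not from this thread:]

```python
def chunks(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def build_batch_queries(segments, per_request=50):
    """Fold many segment lookups into one OR'ed phrase query per request."""
    queries = []
    for group in chunks(segments, per_request):
        # Quote each segment as a phrase; escape any embedded quotes.
        clauses = ['"%s"' % s.replace('"', '\\"') for s in group]
        queries.append(" OR ".join(clauses))
    return queries

segments = ["segment %d" % i for i in range(5000)]
print(len(build_batch_queries(segments)))  # 100 requests instead of 5000
```

Each response then needs a second pass to decide which of the grouped segments a given hit actually matched; highlighting or per-clause re-checking can serve for that.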
Re: [ANNOUNCE] Solr wiki editing change
Please add OussamaJilal to the group. Thank you. 2013/3/28 Steve Rowe sar...@gmail.com On Mar 28, 2013, at 9:25 AM, Andy Lester a...@petdance.com wrote: On Mar 24, 2013, at 10:18 PM, Steve Rowe sar...@gmail.com wrote: Please request either on the solr-user@lucene.apache.org or on d...@lucene.apache.org to have your wiki username added to the ContributorsGroup page - this is a one-time step. Please add my username, AndyLester, to the approved editors list. Thanks. Added to solr ContributorsGroup.
Re: Too many fields to Sort in Solr
Update --- I was able to fix the exception by adding the following line in solrconfig.xml: <codecFactory name="CodecFactory" class="solr.SchemaCodecFactory" /> Not sure if it's mentioned in any document to have this declared in the config file. I am now re-indexing the data on the master and will perform tests to see if it works as expected. Thanks for your support. Aditya -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052091.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [ANNOUNCE] Solr wiki editing change
On Mar 28, 2013, at 11:57 AM, Jilal Oussama jilal.ouss...@gmail.com wrote: Please add OussamaJilal to the group. Added to solr ContributorsGroup.
Re: Solr and OpenPipe
git clone https://github.com/kolstae/openpipe cd openpipe mvn install regards -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-and-OpenPipe-tp484777p4052079.html Sent from the Solr - User mailing list archive at Nabble.com.
bootstrap_conf without restarting
I'm doing fairly frequent changes to my data-config.xml files on some of my cores in a SolrCloud setup. Is there any way to get these files active and up to Zookeeper without restarting the instance? I've noticed that if I just launch another instance of solr with the bootstrap_conf flag set to true, it uploads the new settings, but it dies because there's already a solr instance running on that port. It also seems to make the original one unresponsive, or at least down in Zookeeper's eyes. I then just restart that instance and everything is back up. It'd be nice if I could bootstrap without actually starting solr. What's the best practice for deploying changes to data-config.xml? Thanks, Jim -- View this message in context: http://lucene.472066.n3.nabble.com/bootstrap-conf-without-restarting-tp4052092.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Batch Search Query
Apologies if you already do something similar, but perhaps of general interest... One (different approach) to your problem is to implement a local fingerprint - if you want to find documents with overlapping segments, this algorithm will dramatically reduce the number of segments you create/search for every document http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf Then you simply end up indexing each document, and upon submission: computing fingerprints and querying for them. I don't know (ie. remember) exact numbers, but my feeling is that you end up storing ~13% of document text (besides, it is a one token fingerprint, therefore quite fast to search for - you could even try one huge boolean query with 1024 clauses, ouch... :)) roman On Thu, Mar 28, 2013 at 11:43 AM, Mike Haas mikehaas...@gmail.com wrote: Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? 
If not, what kinds of things could I do to implement such a feature (if feasible, of course)? Thanks, Mike
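[Editor's note: the winnowing scheme from the Schleimer/Wilkerson/Aiken paper Roman links can be sketched in a few lines: hash every k-gram of the text, then keep the minimum hash of each sliding window of w consecutive k-gram hashes. Any identical run of at least w + k - 1 characters in two texts is then guaranteed to yield at least one shared fingerprint. The k and w values below are arbitrary illustrations, not the paper's tuned parameters:]

```python
def winnow(text, k=5, w=4):
    """Return the winnowing fingerprint set of `text`:
    the minimum hash of each window of w consecutive k-gram hashes."""
    grams = [text[i:i + k] for i in range(len(text) - k + 1)]
    hashes = [hash(g) for g in grams]  # a rolling hash would be faster
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}

a = winnow("the quick brown fox jumps over the lazy dog")
b = winnow("xx the quick brown fox jumps yy")
print(bool(a & b))  # prints True: a run of 25+ shared chars guarantees overlap
```

In the segment-matching scenario above, one would index each fingerprint as a term and query for a new segment's fingerprints, which is what keeps the index at a small fraction of the raw text.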
Re: Solr and OpenPipe
Nice! I see we're having fun. On 28 Mar 2013 at 17:11, Fabio Curti fabio.cu...@gmail.com wrote: git clone https://github.com/kolstae/openpipe cd openpipe mvn install regards -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-and-OpenPipe-tp484777p4052079.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Batch Search Query
Thanks for your reply, Roman. Unfortunately, the business has been running this way forever so I don't think it would be feasible to switch to a whole document store versus segments store. Even then, if I understand you correctly it would not work for our needs. I'm thinking because we don't care about any other parts of the document, just the segment. If a similar segment is in an entirely different document, we want that segment. I'll keep taking any and all feedback however so that I can develop an idea and present it to my manager. On Thu, Mar 28, 2013 at 11:16 AM, Roman Chyla roman.ch...@gmail.com wrote: Apologies if you already do something similar, but perhaps of general interest... One (different approach) to your problem is to implement a local fingerprint - if you want to find documents with overlapping segments, this algorithm will dramatically reduce the number of segments you create/search for every document http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf Then you simply end up indexing each document, and upon submission: computing fingerprints and querying for them. I don't know (ie. remember) exact numbers, but my feeling is that you end up storing ~13% of document text (besides, it is a one token fingerprint, therefore quite fast to search for - you could even try one huge boolean query with 1024 clauses, ouch... :)) roman On Thu, Mar 28, 2013 at 11:43 AM, Mike Haas mikehaas...@gmail.com wrote: Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. 
Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? If not, what kinds of things could I do to implement such a feature (if feasible, of course)? Thanks, Mike
Re: Batch Search Query
This might not be a good match for Solr, or for many other systems. It does seem like a natural fit for MarkLogic. That natively searches and selects over XML documents. Disclaimer: I worked at MarkLogic for a couple of years. wunder On Mar 28, 2013, at 9:27 AM, Mike Haas wrote: Thanks for your reply, Roman. Unfortunately, the business has been running this way forever so I don't think it would be feasible to switch to a whole document store versus segments store. Even then, if I understand you correctly it would not work for our needs. I'm thinking because we don't care about any other parts of the document, just the segment. If a similar segment is in an entirely different document, we want that segment. I'll keep taking any and all feedback however so that I can develop an idea and present it to my manager. On Thu, Mar 28, 2013 at 11:16 AM, Roman Chyla roman.ch...@gmail.com wrote: Apologies if you already do something similar, but perhaps of general interest... One (different approach) to your problem is to implement a local fingerprint - if you want to find documents with overlapping segments, this algorithm will dramatically reduce the number of segments you create/search for every document http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf Then you simply end up indexing each document, and upon submission: computing fingerprints and querying for them. I don't know (ie. remember) exact numbers, but my feeling is that you end up storing ~13% of document text (besides, it is a one token fingerprint, therefore quite fast to search for - you could even try one huge boolean query with 1024 clauses, ouch... :)) roman On Thu, Mar 28, 2013 at 11:43 AM, Mike Haas mikehaas...@gmail.com wrote: Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. 
These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? If not, what kinds of things could I do to implement such a feature (if feasible, of course)? Thanks, Mike -- Walter Underwood wun...@wunderwood.org
Re: Batch Search Query
Thanks Timothy, In regards to you mentioning using MoreLikeThis, do you know what kind of algorithm it uses? My searching didn't reveal anything. On Thu, Mar 28, 2013 at 10:51 AM, Timothy Potter thelabd...@gmail.comwrote: Hi Mike, Interesting problem - here's some pointers on where to get started. For finding similar segments, check out Solr's More Like This support - it's built in to the query request processing so you just need to enable it with query params. There's nothing built in for doing batch queries from the client side. You might look into implementing a custom search component and register it as a first-component in your search handler (take a look at solrconfig.xml for how search handlers are configured, e.g. /browse). Cheers, Tim On Thu, Mar 28, 2013 at 9:43 AM, Mike Haas mikehaas...@gmail.com wrote: Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? 
If not, what kinds of things could I do to implement such a feature (if feasible, of course)? Thanks, Mike
Re: Batch Search Query
On Thu, Mar 28, 2013 at 12:27 PM, Mike Haas mikehaas...@gmail.com wrote: Thanks for your reply, Roman. Unfortunately, the business has been running this way forever so I don't think it would be feasible to switch to a whole sure, no arguing against that :) document store versus segments store. Even then, if I understand you correctly it would not work for our needs. I'm thinking because we don't care about any other parts of the document, just the segment. If a similar segment is in an entirely different document, we want that segment. the algo should work for this case - the beauty of the local winnowing is that it is *local*, ie it tends to select the same segments from the text (ie. you process two documents, written by two different people - but if they cited the same thing, and it is longer than 'm' tokens, you will have at least one identical fingerprints from both documents - which means: match!) then of course, you can store the position offset of the original words of the fingerprint and retrieve the original, compute ratio of overlap etc... but a database seems to be better suited for these kind of jobs... let us know what you adopt! ps: MoreLikeThis selects 'significant' tokens from the document you selected and then constructs a new boolean query searching for those. http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ I'll keep taking any and all feedback however so that I can develop an idea and present it to my manager. On Thu, Mar 28, 2013 at 11:16 AM, Roman Chyla roman.ch...@gmail.com wrote: Apologies if you already do something similar, but perhaps of general interest... 
One (different approach) to your problem is to implement a local fingerprint - if you want to find documents with overlapping segments, this algorithm will dramatically reduce the number of segments you create/search for every document http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf Then you simply end up indexing each document, and upon submission: computing fingerprints and querying for them. I don't know (ie. remember) exact numbers, but my feeling is that you end up storing ~13% of document text (besides, it is a one token fingerprint, therefore quite fast to search for - you could even try one huge boolean query with 1024 clauses, ouch... :)) roman On Thu, Mar 28, 2013 at 11:43 AM, Mike Haas mikehaas...@gmail.com wrote: Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? If not, what kinds of things could I do to implement such a feature (if feasible, of course)? 
Thanks, Mike
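To make Roman's fingerprint suggestion concrete, here is a hedged Python sketch of local winnowing in the spirit of the Stanford paper he links; the k-gram size `k` and window size `w` are made-up parameters, and Python's built-in `hash()` stands in for a real rolling hash:

```python
# Sketch of local winnowing (Schleimer/Wilkerson/Aiken, SIGMOD'03).
# Parameters k and w are illustrative, not from the thread.
def kgram_hashes(text, k):
    """Hash every k-gram of the crudely normalized text."""
    text = "".join(text.lower().split())  # strip case and whitespace
    return [hash(text[i:i + k]) for i in range(len(text) - k + 1)]

def winnow(text, k=5, w=4):
    """Keep the minimum hash of each sliding window as a fingerprint."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    fingerprints = set()
    for i in range(max(len(hashes) - w + 1, 1)):
        fingerprints.add(min(hashes[i:i + w]))
    return fingerprints

# Two texts sharing a run of at least w + k - 1 characters are guaranteed
# at least one common fingerprint, so one fingerprint query can stand in
# for many per-segment queries.
a = winnow("the quick brown fox jumps over the lazy dog")
b = winnow("he said the quick brown fox jumps over the lazy dog loudly")
shared = a & b  # non-empty: both contain the same long phrase
```

This is what makes the approach cheap: instead of 5000 similarity queries per document, you index fingerprints and query only for fingerprint matches.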
multiple SolrCloud clusters with one ZooKeeper ensemble?
Can I use a single ZooKeeper ensemble for multiple SolrCloud clusters, or would each SolrCloud cluster require its own ZooKeeper ensemble? Bill
Re: multiple SolrCloud clusters with one ZooKeeper ensemble?
: Can I use a single ZooKeeper ensemble for multiple SolrCloud clusters or : would each SolrCloud cluster requires its own ZooKeeper ensemble? https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot (I'm going to FAQ this) -Hoss
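The linked wiki section boils down to giving each cluster its own chroot path in the zkHost string, so one ensemble can keep each cluster's state under a separate ZooKeeper node. A hedged sketch (ensemble hosts and chroot names are invented):

```shell
# One ZooKeeper ensemble, two SolrCloud clusters separated by chroot paths.
# Cluster A:
java -DzkHost=zk1:2181,zk2:2181,zk3:2181/solr-cluster-a -jar start.jar
# Cluster B:
java -DzkHost=zk1:2181,zk2:2181,zk3:2181/solr-cluster-b -jar start.jar
```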
Re: Too many fields to Sort in Solr
I didn't have to do anything with the codecs to make it work. I checked my solrconfig.xml and the codecFactory element is not present. I'm running the out-of-the-box Jetty setup. On Thu, Mar 28, 2013 at 11:58 AM, adityab aditya_ba...@yahoo.com wrote: Update --- I was able to fix the exception by adding the following line in solrconfig.xml: <codecFactory name="CodecFactory" class="solr.SchemaCodecFactory"/> Not sure if it's mentioned in any document that this needs to be declared in the config file. I am now re-indexing the data on the master and will perform tests to see if it works as expected. thanks for your support. Aditya -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052091.html Sent from the Solr - User mailing list archive at Nabble.com. -- Joel Bernstein Professional Services LucidWorks
Re: Batch Search Query
I will definitely let you all know what we end up doing. I realized I forgot to mention something that might make what we do more clear. Right now we use sql server full text to get back fairly similar matches for each segment. We do this with some funky sql stuff which I didn't write and haven't even looked at. It gives us back 100 results. They are not really all that good of matches though, it just gives us something to work with. So although some results are good, some are horrible. Then, to truly make sure we have a good match we take each one of those ~100 results and run it through a levenshtein algorithm implemented in c# code. Levenshtein gives back a % match. We then use the highest match so long as it is above 85% Hope this makes it a little more clear what we are doing. On Thu, Mar 28, 2013 at 11:39 AM, Roman Chyla roman.ch...@gmail.com wrote: On Thu, Mar 28, 2013 at 12:27 PM, Mike Haas mikehaas...@gmail.com wrote: Thanks for your reply, Roman. Unfortunately, the business has been running this way forever so I don't think it would be feasible to switch to a whole sure, no arguing against that :) document store versus segments store. Even then, if I understand you correctly it would not work for our needs. I'm thinking because we don't care about any other parts of the document, just the segment. If a similar segment is in an entirely different document, we want that segment. the algo should work for this case - the beauty of the local winnowing is that it is *local*, ie it tends to select the same segments from the text (ie. you process two documents, written by two different people - but if they cited the same thing, and it is longer than 'm' tokens, you will have at least one identical fingerprints from both documents - which means: match!) then of course, you can store the position offset of the original words of the fingerprint and retrieve the original, compute ratio of overlap etc... 
but a database seems to be better suited for these kind of jobs... let us know what you adopt! ps: MoreLikeThis selects 'significant' tokens from the document you selected and then constructs a new boolean query searching for those. http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ I'll keep taking any and all feedback however so that I can develop an idea and present it to my manager. On Thu, Mar 28, 2013 at 11:16 AM, Roman Chyla roman.ch...@gmail.com wrote: Apologies if you already do something similar, but perhaps of general interest... One (different approach) to your problem is to implement a local fingerprint - if you want to find documents with overlapping segments, this algorithm will dramatically reduce the number of segments you create/search for every document http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf Then you simply end up indexing each document, and upon submission: computing fingerprints and querying for them. I don't know (ie. remember) exact numbers, but my feeling is that you end up storing ~13% of document text (besides, it is a one token fingerprint, therefore quite fast to search for - you could even try one huge boolean query with 1024 clauses, ouch... :)) roman On Thu, Mar 28, 2013 at 11:43 AM, Mike Haas mikehaas...@gmail.com wrote: Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). 
The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? If not, what kinds of things could I do to implement such a feature (if feasible, of course)? Thanks, Mike
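Since the thread leans on a Levenshtein percent match with an 85% cutoff, here is a minimal Python sketch of that scoring step (the sample strings are illustrative; the original is C# code the poster hasn't shared):

```python
# Classic Levenshtein distance, two-row dynamic-programming variant.
def levenshtein(a, b):
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def percent_match(a, b):
    """1.0 minus normalized edit distance, as used for the 85% cutoff."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

# Keep only candidates clearing the 85% threshold mentioned in the thread.
candidates = ["the cat sat on the mat", "a dog ran in the park"]
best = [s for s in candidates
        if percent_match("the cat sat on a mat", s) >= 0.85]
```

The same filtering could run over the top-N results of each Solr query, mirroring the existing SQL-full-text-then-Levenshtein pipeline.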
Re: Solr sorting and relevance
Thanks for the fast response. I am still just learning Solr, so please bear with me. This still sounds like the wrong products would appear at the top if they have more inventory, unless I am misunderstanding. High boost/low boost seems to make sense to me. That alone would return the more relevant items at the top, but once we do a query boost on inventory, wouldn't jeans (using the aforementioned example) with more inventory than boots appear at the top? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-sorting-and-relevance-tp4051918p4052122.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Too many fields to Sort in Solr
Wow, that's strange. I tried toggling the codecFactory line in solrconfig.xml (attached in this post): commenting it out gives me an error, whereas un-commenting it works. Can you please take a look at the config and let me know if anything is wrong there? thanks Aditya solrconfig.xml http://lucene.472066.n3.nabble.com/file/n4052131/solrconfig.xml -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052131.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr sorting and relevance
If you had a high boost on the title with a moderate boost on the inventory it sounds like you'd get boots first ordered by inventory followed by jeans ordered by inventory. Because the heavy title boost would move the boots to the top. You can play with the boost factors to try and get the mix you're looking for. On Thu, Mar 28, 2013 at 1:20 PM, scallawa dami...@altrec.com wrote: Thanks for the fast response. I am still just learning solr so please bear with me. This still sounds like the wrong products would appear at the top if they have more inventory unless I am misunderstanding. High boost low boost seems to make sense to me. That alone would return the more relevant items at the top but once we do a query boost on inventory, wouldn't jeans (using the aforementioned example) with more inventory that boots appear at top. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-sorting-and-relevance-tp4051918p4052122.html Sent from the Solr - User mailing list archive at Nabble.com. -- Joel Bernstein Professional Services LucidWorks
Re: multiple SolrCloud clusters with one ZooKeeper ensemble?
Thanks. Now I have to go back and re-read the entire SolrCloud Wiki to see what other info I missed and/or forgot. Bill On Thu, Mar 28, 2013 at 12:48 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : Can I use a single ZooKeeper ensemble for multiple SolrCloud clusters or : would each SolrCloud cluster requires its own ZooKeeper ensemble? https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot (I'm going to FAQ this) -Hoss
Could not load config for solrconfig.xml
Hi, the Solr setup on Windows worked fine. I tried to follow the steps for installing Solr on Unix, and when I started Tomcat I got this exception: SEVERE: Unable to create core: collection1 org.apache.solr.common.SolrException: Could not load config for solrconfig.xml at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:991) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/home/javaguys/solr-home/collection1/conf/', cwd=/home/spbear/javaguys/apache-tomcat-7.0.39/bin at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:318) at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:283) at org.apache.solr.core.Config.init(Config.java:103) at org.apache.solr.core.Config.init(Config.java:73) at org.apache.solr.core.SolrConfig.init(SolrConfig.java:117) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:989) ...
11 more Mar 28, 2013 1:39:43 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: collection1 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: org.apache.solr.common.SolrException: Could not load config for solrconfig.xml at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:991) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) ... 10 more Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/home/javaguys/solr-home/collection1/conf/', cwd=/home/spbear/javaguys/apache-tomcat-7.0.39/bin at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:318) at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:283) at org.apache.solr.core.Config.init(Config.java:103) at org.apache.solr.core.Config.init(Config.java:73) at org.apache.solr.core.SolrConfig.init(SolrConfig.java:117) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:989) ...
11 more Mar 28, 2013 1:39:43 PM org.apache.solr.servlet.SolrDispatchFilter init INFO: user.dir=/home/spbear/javaguys/apache-tomcat-7.0.39/bin Mar 28, 2013 1:39:43 PM org.apache.solr.servlet.SolrDispatchFilter init INFO: SolrDispatchFilter.init() done Mar 28, 2013 1:39:43 PM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /home/spbear/javaguys/apache-tomcat-7. INFO: Registering Log Listener Mar 28, 2013 1:39:42 PM org.apache.solr.core.CoreContainer create INFO: Creating SolrCore 'collection1' using instanceDir: /home/javaguys/solr-home/collection1 Mar 28, 2013 1:39:42 PM org.apache.solr.core.SolrResourceLoader init INFO: new SolrResourceLoader for directory: '/home/javaguys/solr-home/collection1/' Mar 28, 2013 1:39:43 PM org.apache.solr.core.CoreContainer recordAndThrow SEVERE: Unable to create core: collection1 org.apache.solr.common.SolrException: Could not load config for solrconfig.xml at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:991) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
Re: bootstrap_conf without restarting
Couple notes though: java -classpath example/solr-webapp/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983 -confdir example/solr/collection1/conf -confname conf1 -solrhome example/solr I don't think you want that -solrhome - if I remember right, that's for testing/local purposes and is just for when you want to run zk internally from the cmd. Generally that should be ignored. I think you also might want to put the -classpath value in quotes, or your OS can do some auto expanding that causes issues…so I think it might be better to do like: java -classpath "example/solr-webapp/WEB-INF/lib/*" org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983 -confdir example/solr/collection1/conf -confname conf1 I think the examples on the wiki should probably be updated. -solrhome is only needed with the bootstrap option, I believe. - Mark On Mar 28, 2013, at 1:14 PM, Joel Bernstein joels...@gmail.com wrote: You can use the upconfig command, which is described on the Solr Cloud wiki page, followed by a collection reload, also described on the wiki. Here is a sample upconfig command: java -classpath example/solr-webapp/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983 -confdir example/solr/collection1/conf -confname conf1 -solrhome example/solr On Thu, Mar 28, 2013 at 12:05 PM, jimtronic jimtro...@gmail.com wrote: I'm doing fairly frequent changes to my data-config.xml files on some of my cores in a solr cloud setup. Is there any way to get these files active and up to Zookeeper without restarting the instance? I've noticed that if I just launch another instance of solr with the bootstrap_conf flag set to true, it uploads the new settings, but it dies because there's already a solr instance running on that port. It also seems to make the original one unresponsive or at least down in zookeeper's eyes. I then just restart that instance and everything is back up.
It'd be nice if I could bootstrap without actually starting solr. What's the best practice for deploying changes to data-config.xml? Thanks, Jim -- View this message in context: http://lucene.472066.n3.nabble.com/bootstrap-conf-without-restarting-tp4052092.html Sent from the Solr - User mailing list archive at Nabble.com. -- Joel Bernstein Professional Services LucidWorks
Re: Could not load config for solrconfig.xml
On 29 March 2013 00:19, A. Lotfi majidna...@yahoo.com wrote: Hi, solr setup in windows worked fine, I tried to follow installing solr in unix, when I started tomcat I got this exception : [...] Seems it cannot find solrconfig.xml. The relevant part from the logs is: Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/home/javaguys/solr-home/collection1/conf/', cwd=/home/spbear/javaguys/apache-tomcat-7.0.39/bin Have you defined the solr/home property properly in your Solr configuration file? Regards, Gora
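For the archives: the solr/home property Gora mentions can be supplied in (at least) two ways on Tomcat. A hedged sketch, reusing the path visible in the stack trace:

```shell
# Option 1: JVM system property (e.g. in Tomcat's bin/setenv.sh)
export CATALINA_OPTS="$CATALINA_OPTS -Dsolr.solr.home=/home/javaguys/solr-home"

# Option 2: JNDI entry in the webapp's context file
# (conf/Catalina/localhost/solr.xml):
#   <Environment name="solr/home" type="java.lang.String"
#                value="/home/javaguys/solr-home" override="true"/>
```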
Re: SOLR - Documents with large number of fields ~ 450
Hi John, Mark is right. DocValues can be enabled in two ways: RAM resident (default) or on-disk. You can read more here: http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docvalues Regards. On 22 March 2013 16:55, John Nielsen j...@mcb.dk wrote: with the on disk option. Could you elaborate on that? On 22/03/2013 05.25, Mark Miller markrmil...@gmail.com wrote: You might try using docvalues with the on disk option and try and let the OS manage all the memory needed for all the faceting/sorting. This would require Solr 4.2. - Mark On Mar 21, 2013, at 2:56 AM, kobe.free.wo...@gmail.com wrote: Hello All, Scenario: My data model consists of approx. 450 fields with different types of data. We want to include each field for indexing; as a result it will create a single SOLR document with *450 fields*. The total number of records in the data set is *755K*. We will be using features like faceting and sorting on approx. 50 fields. We are planning to use SOLR 4.1. Following is the hardware configuration of the web server that we plan to install SOLR on:- CPU: 2 x Dual Core (4 cores) | RAM: 12GB | Storage: 212 GB Questions : 1) What's the best approach when dealing with documents with a large number of fields? What's the drawback of having a single document with a very large number of fields? Does SOLR support documents with a large number of fields, as in my case? 2) Will there be any performance issue if I define all of the 450 fields for indexing? Also if faceting is done on 50 fields, with documents having a large number of fields and a huge number of records? 3) The names of the fields in the data set are quite lengthy, around 60 characters. Will it be a problem defining fields with such long names in the schema file? Is there any best practice to be followed related to naming conventions? Will big field names create problems during querying? Thanks!
-- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Documents-with-large-number-of-fields-450-tp4049633.html Sent from the Solr - User mailing list archive at Nabble.com.
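A rough sketch of what the schema.xml side of on-disk DocValues could look like in Solr 4.2. The field and type names here are invented, and whether `docValuesFormat` belongs on the fieldType or elsewhere may differ by version, so treat this as a starting point and check the slides/docs:

```xml
<!-- Hypothetical: string type whose DocValues live on disk -->
<fieldType name="string_dv_disk" class="solr.StrField" docValuesFormat="Disk"/>
<!-- One of the ~50 facet/sort fields, with DocValues enabled -->
<field name="facet_field_01" type="string_dv_disk" indexed="true"
       stored="false" docValues="true"/>
```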
Re: Solr sorting and relevance
Otis brings up a good point. Possibly you could put logic in your function query to account for this. But it may be that you can't achieve the mix you're looking for without taking direct control. That is the main reason that SOLR-4465 was put out there, for cases where direct control is needed. I have to reiterate that SOLR-4465 is experimental at this point and subject to change. On Thu, Mar 28, 2013 at 3:00 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, But can you ever get this universally right? In some cases there is very little inventory and in some case there is a ton of inventory, so even if you use a small boost for inventory, when the intentory is very large, that will overpower the title boost, no? Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Mar 28, 2013 at 2:27 PM, Joel Bernstein joels...@gmail.com wrote: If you had a high boost on the title with a moderate boost on the inventory it sounds like you'd get boots first ordered by inventory followed by jeans ordered by inventory. Because the heavy title boost would move the boots to the top. You can play with the boost factors to try and get the mix you're looking for. On Thu, Mar 28, 2013 at 1:20 PM, scallawa dami...@altrec.com wrote: Thanks for the fast response. I am still just learning solr so please bear with me. This still sounds like the wrong products would appear at the top if they have more inventory unless I am misunderstanding. High boost low boost seems to make sense to me. That alone would return the more relevant items at the top but once we do a query boost on inventory, wouldn't jeans (using the aforementioned example) with more inventory that boots appear at top. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-sorting-and-relevance-tp4051918p4052122.html Sent from the Solr - User mailing list archive at Nabble.com. -- Joel Bernstein Professional Services LucidWorks -- Joel Bernstein Professional Services LucidWorks
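To illustrate the dampening idea behind Otis's objection: a hypothetical edismax request where wrapping inventory in log() keeps a very large inventory from swamping the title boost (field names and boost values are invented):

```
q=boots&defType=edismax&qf=title^10.0 description^2.0&boost=log(sum(inventory,1))
```

`sum(inventory,1)` avoids log(0) for out-of-stock items; log() grows slowly, so a 10x inventory difference only nudges the score rather than dominating it.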
Re: Could not load config for solrconfig.xml
Thanks, my path to solr home was missing something; it's working now, but I get no results. The same Solr app with the same configuration files worked on Windows. Abdel From: Gora Mohanty g...@mimirtech.com To: solr-user@lucene.apache.org; A. Lotfi majidna...@yahoo.com Cc: gene...@lucene.apache.org gene...@lucene.apache.org Sent: Thursday, March 28, 2013 3:22 PM Subject: Re: Could not load config for solrconfig.xml On 29 March 2013 00:19, A. Lotfi majidna...@yahoo.com wrote: Hi, solr setup in windows worked fine, I tried to follow installing solr in unix, when I started tomcat I got this exception : [...] Seems it cannot find solrconfig.xml. The relevant part from the logs is: Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/home/javaguys/solr-home/collection1/conf/', cwd=/home/spbear/javaguys/apache-tomcat-7.0.39/bin Have you defined the solr/home property properly in your Solr configuration file? Regards, Gora
Re: Too many fields to Sort in Solr
Not sure that making changes to solrconfig.xml is the right path here. There might be something else with your setup that's causing this issue; I'm not sure what it would be, though. On Thu, Mar 28, 2013 at 1:38 PM, adityab aditya_ba...@yahoo.com wrote: Wow, that's strange. I tried toggling the codecFactory line in solrconfig.xml (attached in this post): commenting it out gives me an error, whereas un-commenting it works. Can you please take a look at the config and let me know if anything is wrong there? thanks Aditya solrconfig.xml http://lucene.472066.n3.nabble.com/file/n4052131/solrconfig.xml -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052131.html Sent from the Solr - User mailing list archive at Nabble.com. -- Joel Bernstein Professional Services LucidWorks
Re: bootstrap_conf without restarting
I do this frequently, but use the scripts provided in cloud-scripts, e.g. export ZK_HOST=... cloud-scripts/zkcli.sh -zkhost $ZK_HOST -cmd upconfig -confdir $COLLECTION_INSTANCE_DIR/conf -confname $COLLECTION_NAME Also, once you do this, you still have to reload the collection so that it picks up the change: curl -i -v "http://URL/solr/admin/collections?action=RELOAD&name=COLLECTION_NAME" On Thu, Mar 28, 2013 at 1:03 PM, Mark Miller markrmil...@gmail.com wrote: Couple notes though: java -classpath example/solr-webapp/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983 -confdir example/solr/collection1/conf -confname conf1 -solrhome example/solr I don't think you want that -solrhome - if I remember right, that's for testing/local purposes and is just for when you want to run zk internally from the cmd. Generally that should be ignored. I think you also might want to put the -classpath value in quotes, or your OS can do some auto expanding that causes issues…so I think it might be better to do like: java -classpath "example/solr-webapp/WEB-INF/lib/*" org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983 -confdir example/solr/collection1/conf -confname conf1 I think the examples on the wiki should probably be updated. -solrhome is only needed with the bootstrap option, I believe. - Mark On Mar 28, 2013, at 1:14 PM, Joel Bernstein joels...@gmail.com wrote: You can use the upconfig command, which is described on the Solr Cloud wiki page, followed by a collection reload, also described on the wiki. Here is a sample upconfig command: java -classpath example/solr-webapp/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983 -confdir example/solr/collection1/conf -confname conf1 -solrhome example/solr On Thu, Mar 28, 2013 at 12:05 PM, jimtronic jimtro...@gmail.com wrote: I'm doing fairly frequent changes to my data-config.xml files on some of my cores in a solr cloud setup.
Is there any way to get these files active and up to Zookeeper without restarting the instance? I've noticed that if I just launch another instance of solr with the bootstrap_conf flag set to true, it uploads the new settings, but it dies because there's already a solr instance running on that port. It also seems to make the original one unresponsive or at least down in zookeeper's eyes. I then just restart that instance and everything is back up. It'd be nice if I could bootstrap without actually starting solr. What's the best practice for deploying changes to data-config.xml? Thanks, Jim -- View this message in context: http://lucene.472066.n3.nabble.com/bootstrap-conf-without-restarting-tp4052092.html Sent from the Solr - User mailing list archive at Nabble.com. -- Joel Bernstein Professional Services LucidWorks
Re: Could not load config for solrconfig.xml
On 29 March 2013 01:59, A. Lotfi majidna...@yahoo.com wrote: Thanks, my path to solr home was missing something, it's worlking, but no results, the same solr app with same configuration files worked in windows. What do you mean by no results? Have you indexed stuff, and are not able to search for it? Are you expecting to copy Solr files from an old setup with an index, and have things work? That would be OK, provided that the Solr index formats were compatible, but you would also need to copy the index, and define dataDir properly in solrconfig.xml. Regards, Gora
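For the dataDir point Gora raises, this is a one-line solrconfig.xml setting; a sketch (the path is illustrative, built from the solr home seen earlier in the thread):

```xml
<!-- solrconfig.xml: point the core at the directory containing index/ -->
<dataDir>/home/javaguys/solr-home/collection1/data</dataDir>
```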
Re: Solr Cloud update process
There are lots of small issues, though. 1. Is Solr tested with a mix of current and previous versions? Is it safe to run a cluster that is a mix of 4.1 and 4.2, even for a little bit? 2. Can Solr 4.2 run with Solr 4.1 config files? This means all of conf/, not just the main XML files. 3. We don't want a cluster with config files that are ahead of the software version, so I think we need to: * Update all the war files and restart each Solr process. * Upload the new config files. * Reload each collection on each Solr process. But this requires that Solr 4.2 be able to start with Solr 4.1 config files. 4. Do we need to stop updates, wait for all nodes to sync, and not restart until the whole cluster is updated? 5. I'd like a bit more detail about exactly what upconfig is supposed to do, because I spent a lot of time with it doing things that did not result in a working Solr cluster. For example, for files in the directory argument, where exactly do they end up in the Zookeeper space? Currently, I've been doing updates with bootstrap, because it was the only thing I could get to work. wunder On Mar 27, 2013, at 11:56 AM, Shawn Heisey wrote: On 3/27/2013 12:34 PM, Walter Underwood wrote: What do people do for updating, say from 4.1 to 4.2.1, on a live cluster? I need to help our release engineering team create the Jenkins scripts for deployment. Aside from replacing the .war file and restarting your container, there hopefully won't be anything additional required. The subject says SolrCloud, so your config(s) should be in zookeeper. It would generally be a good idea to update luceneMatchVersion to LUCENE_42 in the config(s), unless you happen to know that you're relying on behavior from the old version that changed in the new version. I also make a point of deleting the old extracted version of the .war before restarting, just to be sure there won't be any problems.
In theory a servlet container should be able to handle this without intervention, but I don't like taking the chance. Thanks, Shawn
Re: Solr Cloud update process
Hi Walter, I just did our upgrade from a nightly build of 4.1 (a few weeks before the release) and 4.2 - thankfully it went off with 0 downtime and no issues ;-) First and foremost, I had a staging environment that I upgraded first so I already had a good feeling that things would be fine. Hopefully you have a sandbox environment where you can mess around with the upgrade first. On Thu, Mar 28, 2013 at 3:01 PM, Walter Underwood wun...@wunderwood.orgwrote: There are lots of small issues, though. 1. Is Solr tested with a mix of current and previous versions? It is safe to run a cluster that is a mix of 4.1 and 4.2, even for a little bit? I did a rolling upgrade and no issues. So I dropped a node, waited until that was noticed by Zk (almost instant). This left me with a new leader still on 4.1 and then I brought up a replica on 4.2. Then I took down the leader on 4.1 (so Solr failed over to my 4.2 node) and brought it up to 4.2 2. Can Solr 4.2 run with Solr 4.1 config files? This means all of conf/, not just the main XML files. Afaik yes - I didn't change any configuration between 4.1 and 4.2 other than some newSearcher warming queries and cache settings 3. We don't want a cluster with config files that are ahead of the software version, so I think we need: * Update all the war files and restart each Solr process. * Upload the new config files * Reload each collection on each Solr process But this requires that Solr 4.2 be able to start with Solr 4.1 config files. This is what I did too. 4. Do we need to stop updates, wait for all nodes to sync, and not restart until the whole cluster is uploaded. Can't help you on this one as I was not accepting updates during the upgrade. 5. I'd like a bit more detail about exactly what upconfig is supposed to do, because I spent a lot of time with it doing things that did not result in a working Solr cluster. For example, for files in the directory argument, where exactly do they end up in the Zookeeper space? 
Currently, I've been doing updates with bootstrap, because it was the only thing I could get to work. So when you do upconfig, you pass the collection name, so the files get put under: /configs/COLLECTION_NAME You can test this by doing the upconfig and then going into the admin console: Cloud Tree /configs and verifying your updates are correct. wunder On Mar 27, 2013, at 11:56 AM, Shawn Heisey wrote: On 3/27/2013 12:34 PM, Walter Underwood wrote: What do people do for updating, say from 4.1 to 4.2.1, on a live cluster? I need to help our release engineering team create the Jenkins scripts for deployment. Aside from replacing the .war file and restarting your container, there hopefully won't be anything additional required. The subject says SolrCloud, so your config(s) should be in zookeeper. It would generally be a good idea to update luceneMatchVersion to LUCENE_42 in the config(s), unless you happen to know that you're relying on behavior from the old version that changed in the new version. I also make a point of deleting the old extracted version of the .war before restarting, just to be sure there won't be any problems. In theory a servlet container should be able to handle this without intervention, but I don't like taking the chance. Thanks, Shawn
Re: Solr Cloud update process
Comments hidden inline below. Overall - we need to focus on upgrades at some point, but there is little that should stop the old distrib update process from working (multi-node clusters pre-SolrCloud). However, we should have tests and stuff. If only the days were twice as long. On Mar 28, 2013, at 5:27 PM, Timothy Potter thelabd...@gmail.com wrote: Hi Walter, I just did our upgrade from a nightly build of 4.1 (a few weeks before the release) to 4.2 - thankfully it went off with 0 downtime and no issues ;-) First and foremost, I had a staging environment that I upgraded first, so I already had a good feeling that things would be fine. Hopefully you have a sandbox environment where you can mess around with the upgrade first. On Thu, Mar 28, 2013 at 3:01 PM, Walter Underwood wun...@wunderwood.org wrote: There are lots of small issues, though. 1. Is Solr tested with a mix of current and previous versions? Is it safe to run a cluster that is a mix of 4.1 and 4.2, even for a little bit? I did a rolling upgrade with no issues. I dropped a node and waited until that was noticed by Zk (almost instant). This left me with a new leader still on 4.1, and then I brought up a replica on 4.2. Then I took down the leader on 4.1 (so Solr failed over to my 4.2 node) and brought it back up on 4.2. 2. Can Solr 4.2 run with Solr 4.1 config files? This means all of conf/, not just the main XML files. Afaik yes - I didn't change any configuration between 4.1 and 4.2 other than some newSearcher warming queries and cache settings. That's generally been how things work - old config works with new versions. Occasionally, things might get deprecated. That's why there is the version thing in solrconfig.xml. 3. We don't want a cluster with config files that are ahead of the software version, so I think we need: * Update all the war files and restart each Solr process.
* Upload the new config files * Reload each collection on each Solr process But this requires that Solr 4.2 be able to start with Solr 4.1 config files. This is what I did too. 4. Do we need to stop updates, wait for all nodes to sync, and not restart until the whole cluster is uploaded? Can't help you on this one as I was not accepting updates during the upgrade. This should generally work fine. 5. I'd like a bit more detail about exactly what upconfig is supposed to do, because I spent a lot of time with it doing things that did not result in a working Solr cluster. For example, for files in the directory argument, where exactly do they end up in the Zookeeper space? Currently, I've been doing updates with bootstrap, because it was the only thing I could get to work. So when you do upconfig, you pass the collection name, so the files get put under: /configs/COLLECTION_NAME You can test this by doing the upconfig and then going into the admin console: Cloud -> Tree -> /configs and verifying your updates are correct. The main difference between using bootstrap and upconfig is that upconfig does not link a collection to a config set. You must have a link from a collection to a config set. The following rules apply for this: 1. If there is only one config set, when you start a new collection without an explicit link, it will link to it. 2. If a collection does not have an explicit link, but shares the name of a config set, it will link to it. 3. You can set an explicit link. Also, you can link before creating the collection - it will sit in zk waiting for the collection to find it. - Mark wunder On Mar 27, 2013, at 11:56 AM, Shawn Heisey wrote: On 3/27/2013 12:34 PM, Walter Underwood wrote: What do people do for updating, say from 4.1 to 4.2.1, on a live cluster? I need to help our release engineering team create the Jenkins scripts for deployment. Aside from replacing the .war file and restarting your container, there hopefully won't be anything additional required.
The subject says SolrCloud, so your config(s) should be in zookeeper. It would generally be a good idea to update luceneMatchVersion to LUCENE_42 in the config(s), unless you happen to know that you're relying on behavior from the old version that changed in the new version. I also make a point of deleting the old extracted version of the .war before restarting, just to be sure there won't be any problems. In theory a servlet container should be able to handle this without intervention, but I don't like taking the chance. Thanks, Shawn
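[Editor's note] Mark's three collection-to-config-set linking rules can be sketched as a small resolver. This is only an illustrative model of the behavior he describes, not Solr's actual implementation; the function name and data shapes are invented for the example.

```python
def resolve_config_set(collection, explicit_links, config_sets):
    """Illustrative model of the collection -> config-set linking rules
    described above (not Solr's actual code)."""
    # Rule 3: an explicit link always wins.
    if collection in explicit_links:
        return explicit_links[collection]
    # Rule 2: a config set sharing the collection's name gets linked.
    if collection in config_sets:
        return collection
    # Rule 1: with exactly one config set, new collections link to it.
    if len(config_sets) == 1:
        return next(iter(config_sets))
    return None  # ambiguous - no link can be inferred

# Hypothetical contents of ZooKeeper's /configs node:
sets = {"conf1", "col201301"}
print(resolve_config_set("col201301", {}, sets))              # -> col201301 (name match)
print(resolve_config_set("other", {"other": "conf1"}, sets))  # -> conf1 (explicit link)
```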
Re: [ANNOUNCE] Solr wiki editing change
Steve, could you add me to the contrib group? TomasFernandezLobbe Thanks! Tomás On Thu, Mar 28, 2013 at 1:04 PM, Steve Rowe sar...@gmail.com wrote: On Mar 28, 2013, at 11:57 AM, Jilal Oussama jilal.ouss...@gmail.com wrote: Please add OussamaJilal to the group. Added to solr ContributorsGroup.
Re: Solrcloud 4.1 Collection with multiple slices only use
So, by using numShards at initialization time with the sample collection1 solr.xml, I'm able to create a sharded and distributed index. Also, by removing any initial cores from the solr.xml file, I'm able to use the collections API via the web to create multiple collections with sharded indexes that work correctly; however, I can't create distributed collections by using the solr.xml alone. Adding the numShards parameter to the first instance of a collection core in the solr.xml file is ignored; cores are created, but update distribution doesn't happen. When booting up Solr, the config INFO messages show numShards=null. I get the impression from the documentation that you should be able to do this, but I haven't seen a specific example. Without that, it seems that I'm relegated to the shard names, locations, etc. provided by the collections API. I've done this testing under 4.1. True or False? Chris On Mar 27, 2013 9:46 PM, corg...@gmail.com corg...@gmail.com wrote: I realized my error shortly: more docs, better spread. I continued to do some testing to see how I could manually lay out the shards in what I thought was a more organized manner, and with more descriptive names than the numShards parameter alone produced. I also gen'd up a few thousand docs and a schema to test with. Appreciate the help. - Reply message - From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Subject: Solrcloud 4.1 Collection with multiple slices only use Date: Wed, Mar 27, 2013 9:30 pm First, three documents isn't enough to really test. The formula for assigning shards is to hash on the unique ID. It _is_ possible that all three just happened to land on the same shard. If you index all 32 docs in the example dir and they're all on the same shard, we should talk. Second, a regular query to the cluster will always search all the shards. Use distrib=false on the URL to restrict the search to just the node you fire the request at.
Let us know if you index more docs and still see the problem. Best, Erick On Wed, Mar 27, 2013 at 9:39 AM, Chris R corg...@gmail.com wrote: So - I must be missing something very basic here, and I've gone back to the Wiki example. After setting up the two-shard example in the first tutorial and indexing the three example documents, look at the shards in the Admin UI: the documents are stored in the index where the update was directed - they aren't distributed across both shards. Release notes state that the compositeId router is the default when using the numShards parameter? I want an even distribution of documents based on ID across all shards. Suggestions on what I'm screwing up? Chris On Mon, Mar 25, 2013 at 11:34 PM, Mark Miller markrmil...@gmail.com wrote: I'm guessing you didn't specify numShards. Things changed in 4.1 - if you don't specify numShards, it goes into a mode where it's up to you to distribute updates. - Mark On Mar 25, 2013, at 10:29 PM, Chris R corg...@gmail.com wrote: I have two issues and I'm unsure if they are related: Problem: After setting up a multiple-collection Solrcloud 4.1 instance on seven servers, when I index the documents they aren't distributed across the index slices. It feels as though I don't actually have a cloud implementation, yet everything I see in the admin interface and zookeeper implies I do. I feel as if I'm overlooking something obvious, but have not been able to figure out what. Configuration: Seven servers and four collections, each with 12 slices (no replica shards yet). Zookeeper configured in a three-node ensemble. When I send documents to Server1/Collection1 (which holds two slices of collection1), all the documents show up in a single index shard (core). Perhaps related, I have found it impossible to get Solr to recognize the server names with anything but a literal host=servername parameter in the solr.xml.
The hostname parameters, host files, network, and DNS are all configured correctly. I have a Solr 4.0 single-collection setup configured similarly and it works just fine. I'm using the same schema.xml and solrconfig.xml files on the 4.1 implementation, with only the luceneMatchVersion changed to LUCENE_41. Sample solr.xml from server1:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" hostPort="8080" host="server1" shareSchema="true" zkClientTimeout="6">
    <core collection="col201301" shard="col201301s04" instanceDir="/solr/col201301/col201301s04sh01" name="col201301s04sh01" dataDir="/solr/col201301/col201301s04sh01/data/"/>
    <core collection="col201301" shard="col201301s11" instanceDir="/solr/col201301/col201301s11sh01" name="col201301s11sh01" dataDir="/solr/col201301/col201301s11sh01/data/"/>
    <core collection="col201302" shard="col201302s06" instanceDir="/solr/col201302/col201302s06sh01"
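[Editor's note] Erick's point that "the formula for assigning shards is to hash on the unique ID" can be illustrated with a toy router. Solr's actual compositeId router uses MurmurHash over a hash ring; the md5-based sketch below only models the key property that the same ID always routes to the same shard, which is why a handful of documents can plausibly all land on one shard.

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    # Toy stand-in for hash-based document routing: a stable hash of the
    # unique ID, reduced to a shard number. (Solr's real compositeId
    # router uses MurmurHash over a hash ring, not md5-mod-N.)
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Deterministic: the same ID always maps to the same shard, so with
# only three documents it is quite possible all three collide.
ids = ["doc%d" % i for i in range(32)]
print({i: shard_for(i, 2) for i in ids[:3]})
```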
Re: Solrcloud 4.1 Collection with multiple slices only use
True - though I think for 4.2. numShards has never been respected in the cores defs for various reasons. In 4.0 and 4.1, things should have still worked though - you didn't need to give numShards, and everything should work just based on configuring different shard names for each core or accepting the default shard names. In 4.2 this went away - not passing numShards now means that you must distrib updates yourself. There are various technical reasons for this given new features that are being added. So, you can only really pre-configure *one* collection in solr.xml and then use the numShards sys prop. If you wanted to create another collection the same way with a *different* number of shards, you would have to stop Solr, set a new numShards sys prop after pre-configuring the next collection, then start Solr. Not really a good option. And so, the collections API is the way to go - and it's fairly poor in 4.1 due to its lack of result responses (you have to search the overseer logs). It's slightly better in 4.2 (you will get some response) and much better in 4.2.1 (you will get decent responses). Now that it's much more central, it will continue to improve rapidly. - Mark On Mar 28, 2013, at 6:08 PM, Chris R corg...@gmail.com wrote: So, by using numShards at initialization time with the sample collection1 solr.xml, I'm able to create a sharded and distributed index. Also, by removing any initial cores from the solr.xml file, I'm able to use the collections API via the web to create multiple collections with sharded indexes that work correctly; however, I can't create distributed collections by using the solr.xml alone. Adding the numShards parameter to the first instance of a collection core in the solr.xml file is ignored; cores are created, but update distribution doesn't happen. When booting up Solr, the config INFO messages show numShards=null. I get the impression from the documentation that you should be able to do this, but I haven't seen a specific example.
Re: How to shut down the SolrCloud?
Currently, yes. Stop each web container in the normal fashion. That will do a clean shutdown. - Mark On Mar 28, 2013, at 5:48 PM, Li, Qiang qiang...@msci.com wrote: How to shut down the SolrCloud? Just kill all nodes? Regards, Ivan
Re: Solrcloud 4.1 Collection with multiple slices only use
Interesting, I've been doing battle with it while coming from a 4.0 environment. I only had a single collection then and just created the solr.xml files for each server up front. They each supported a half dozen cores for a single collection. As for 4.1 and the collections API, the only issue I've had is with maxCoresPerNode. As you said, the responses all say ok even when it's not. I'll probably move up to 4.2 tomorrow. Thanks for the reply. On Mar 28, 2013 6:23 PM, Mark Miller markrmil...@gmail.com wrote: True - though I think for 4.2. numShards has never been respected in the cores defs for various reasons. In 4.0 and 4.1, things should have still worked though - you didn't need to give numShards, and everything should work just based on configuring different shard names for each core or accepting the default shard names. In 4.2 this went away - not passing numShards now means that you must distrib updates yourself. There are various technical reasons for this given new features that are being added. So, you can only really pre-configure *one* collection in solr.xml and then use the numShards sys prop. If you wanted to create another collection the same way with a *different* number of shards, you would have to stop Solr, set a new numShards sys prop after pre-configuring the next collection, then start Solr. Not really a good option. And so, the collections API is the way to go - and it's fairly poor in 4.1 due to its lack of result responses (you have to search the overseer logs). It's slightly better in 4.2 (you will get some response) and much better in 4.2.1 (you will get decent responses). Now that it's much more central, it will continue to improve rapidly. - Mark On Mar 28, 2013, at 6:08 PM, Chris R corg...@gmail.com wrote: So, by using numShards at initialization time with the sample collection1 solr.xml, I'm able to create a sharded and distributed index.
Re: Solrcloud 4.1 Collection with multiple slices only use
On Mar 28, 2013, at 6:30 PM, Chris R corg...@gmail.com wrote: I'll probably move up to 4.2 tomorrow. 4.2.1 should be ready as soon as I have time to publish it - we have a passing vote and I think we are close to 72 hours after. I just have to stock up on some beer first - Robert tells me it's like a 20 beer event… - Mark
Re: Batch Search Query
: Now, what happens is a user will upload say a word document to us. We then : parse it and process it into segments. It very well could be 5000 segments : or even more in that word document. Each one of those ~5000 segments needs : to be searched for similar segments in solr. I'm not quite sure how I will : do the query (whether proximate or something else). The point though, is to : get back similar results for each segment. You've described your black box (an index of small textual documents) and you've described your input (a large document that will be broken down into N=~5000 small textual snippets), but you haven't really clarified what your desired output should be... * N textual documents from your index, where each doc is the 1 'best' match to 1 of the N textual input snippets. * Some fixed number Y of textual documents from your index representing the best of the best matches against your textual input snippets (ie: if one input snippet is a really good match for multiple indexed docs, return all of those really good matches, but don't return any matches from other snippets if the only matches are poor.) * Some variable number Y of textual documents from your index representing the best of the best matches against your textual input snippets based on some minimum threshold of matching criteria. * etc... Forget for a moment that we are talking about Solr at all -- describe some hypothetical data, some hypothetical query examples, and some hypothetical results you would like to get back (or not get back) from each of those query examples (ideally in pseudo-code) and let's see if that doesn't help suggest an implementation strategy. -Hoss
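[Editor's note] Hoss's first option (one best indexed doc per input snippet) can be prototyped client-side before worrying about Solr query syntax. The word-overlap scorer below is purely a stand-in for a real relevance query; all names and data here are invented for illustration.

```python
def overlap(a: str, b: str) -> int:
    # Stand-in relevance score: count of shared lowercase words.
    return len(set(a.lower().split()) & set(b.lower().split()))

def best_match_per_segment(segments, indexed_docs):
    # Option 1 from the thread: for each of the N input segments,
    # return the single best-matching indexed document.
    return [max(indexed_docs, key=lambda d: overlap(seg, d)) for seg in segments]

index = ["the quick brown fox", "solr indexes documents", "batch query handling"]
print(best_match_per_segment(["brown fox jumped", "run a batch query"], index))
# -> ['the quick brown fox', 'batch query handling']
```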
Re: Solrcloud 4.1 Collection with multiple slices only use
That's my kind of release! Sent from my Verizon Wireless Phone - Reply message - From: Mark Miller markrmil...@gmail.com To: solr-user@lucene.apache.org Subject: Solrcloud 4.1 Collection with multiple slices only use Date: Thu, Mar 28, 2013 6:34 pm On Mar 28, 2013, at 6:30 PM, Chris R corg...@gmail.com wrote: I'll probably move up to 4.2 tomorrow. 4.2.1 should be ready as soon as I have time to publish it - we have a passing vote and I think we are close to 72 hours after. I just have to stock up on some beer first - Robert tells me it's like a 20 beer event… - Mark
Re: How to update synonyms.txt without restart?
: But solr wiki says: : ``` : Starting with Solr4.0, the RELOAD command is implemented in a way that : results a live reloads of the SolrCore, reusing the existing various : objects such as the SolrIndexWriter. As a result, some configuration : options can not be changed and made active with a simple RELOAD... Directly below that sentence are bullet points listing exactly which config options can't be changed with a simple reload... * IndexWriter related settings in indexConfig * dataDir location : http://wiki.apache.org/solr/CoreAdmin#RELOAD -Hoss
Re: Solr Cloud update process
On 3/28/2013 3:01 PM, Walter Underwood wrote: There are lots of small issues, though. 1. Is Solr tested with a mix of current and previous versions? Is it safe to run a cluster that is a mix of 4.1 and 4.2, even for a little bit? 2. Can Solr 4.2 run with Solr 4.1 config files? This means all of conf/, not just the main XML files. 3. We don't want a cluster with config files that are ahead of the software version, so I think we need: * Update all the war files and restart each Solr process. * Upload the new config files * Reload each collection on each Solr process But this requires that Solr 4.2 be able to start with Solr 4.1 config files. 4. Do we need to stop updates, wait for all nodes to sync, and not restart until the whole cluster is uploaded? 5. I'd like a bit more detail about exactly what upconfig is supposed to do, because I spent a lot of time with it doing things that did not result in a working Solr cluster. For example, for files in the directory argument, where exactly do they end up in the Zookeeper space? Currently, I've been doing updates with bootstrap, because it was the only thing I could get to work. Solr 4.2 will work just fine with config files from 4.1. I have a SolrCloud that was running a 4.1 snapshot. I upgraded it to 4.2.1 built from source with no problem. The exact steps that I did were: 1) Replace solr.war. 2) Replace lucene-analyzers-icu-4.1-SNAPSHOT.jar with lucene-analyzers-icu-4.2.1-SNAPSHOT.jar 3) Upgrade all of my jetty jars from 8.1.7 to 8.1.9. 4) Repeat the steps above on the other server. 5) Use zkcli.sh to 'upconfig' a replacement config set with only one change - luceneMatchVersion went from LUCENE_40 to LUCENE_42. 6) Restart both Solr instances. Upgrading jetty is something applicable to only my install, and was not a necessary step. The jetty version currently included in Solr as of 4.1 is 8.1.8 - see SOLR-4155. The upconfig command on zkcli.sh will add/replace the config set with the one that you specify.
It will go into /configs in your zookeeper ensemble. If you specify a chroot on your zkhost parameter, then it will go into /path/to/chroot/configs instead. Most of the time a chroot will only have one element, so /chroot/configs would be the most likely location. I actually would like more detail on upconfig myself - what if you delete files from the config directory on disk? Will they be deleted from zookeeper? I use a solrconfig that has xinclude statements, and occasionally those files do get deleted or renamed. Thanks, Shawn
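[Editor's note] Shawn's description of upconfig can be summarized in a small sketch that builds the zkcli.sh invocation and the resulting ZooKeeper path. The -zkhost/-cmd/-confdir/-confname flags match Solr 4.x's cloud-scripts tool, but the host names, config names, and helper functions here are illustrative assumptions.

```python
def upconfig_command(zkhost: str, confdir: str, confname: str) -> str:
    # Solr 4.x ships cloud-scripts/zkcli.sh; upconfig uploads every file
    # under confdir into ZooKeeper. Hosts and paths here are examples.
    return ("sh zkcli.sh -zkhost %s -cmd upconfig -confdir %s -confname %s"
            % (zkhost, confdir, confname))

def zk_path(confname: str, filename: str, chroot: str = "") -> str:
    # As noted above: configs land under /configs (or <chroot>/configs
    # when a chroot is given on the zkhost string).
    return "%s/configs/%s/%s" % (chroot, confname, filename)

print(upconfig_command("zk1:2181,zk2:2181", "./conf", "mainconf"))
print(zk_path("mainconf", "solrconfig.xml"))           # /configs/mainconf/solrconfig.xml
print(zk_path("mainconf", "solrconfig.xml", "/solr"))  # /solr/configs/mainconf/solrconfig.xml
```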
Re: Solrcloud 4.1 Collection with multiple slices only use
On 3/28/2013 4:23 PM, Mark Miller wrote: True - though I think for 4.2. numShards has never been respected in the cores defs for various reasons. In 4.0 and 4.1, things should have still worked though - you didn't need to give numShards, and everything should work just based on configuring different shard names for each core or accepting the default shard names. In 4.2 this went away - not passing numShards now means that you must distrib updates yourself. There are various technical reasons for this given new features that are being added. So, you can only really pre-configure *one* collection in solr.xml and then use the numShards sys prop. If you wanted to create another collection the same way with a *different* number of shards, you would have to stop Solr, set a new numShards sys prop after pre-configuring the next collection, then start Solr. Not really a good option. And so, the collections API is the way to go - and it's fairly poor in 4.1 due to its lack of result responses (you have to search the overseer logs). It's slightly better in 4.2 (you will get some response) and much better in 4.2.1 (you will get decent responses). Now that it's much more central, it will continue to improve rapidly. Can't you leave numShards out completely, then include a numShards parameter on a Collections API CREATE URL, possibly giving a different numShards to each collection? Thanks, Shawn
Re: SOLR - Unable to execute query error - DIH
: I am trying to index data from SQL Server view to the SOLR using the DIH Have you ruled out the view itself being the bottleneck? Try running whatever command-line SQL Server client exists on your Solr server to connect remotely to your existing SQL Server, run select * from view, and redirect the output to a file. That will give you a minimal, absolute baseline for the best possible performance you could expect to hope for when indexing into Solr -- and tip you off to whether the view is the problem when asking for more than a handful of documents. -Hoss
Re: Solrcloud 4.1 Collection with multiple slices only use
On Mar 28, 2013, at 7:30 PM, Shawn Heisey s...@elyograg.org wrote: Can't you leave numShards out completely, then include a numShards parameter on a collection api CREATE url, possibly giving a different numShards to each collection? Thanks, Shawn Yes - that's why I say the collections API is the way forward - it has none of these limitations. The limitations are all around pre-configuring everything in solr.xml and not using the collections API. - Mark
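[Editor's note] The CREATE call Shawn and Mark are discussing takes per-collection numShards on the URL. A sketch of building that request; the action/name/numShards/replicationFactor parameters are the documented Collections API ones, while the host and collection names are made up.

```python
from urllib.parse import urlencode

def create_collection_url(base: str, name: str, num_shards: int,
                          replication_factor: int = 1) -> str:
    # Collections API CREATE: each collection gets its own numShards,
    # independent of anything preconfigured in solr.xml.
    params = {"action": "CREATE", "name": name,
              "numShards": num_shards, "replicationFactor": replication_factor}
    return "%s/admin/collections?%s" % (base, urlencode(params))

print(create_collection_url("http://server1:8080/solr", "col201303", 12))
```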
Re: Solr Cloud update process
On Mar 28, 2013, at 7:27 PM, Shawn Heisey s...@elyograg.org wrote: I actually would like more detail on upconfig myself - what if you delete files from the config directory on disk? Will they be deleted from zookeeper? I use a solrconfig that has xinclude statements, and occasionally those files do get deleted or renamed. Thanks, Shawn Currently, it's a straight upload - if files went away locally, they will stay in zk. It will just replace what you upload. Happy to help implement a sync option or something if you create a JIRA for it. - mark
Re: How to update synonyms.txt without restart?
Not sure, but if you put it in the data dir, I think it picks it up and reloads on commit. Upayavira On Thu, Mar 28, 2013, at 09:11 AM, Kaneyama Genta wrote: Dear all, I'm investigating how to update synonyms.txt. Some people say a CORE RELOAD will reload synonyms.txt. But the Solr wiki says: ``` Starting with Solr4.0, the RELOAD command is implemented in a way that results a live reloads of the SolrCore, reusing the existing various objects such as the SolrIndexWriter. As a result, some configuration options can not be changed and made active with a simple RELOAD... ``` http://wiki.apache.org/solr/CoreAdmin#RELOAD And https://issues.apache.org/jira/browse/SOLR-3592 is marked as unresolved. The problem is: how can I update synonyms.txt in a production environment? A workaround is to restart the Solr process, but that does not look good to me. Will someone tell me what the best practice for updating synonyms.txt is? Thanks in advance.
Re: How to update synonyms.txt without restart?
But this is fixed in 4.2 - now the index writer is rebooted on core reload. So that's just 4.0 and 4.1. - Mark On Mar 28, 2013, at 6:48 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : But solr wiki says: : ``` : Starting with Solr4.0, the RELOAD command is implemented in a way that : results a live reloads of the SolrCore, reusing the existing various : objects such as the SolrIndexWriter. As a result, some configuration : options can not be changed and made active with a simple RELOAD... Directly below that sentence are bullet points listing exactly which config options can't be changed with a simple reload... * IndexWriter related settings in indexConfig * dataDir location : http://wiki.apache.org/solr/CoreAdmin#RELOAD -Hoss
Re: How to update synonyms.txt without restart?
Though I think *another* JIRA made data dir not changeable over core reload for some reason I don't recall exactly. But the other stuff is back to being changeable :) - Mark On Mar 28, 2013, at 8:04 PM, Mark Miller markrmil...@gmail.com wrote: But this is fixed in 4.2 - now the index writer is rebooted on core reload. So that's just 4.0 and 4.1. - Mark On Mar 28, 2013, at 6:48 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : But solr wiki says: : ``` : Starting with Solr4.0, the RELOAD command is implemented in a way that : results a live reloads of the SolrCore, reusing the existing various : objects such as the SolrIndexWriter. As a result, some configuration : options can not be changed and made active with a simple RELOAD... Directly below that sentence are bullet points listing exactly which config options can't be changed with a simple reload... * IndexWriter related settings in indexConfig * dataDir location : http://wiki.apache.org/solr/CoreAdmin#RELOAD -Hoss
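[Editor's note] Given Mark's note that reload works again from 4.2, the usual production answer is to replace synonyms.txt and issue a CoreAdmin RELOAD. A sketch of building that call; action=RELOAD and the core parameter are the documented CoreAdmin names, while the host and core names below are examples.

```python
from urllib.parse import urlencode

def reload_core_url(base: str, core: str) -> str:
    # CoreAdmin RELOAD re-reads config resources such as synonyms.txt
    # without restarting the Solr process.
    return "%s/admin/cores?%s" % (base, urlencode({"action": "RELOAD", "core": core}))

print(reload_core_url("http://localhost:8983/solr", "collection1"))
# -> http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1
```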
Basic auth on SolrCloud /admin/* calls
Hey guys, I've recently set up basic auth under Jetty 8 for all my Solr 4.x '/admin/*' calls, in order to protect my Collections and Cores API. Although the security constraint is working as expected ('/admin/*' calls require Basic Auth or return 401), when I use the Collections API to create a collection, I receive a 200 OK to the Collections API CREATE call, but the background Cores API calls that are run on the Collections API's behalf fail basic auth on the other nodes with a 401 code, as I should have foreseen, but didn't. Is there a way to tell SolrCloud to use authentication on internal Cores API calls that are spawned on the Collections API's behalf, or is this a new feature request? To reproduce: 1. Implement basic auth on '/admin/*' URIs. 2. Perform a CREATE Collections API call to a node (which will return 200 OK). 3. Notice all Cores API calls fail (the collection isn't created). See the stack trace below from the node that was issued the CREATE call. The stack trace I get is: org.apache.solr.common.SolrException: Server at http://<HOST HERE>:8983/solr returned non ok status:401, message:Unauthorized at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:169) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:135) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) Cheers! Tim
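For anyone reproducing this setup, the constraint described above corresponds to standard servlet security config in the webapp's web.xml. This is a sketch under assumptions (the role and realm names are made up, not taken from Tim's actual config):

```xml
<!-- web.xml fragment: require basic auth on /admin/* (illustrative) -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr admin</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr Realm</realm-name>
</login-config>
```

The failure mode in the thread follows from this: the Collections API CREATE is authenticated by the caller, but the node-to-node Cores API requests that SolrCloud issues internally do not carry those credentials, so they hit the same constraint and get 401s.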
Re: Could not load config for solrconfig.xml
On Windows, when I hit the Execute Query button I got these results:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">181</int>
    <lst name="params">
      <str name="indent">true</str>
      <str name="q">streetname:mdw</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="13674" start="0">
    <doc>
      <str name="streetname">MEADOW</str>
      <str name="lemsmatchcode">2501001ABN 1MD 262</str>
    </doc>
    <doc>
      <str name="streetname">MEADOW</str>
      <str name="lemsmatchcode">2501001ABRM1MD 472</str>
    </doc>
    <doc>
      <str name="streetname">MEADOW</str>
      <str name="lemsmatchcode">2501001ADMS1MD 350</str>
    </doc>
    ...
  </result>
</response>

On Unix, with the same setup, I got this result:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="indent">true</str>
      <str name="q">*:*</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

I did not understand why. Thanks, your help is appreciated.

From: Gora Mohanty g...@mimirtech.com
To: solr-user@lucene.apache.org; A. Lotfi majidna...@yahoo.com
Sent: Thursday, March 28, 2013 4:40 PM
Subject: Re: Could not load config for solrconfig.xml

On 29 March 2013 01:59, A. Lotfi majidna...@yahoo.com wrote:

Thanks, my path to solr home was missing something, it's working, but no results. The same Solr app with the same configuration files worked on Windows.

What do you mean by no results? Have you indexed stuff, and are not able to search for it? Are you expecting to copy Solr files from an old setup with an index, and have things work? That would be OK, provided that the Solr index formats were compatible, but you would also need to copy the index, and define dataDir properly in solrconfig.xml.

Regards,
Gora
Re: Could not load config for solrconfig.xml
On 29 March 2013 07:23, A. Lotfi majidna...@yahoo.com wrote:

On Windows, when I hit the Execute Query button I got these results:
[...]

There seem to be no documents in your Solr index on the UNIX system. As I mentioned in my previous message, you either need to copy the index files from the Windows system (provided that the Solr index format has not changed, this will work), or reindex on the UNIX system.

Regards,
Gora
Re: Could not load config for solrconfig.xml
On Unix, in data/index there is:

segments.gen    20 B    3/28/2013    rw-r--r--
segments_1      45 B    3/28/2013    rw-r--r--

I don't know how this was generated. Should I delete them from this directory, or from some other place? If so, how do I reindex on the UNIX system?

Thanks a lot.

From: Gora Mohanty g...@mimirtech.com
To: A. Lotfi majidna...@yahoo.com
Cc: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Thursday, March 28, 2013 9:59 PM
Subject: Re: Could not load config for solrconfig.xml

On 29 March 2013 07:23, A. Lotfi majidna...@yahoo.com wrote:

On Windows, when I hit the Execute Query button I got these results:
[...]

There seem to be no documents in your Solr index on the UNIX system. As I mentioned in my previous message, you either need to copy the index files from the Windows system (provided that the Solr index format has not changed, this will work), or reindex on the UNIX system.

Regards,
Gora
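As a concrete illustration of the "reindex" Gora suggests: the segments.gen/segments_1 files above are just an empty index that Solr created on startup, so documents need to be posted to the core's /update handler on the UNIX machine. A minimal XML update message looks like the sketch below (field names are taken from the query results earlier in this thread; the values are examples, not prescribed):

```xml
<!-- Body of a POST to http://host:8983/solr/update (illustrative);
     follow the adds with a <commit/> so they become searchable. -->
<add>
  <doc>
    <field name="streetname">MEADOW</field>
    <field name="lemsmatchcode">2501001ABN 1MD 262</field>
  </doc>
</add>
```

Deleting the empty segments files is not necessary; posting and committing documents is what makes numFound non-zero.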