Re: Understanding fieldCache SUBREADER "insanity"

2012-09-21 Thread Aaron Daubman
Yonik, et al.

I believe I found the section of code pushing me into 'insanity' status:
---snip---
int[] collapseIDs = null;
float[] hotnessValues = null;
String[] artistIDs = null;
try {
    collapseIDs = FieldCache.DEFAULT.getInts(searcher.getIndexReader(), COLLAPSE_KEY_NAME);
    hotnessValues = FieldCache.DEFAULT.getFloats(searcher.getIndexReader(), HOTNESS_KEY_NAME);
    artistIDs = FieldCache.DEFAULT.getStrings(searcher.getIndexReader(), ARTIST_KEY_NAME);
} ...
---snip---

Since this code appears to use the 'old-style' pre-Lucene-2.9 top-level
IndexReader, is there any example code you can point me to that shows how
to convert to using the leaf-level SegmentReaders? If the limited
information I've been able to find is correct, this could explain some of
the significant memory usage I am seeing...
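
For what it's worth, here is a rough sketch of how I imagine the per-segment version would look with the Lucene 3.x-era API (the getSequentialSubReaders() call and the docBase bookkeeping are my assumptions, not tested code):

```java
// Hypothetical per-segment FieldCache usage (Lucene 3.x-era API, untested).
// Populating the cache per SegmentReader avoids the duplicate top-level
// entries that the FieldCache "insanity" checker flags.
IndexReader topReader = searcher.getIndexReader();
IndexReader[] leaves = topReader.getSequentialSubReaders();
if (leaves == null) {
    leaves = new IndexReader[] { topReader }; // e.g. an already-optimized index
}
int docBase = 0;
for (IndexReader leaf : leaves) {
    int[] collapseIDs = FieldCache.DEFAULT.getInts(leaf, COLLAPSE_KEY_NAME);
    float[] hotnessValues = FieldCache.DEFAULT.getFloats(leaf, HOTNESS_KEY_NAME);
    String[] artistIDs = FieldCache.DEFAULT.getStrings(leaf, ARTIST_KEY_NAME);
    // These arrays are indexed by segment-local doc id; add docBase when
    // mapping back to top-level doc ids.
    docBase += leaf.maxDoc();
}
```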

Thanks again,
 Aaron

On Wed, Sep 19, 2012 at 4:54 PM, Yonik Seeley  wrote:
>> already-optimized, single-segment index
>
> That part is interesting... if true, then the type of "insanity" you
> saw should be impossible, and either the insanity detection or
> something else is broken.
>
> -Yonik
> http://lucidworks.com


Re: Use DataImportHandler for Mapping XML file but not work

2012-09-21 Thread Gora Mohanty
On 21 September 2012 09:16, bhaveshjogi  wrote:
> Hi, I am using this link for mapping my XML file:
> http://wiki.apache.org/solr/DataImportHandler#wikipedia
> but it does not work, because my XML file is complex. Like this:
> US2011006A120110106US12497914200907061220060101AA43B1700FI20110106USBH
> Can I map this XML file in my search engine? Thanks, Bhavesh Jogi

It is difficult to understand what you mean by a complex XML
file. Could you:
* List the exact steps you tried when indexing this file, e.g.,
  did you change anything in the Solr setup from the
  default configuration, and which steps from the link
  you mention did you follow?
* Describe what errors you are getting

Regards,
Gora


Re: solr4.0 compile: Unable to create javax script engine for javascript

2012-09-21 Thread Chris Hostetter

: I cant compile SOLR 4.0, but i can compile trunk fine.

Hmmm... that's surprising, the part of the build file you pointed out as 
causing you problems on 4x also exists in trunk.

: Documentation for ant - https://ant.apache.org/manual/Tasks/script.html it
: requires some external library. I placed bsf-all-3.1.jar into ant lib
: directory but it still fails.

I believe the problem is not that you need BSF -- because you are using 
java6 (you have to, to compile Solr), BSF isn't needed.  The problem (i 
think) is that FreeBSD's JDK doesn't include javascript by default - i 
believe you just need to install the rhino "js.jar"

http://stackoverflow.com/questions/4649519/jvm-missing-rhino

Or, you should be able to work around this by specifying 
"-Dlucene.javadoc.url=..." on the command line.  (That's all the script is 
used for -- to compute that value.)
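
For example (the URL here is only a placeholder, not the real javadoc location):

```shell
# Hypothetical invocation: supply the property directly so the build's
# javascript snippet never needs to run (the URL is a placeholder).
ant create-package -Dlucene.javadoc.url=http://example.com/lucene-javadocs/
```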

I've opened LUCENE-4415 to try and address this one way or another...

https://issues.apache.org/jira/browse/LUCENE-4415



-Hoss


solr4.0 compile: Unable to create javax script engine for javascript

2012-09-21 Thread Radim Kolar

I can't compile Solr 4.0, but I can compile trunk fine.

ant create-package fails with:

BUILD FAILED
/usr/local/jboss/.jenkins/jobs/Solr/workspace/solr/common-build.xml:229: 
The following error occurred while executing this line:
/usr/local/jboss/.jenkins/jobs/Solr/workspace/solr/common-build.xml:279: 
Unable to create javax script engine for javascript


Total time: 1 minute 19 seconds


I tried google and discovered this: 
http://stackoverflow.com/questions/9663801/freebsd-ant-javax-script-engine


The documentation for Ant's script task - https://ant.apache.org/manual/Tasks/script.html - 
says it requires an external library. I placed bsf-all-3.1.jar into the Ant lib 
directory but it still fails.


Can you please update the document 
http://wiki.apache.org/solr/HowToCompileSolr?highlight=%28compile%29 
with information about which libraries are needed?


Re: Solr Swap Function doesn't work when using Solr Cloud Beta

2012-09-21 Thread sam fang
Hi Chris,

Thanks for your help. Today I tried again and try to figure out the reason.

1. set up an external zookeeper server.

2. Change persistent to true in
/opt/solr/apache-solr-4.0.0-BETA/example/solr/solr.xml, and run the commands
below to upload the configs to zk. (I renamed multicore to solr, and needed to
put the zkcli.sh-related jar packages in place.)
/opt/solr/apache-solr-4.0.0-BETA/example/cloud-scripts/zkcli.sh -cmd upconfig -confdir /opt/solr/apache-solr-4.0.0-BETA/example/solr/core0/conf/ -confname core0 -z localhost:2181
/opt/solr/apache-solr-4.0.0-BETA/example/cloud-scripts/zkcli.sh -cmd upconfig -confdir /opt/solr/apache-solr-4.0.0-BETA/example/solr/core1/conf/ -confname core1 -z localhost:2181

3. Start jetty server
cd /opt/solr/apache-solr-4.0.0-BETA/example
java -DzkHost=localhost:2181 -jar start.jar

4. Publish a document to core0:
cd /opt/solr/apache-solr-4.0.0-BETA/example/solr/exampledocs
cp ../../exampledocs/post.jar ./
java -Durl=http://localhost:8983/solr/core0/update -jar post.jar ipod_video.xml

5. Queries to core0 and core1 are OK.

6. Click "swap" in the admin page; the queries to core0 and core1 change.
Previously I sometimes saw 0 results and sometimes 1 result. Today it
seems core0 still returns 1 result and core1 returns 0 results.

7. Then click "reload" in the admin page and query core0 and core1 again.
They sometimes return 1 result and sometimes nothing. I can also see that the
zk configuration has changed.

8. Restart the jetty server. The queries behave the same as in step 7.

9. Stop the jetty server, log into zkCli.sh, and run the command "set
/clusterstate.json {}". Then start jetty again. Everything is back to normal,
i.e., what swap did in Solr 3.6 or Solr 4.0 without cloud.


From my observation, it seems that after a swap the shard information is put
into actualShards, and when a user requests a search, all of that shard
information is used for the search. But the user can't see the zk update until
clicking the "reload" button in the admin page. When the web server is
restarted, this shard information eventually goes to zk, and the search goes
to all shards.

I found there is an option "distrib": with a URL like
http://host1:18000/solr/core0/select?distrib=false&q=*%3A*&wt=xml, I only get
the data on core0. I dug into the code (the handleRequestBody method in the
SearchHandler class), and this seems to make sense.

I tried stopping the tomcat server, running the command "set
/clusterstate.json {}" to clean all cluster state, running
"cloud-scripts/zkcli.sh -cmd upconfig" to upload the config to the zk server,
and then starting tomcat. That rebuilt the right shard information in zk, and
the search function was back to normal, like what we saw in 3.6 or 4.0 without
cloud.

It seems Solr always adds shard information into zk.

I tested cloud swap on a single machine: if each core has one shard in zk,
then after the swap, zk eventually has 2 slices (shards) for that core,
because swap now only does the add, so the search goes to both shards.

I also tested cloud swap with 2 machines, where each core has 1 shard and 2
slices. Below is the configuration in zk. After the swap, zk eventually has 4
entries for that core, and the search gets mixed up.

  "core0":{"shard1":{
  "host1:18000_solr_core0":{
"shard":"shard1",
"roles":null,
"leader":"true",
"state":"active",
"core":"core0",
"collection":"core0",
"node_name":"host1:18000_solr",
"base_url":"http://host1:18000/solr"},
  "host2:18000_solr_core0":{
"shard":"shard1",
"roles":null,
"state":"active",
"core":"core0",
"collection":"core0",
"node_name":"host2:18000_solr",
"base_url":"http://host2:18000/solr"}}},

For the previous 2 cases, if I stopped the tomcat/jetty server, manually
uploaded the configuration to zk, and then started the tomcat server, zk and
the search became normal again.

On Fri, Sep 21, 2012 at 3:34 PM, Chris Hostetter
wrote:

>
> : Below is my solr.xml configuration, and already set persistent to true.
> ...
> : Then publish 1 record to test1, and query. it's ok now.
>
> Ok, first off -- please provide more details on how exactly you are
> running Solr.  Your initial email said...
>
> >>> In Solr 3.6, core swap function works good. After switch to use Solr
> 4.0
> >>> Beta, and found it doesn't work well.
>
> ...but based on your solr.xml file and your logs, it appears you are now
> trying to use some of the ZooKeeper/SolrCloud features that didn't even
> exist in Solr 3.6, so it's kind of an apples to oranges comparison.  i'm
> pretty sure that for a simple multicore setup, SWAP still works exactly as
> it did in Solr 3.6.
>
> Whether SWAP works with ZooKeeper/SolrCloud is something i'm not really
> clear on -- mainly because i'm not sure what it should mean conceptually.
> Should the two SolrCores swap which collections they are a part of? what
> happens if the doc->shard assignment for the two collections means the
> same docs wouldn't wind up in those SolrCores? what if the SolrCores are
> two different shards of the same collection?

Re: How to boost date field while boosting a text field?

2012-09-21 Thread Chris Hostetter

: The following is the working query with more weight to title. I am using
: default parser. But the published_date of the results of this query is not
: in order. I want date is in order.

your request seems to contradict itself -- you say the results are not in 
order and that you want the "date in order", but you also say you want to 
boost on the title field.

if you want the dates to be perfectly in order, then you can't let the 
score (based on the title field) have any effect.

Having said that - i'm guessing what you meant to say is that you want the 
date to *contribute* to the score/order, in which case this is discussed 
in the wiki...

https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
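
The pattern described there is typically a multiplicative boost on recency, along the lines of (field name taken from your question; the constants are the wiki's example values, not something tuned for you):

```
{!boost b=recip(ms(NOW,published_date),3.16e-11,1,1)}title:yourquery
```

where 3.16e-11 scales milliseconds so a one-year-old document's boost is about half.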


-Hoss


Re: Solr 3.6 observe connections in CLOSE_WAIT state

2012-09-21 Thread Chris Hostetter

: > I am using solr 3.6.0 , I have observed many connection in CLOSE_WAIT state
: > after using solr server for some time. On further analysis and googling
: > found that I need to close the idle connections from the client which is
: > connecting to solr to query data and it does reduce the number of CLOSE_WAIT
: > connections but still some connection remain in that state.
...
: This is a "me too" email.  We've got haproxy sitting in front of Solr.  A
: little more than half of the connections in the various WAIT states are from
: the webserver SolrJ clients to haproxy listening on ports 8983 and 8984.  This
: is SolrJ to haproxy, so there is a non-Solr component.

This *sounds* like the general problem addressed in the following 
issues...

https://issues.apache.org/jira/browse/SOLR-861
https://issues.apache.org/jira/browse/SOLR-2020
https://issues.apache.org/jira/browse/SOLR-3532

In particular: SolrJ clients should now call shutdown() on their 
SolrServer object to let it know they don't want to re-use any existing 
connections anymore, and when Solr internally uses SolrJ to talk to 
other nodes in SolrCloud it should be doing this (as of 4.0-ALPHA)
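
In client code that looks roughly like the following sketch (the URL is a placeholder; HttpSolrServer is the 4.0 class name, CommonsHttpSolrServer in 3.x):

```java
// Hypothetical SolrJ usage: shutting the client down releases the pooled
// HttpClient connections instead of leaving them lingering in CLOSE_WAIT.
SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
try {
    QueryResponse rsp = server.query(new SolrQuery("*:*"));
    // ... use rsp ...
} finally {
    server.shutdown(); // done with this client; free its connections
}
```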


-Hoss


RE: Using Solr-3304

2012-09-21 Thread Eric Khoury

Thanks David, I'll play around with it.  I appreciate the help, Eric.
 > Date: Fri, 21 Sep 2012 14:47:36 -0700
> From: dsmi...@mitre.org
> To: solr-user@lucene.apache.org
> Subject: RE: Using Solr-3304
> 
> When I said "boundary" I meant worldBounds.
> 
> Oh, and set distErrPct="0" to get precise shapes; the default is non-zero.
> It'll use more disk space of course, and all the more reason to carefully
> choose your world bounds carefully.
> 
> 
> 
> -
>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4009490.html
> Sent from the Solr - User mailing list archive at Nabble.com.
  

RE: Using Solr-3304

2012-09-21 Thread David Smiley (@MITRE.org)
When I said "boundary" I meant worldBounds.

Oh, and set distErrPct="0" to get precise shapes; the default is non-zero.
It'll use more disk space of course, all the more reason to choose your
world bounds carefully.



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4009490.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr hanging, possible memory issues?

2012-09-21 Thread Otis Gospodnetic
Hi Kevin,

Try taking a heap dump snapshot and analyzing it with something like
YourKit to see what's eating the memory.
SPM for Solr (see signature) will show you JVM heap and GC
numbers/graphs/activity that may shed some light on the issue.
You could also turn on verbose GC logging and/or use jstat to
understand GC activity some more.
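
Concretely, that could be something like (pid is a placeholder for the Solr process id):

```shell
# Dump the heap for offline analysis in YourKit or Eclipse MAT.
jmap -dump:format=b,file=solr-heap.hprof <pid>
# Print GC utilization of each generation every 5 seconds.
jstat -gcutil <pid> 5000
```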

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Fri, Sep 21, 2012 at 4:50 PM, Kevin Goess  wrote:
> We're running Solr 3.4, a fairly out-of-the-box solr/jetty setup, with
> -Xms1200m -Xmx3500m .  When we start pushing more than a couple documents
> per second at it (PDFs, they go through SolrCell/Tika/PDFBox), the java
> process hangs, becoming completely unresponsive.
>
> We thought it might be an issue with PDFBox was problematic with lots of
> blocked threads, but now we've also noticed that all the thread dumps show
> a similar situation in the heap, where three of the areas are at 99%, even
> though we don't get any out-of-memory messages.
>
> Heap
>PSYoungGen  total 796416K, used 386330K
> eden space 398208K, 97% used
> from space 398208K, 0% used
> to   space 398208K, 0% used
>PSOldGen
> object space 2389376K, 99% used
>PSPermGen
> object space 53824K, 99% used
>
> We've also just noticed that after restarting Solr, the PermGen space grows
> steadily until it hits 99% and then just stays there.
>
> 1) Is that behavior of PermGen normal, growing steadily to 99% and then
> staying there?  Apparently we can increase PermSpace
> to -XX:MaxPermSize=128M , but if there's a memory leak that only postpones
> the problem.
>
> 2) If all three of those indicators are pegged at 99%, I would think that
> the JVM would throw an out-of-memory exception, rather than just
> withdrawing into its own navel, is that expected behavior or is it
> indicative of anything else?
>
> Any tips would be greatly appreciated, thanks!
>
> --
> Kevin M. Goess
> Software Engineer
> Berkeley Electronic Press
> kgo...@bepress.com
>
> 510-665-1200 x179
> www.bepress.com
>
> bepress: sustainable scholarly publishing


RE: Using Solr-3304

2012-09-21 Thread David Smiley (@MITRE.org)
If you can stick to two dimensions, then great.  Remember to set the boundary
attribute on the field type as I described, so that spatial knows the
numerical boundaries that all the data must fit in, e.g. boundary="0 0
10 2.5" (substituting whatever appropriate number of time units you need
for the 10 there).
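
Put together, the field type might look roughly like this (a sketch only; the name is a placeholder, and the bounds must be adapted to your time units):

```xml
<fieldType name="timeRange" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="false" distErrPct="0" maxDetailDist="1"
           boundary="0 0 10 2.5" />
```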




-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4009488.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Using Solr-3304

2012-09-21 Thread Eric Khoury

The requirements have evolved.  :-)  This is still the best solution for my 
needs; I'm close, and I believe this can work.  Removing quality from the 
equation, I have to deal with pairs of GroupIds and Times.  If I set the Y axis 
to 0, as you mentioned, can I create a pair of X values with the groupId as the 
whole part and the ticks as decimals?  In other words, GroupId.StartTicks 0 GroupId.EndTicks 2.5.
 > From: dsmi...@mitre.org
> To: solr-user@lucene.apache.org
> Subject: Re: Using Solr-3304
> Date: Fri, 21 Sep 2012 21:07:21 +
> 
> Spatial doesn't (yet) support 3d.  If you have multi-value relationships 
> across all 3 parameters you mentioned, then you're a bit stuck.  I thought 
> you had 1d (time) multi-value ranges without needing to correlate that to 
> other numeric ranges that are also multi-value.
> 
> On Sep 21, 2012, at 5:03 PM, Eric Khoury wrote:
> 
> > 
> > I have to deal with 3 parameters, time filtering, a groupid (1 to 2000) and 
> > a quality value (1 to 5), and was hoping to use  a X format = Group.Ticks 
> > and Y = quality level, where ticks is the number of ticks for a given time, 
> > rounded to the minute.  In other words, my field indexing would look like: 
> > 45.634801234 1.5 45.634805667 2.5.  I 
> > guess I'm missing something, as I thought that would define a rectangle.  
> > Where do the min max values come into play?  > Date: Fri, 21 Sep 2012 
> > 13:55:24 -0700
> >> From: dsmi...@mitre.org
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Using Solr-3304
> >> 
> >> For your use-case of time ranges, set geo="false" (as you've done).  At 
> >> this point you have a quad tree but it doesn't (yet) work properly for the 
> >> default min & max numbers that a double can store, so you need to specify 
> >> the boundary rectangle explicitly and to the particular numbers for your 
> >> use-case.  Use 0 for the 'Y's.  Think about the smallest granularity of 
> >> time you need (a minute?) and what the earliest time you need to recognize 
> >> and the furthest out as measured in your time granularity (in minutes?).  
> >> The boundary minX can be zero which will be your epoch, and the maxX will 
> >> be the farthest out in time you can go -- who knows.  Set maxDetailDist to 
> >> 1.
> >> 
> >> On Sep 21, 2012, at 4:44 PM, Eric Khoury [via Lucene] wrote:
> >> 
> >> 
> >> David, I tried increasing the maxDetailDist, as I need 9 decimal value 
> >> precision. >> class="solr.SpatialRecursivePrefixTreeFieldType" geo="false" 
> >> distErrPct="0.025" maxDetailDist="0.1" />  But when I do, I get 
> >> the following error: Data "SEVERE: org.apache.solr.common.SolrException: 
> >> ERROR: [doc=MV0005] Error adding field 'rectangle'='45.634801234 
> >> 1.5 45.634805667 2.5' msg=Y values [-1.7976931348623157E308 to Infinity] 
> >> not in boundary 
> >> Rect(minX=-1.7976931348623157E308,maxX=1.7976931348623157E308,minY=-1.7976931348623157E308,maxY=1.7976931348623157E308)"
> >>  string
> >> Any ideas?Eric. PS: what does geo=true\false change?
> >>> Date: Fri, 21 Sep 2012 10:34:07 -0700
> >> 
> >>> From: [hidden 
> >>> email]
> >>> To: [hidden 
> >>> email]
> >>> Subject: Re: Using Solr-3304
> >>> 
> >>> http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
> >>> Definitely needs some updating; I will try to get to that this weekend.
> >>> 
> >>> 
> >>> 
> >>> -
> >>> Author: 
> >>> http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> >>> --
> >>> View this message in context: 
> >>> http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4009454.html
> >>> Sent from the Solr - User mailing list archive at 
> >>> Nabble.com.
> >> 
> >> 
> >> 
> >> If you reply to this email, your message will be added to the discussion 
> >> below:
> >> http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4009479.html
> >> To unsubscribe from Solr 4.0 - Join performance, click 
> >> here.
> >> NAML
> >> 
> >> 
> >> 
> >> 
> >> 
> >> -
> >> Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> >> --
> >> View this message in context: 
> >> http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4009483.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >   
> 
  

Re: Using Solr-3304

2012-09-21 Thread Smiley, David W.
Spatial doesn't (yet) support 3d.  If you have multi-value relationships across 
all 3 parameters you mentioned, then you're a bit stuck.  I thought you had 1d 
(time) multi-value ranges without needing to correlate that to other numeric 
ranges that are also multi-value.

On Sep 21, 2012, at 5:03 PM, Eric Khoury wrote:

> 
> I have to deal with 3 parameters, time filtering, a groupid (1 to 2000) and a 
> quality value (1 to 5), and was hoping to use  a X format = Group.Ticks and Y 
> = quality level, where ticks is the number of ticks for a given time, rounded 
> to the minute.  In other words, my field indexing would look like:  name="RightsData2">45.634801234 1.5 45.634805667 2.5.  I guess I'm 
> missing something, as I thought that would define a rectangle.  Where do the 
> min max values come into play?  > Date: Fri, 21 Sep 2012 13:55:24 -0700
>> From: dsmi...@mitre.org
>> To: solr-user@lucene.apache.org
>> Subject: Re: Using Solr-3304
>> 
>> For your use-case of time ranges, set geo="false" (as you've done).  At this 
>> point you have a quad tree but it doesn't (yet) work properly for the 
>> default min & max numbers that a double can store, so you need to specify 
>> the boundary rectangle explicitly and to the particular numbers for your 
>> use-case.  Use 0 for the 'Y's.  Think about the smallest granularity of time 
>> you need (a minute?) and what the earliest time you need to recognize and 
>> the furthest out as measured in your time granularity (in minutes?).  The 
>> boundary minX can be zero which will be your epoch, and the maxX will be the 
>> farthest out in time you can go -- who knows.  Set maxDetailDist to 1.
>> 
>> On Sep 21, 2012, at 4:44 PM, Eric Khoury [via Lucene] wrote:
>> 
>> 
>> David, I tried increasing the maxDetailDist, as I need 9 decimal value 
>> precision.> class="solr.SpatialRecursivePrefixTreeFieldType" geo="false" 
>> distErrPct="0.025" maxDetailDist="0.1" />  But when I do, I get the 
>> following error: Data "SEVERE: org.apache.solr.common.SolrException: ERROR: 
>> [doc=MV0005] Error adding field 'rectangle'='45.634801234 1.5 
>> 45.634805667 2.5' msg=Y values [-1.7976931348623157E308 to Infinity] not in 
>> boundary 
>> Rect(minX=-1.7976931348623157E308,maxX=1.7976931348623157E308,minY=-1.7976931348623157E308,maxY=1.7976931348623157E308)"
>>  string
>> Any ideas?Eric. PS: what does geo=true\false change?
>>> Date: Fri, 21 Sep 2012 10:34:07 -0700
>> 
>>> From: [hidden 
>>> email]
>>> To: [hidden 
>>> email]
>>> Subject: Re: Using Solr-3304
>>> 
>>> http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
>>> Definitely needs some updating; I will try to get to that this weekend.
>>> 
>>> 
>>> 
>>> -
>>> Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
>>> --
>>> View this message in context: 
>>> http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4009454.html
>>> Sent from the Solr - User mailing list archive at 
>>> Nabble.com.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> -
>> Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4009483.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 



RE: Using Solr-3304

2012-09-21 Thread Eric Khoury

I have to deal with 3 parameters: time filtering, a groupid (1 to 2000), and a 
quality value (1 to 5), and was hoping to use an X format = Group.Ticks and Y = 
quality level, where ticks is the number of ticks for a given time, rounded to 
the minute.  In other words, my field indexing would look like: <field 
name="RightsData2">45.634801234 1.5 45.634805667 2.5</field>.  I guess I'm 
missing something, as I thought that would define a rectangle.  Where do the 
min/max values come into play?

> Date: Fri, 21 Sep 2012 13:55:24 -0700
> From: dsmi...@mitre.org
> To: solr-user@lucene.apache.org
> Subject: Re: Using Solr-3304
> 
> For your use-case of time ranges, set geo="false" (as you've done).  At this 
> point you have a quad tree but it doesn't (yet) work properly for the default 
> min & max numbers that a double can store, so you need to specify the 
> boundary rectangle explicitly and to the particular numbers for your 
> use-case.  Use 0 for the 'Y's.  Think about the smallest granularity of time 
> you need (a minute?) and what the earliest time you need to recognize and the 
> furthest out as measured in your time granularity (in minutes?).  The 
> boundary minX can be zero which will be your epoch, and the maxX will be the 
> farthest out in time you can go -- who knows.  Set maxDetailDist to 1.
> 
> On Sep 21, 2012, at 4:44 PM, Eric Khoury [via Lucene] wrote:
> 
> 
> David, I tried increasing the maxDetailDist, as I need 9 decimal value 
> precision. class="solr.SpatialRecursivePrefixTreeFieldType" geo="false" 
> distErrPct="0.025" maxDetailDist="0.1" />  But when I do, I get the 
> following error: Data "SEVERE: org.apache.solr.common.SolrException: ERROR: 
> [doc=MV0005] Error adding field 'rectangle'='45.634801234 1.5 
> 45.634805667 2.5' msg=Y values [-1.7976931348623157E308 to Infinity] not in 
> boundary 
> Rect(minX=-1.7976931348623157E308,maxX=1.7976931348623157E308,minY=-1.7976931348623157E308,maxY=1.7976931348623157E308)"
>  string
> Any ideas?Eric. PS: what does geo=true\false change?
>  > Date: Fri, 21 Sep 2012 10:34:07 -0700
> 
> > From: [hidden 
> > email]
> > To: [hidden 
> > email]
> > Subject: Re: Using Solr-3304
> >
> > http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
> > Definitely needs some updating; I will try to get to that this weekend.
> >
> >
> >
> > -
> >  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> > --
> > View this message in context: 
> > http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4009454.html
> > Sent from the Solr - User mailing list archive at 
> > Nabble.com.
> 
> 
> 
> 
> 
> 
> 
> 
> -
>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4009483.html
> Sent from the Solr - User mailing list archive at Nabble.com.
  

Re: Using Solr-3304

2012-09-21 Thread David Smiley (@MITRE.org)
For your use-case of time ranges, set geo="false" (as you've done).  At this 
point you have a quad tree but it doesn't (yet) work properly for the default 
min & max numbers that a double can store, so you need to specify the boundary 
rectangle explicitly and to the particular numbers for your use-case.  Use 0 
for the 'Y's.  Think about the smallest granularity of time you need (a 
minute?) and what the earliest time you need to recognize and the furthest out 
as measured in your time granularity (in minutes?).  The boundary minX can be 
zero which will be your epoch, and the maxX will be the farthest out in time 
you can go -- who knows.  Set maxDetailDist to 1.

On Sep 21, 2012, at 4:44 PM, Eric Khoury [via Lucene] wrote:


David, I tried increasing the maxDetailDist, as I need 9 decimal value 
precision.  But when I do, I get the following error: Data 
"SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=MV0005] 
Error adding field 'rectangle'='45.634801234 1.5 45.634805667 2.5' msg=Y values 
[-1.7976931348623157E308 to Infinity] not in boundary 
Rect(minX=-1.7976931348623157E308,maxX=1.7976931348623157E308,minY=-1.7976931348623157E308,maxY=1.7976931348623157E308)"
 string
Any ideas?Eric. PS: what does geo=true\false change?
 > Date: Fri, 21 Sep 2012 10:34:07 -0700

> From: [hidden 
> email]
> To: [hidden email]
> Subject: Re: Using Solr-3304
>
> http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
> Definitely needs some updating; I will try to get to that this weekend.
>
>
>
> -
>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4009454.html
> Sent from the Solr - User mailing list archive at 
> Nabble.com.








-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4009483.html
Sent from the Solr - User mailing list archive at Nabble.com.

solr hanging, possible memory issues?

2012-09-21 Thread Kevin Goess
We're running Solr 3.4, a fairly out-of-the-box solr/jetty setup, with
-Xms1200m -Xmx3500m .  When we start pushing more than a couple documents
per second at it (PDFs, they go through SolrCell/Tika/PDFBox), the java
process hangs, becoming completely unresponsive.

We thought it might be an issue where PDFBox was hanging with lots of
blocked threads, but now we've also noticed that all the thread dumps show
a similar situation in the heap, where three of the areas are at 99%, even
though we don't get any out-of-memory messages.

Heap
   PSYoungGen  total 796416K, used 386330K
eden space 398208K, 97% used
from space 398208K, 0% used
to   space 398208K, 0% used
   PSOldGen
object space 2389376K, 99% used
   PSPermGen
object space 53824K, 99% used

We've also just noticed that after restarting Solr, the PermGen space grows
steadily until it hits 99% and then just stays there.

1) Is that behavior of PermGen normal, growing steadily to 99% and then
staying there?  Apparently we can increase PermSpace
to -XX:MaxPermSize=128M , but if there's a memory leak that only postpones
the problem.

2) If all three of those indicators are pegged at 99%, I would think that
the JVM would throw an out-of-memory exception, rather than just
withdrawing into its own navel, is that expected behavior or is it
indicative of anything else?

Any tips would be greatly appreciated, thanks!

-- 
Kevin M. Goess
Software Engineer
Berkeley Electronic Press
kgo...@bepress.com

510-665-1200 x179
www.bepress.com

bepress: sustainable scholarly publishing


RE: Using Solr-3304

2012-09-21 Thread Eric Khoury

David, I tried increasing the maxDetailDist, as I need 9 decimal digits of 
precision: <fieldType class="solr.SpatialRecursivePrefixTreeFieldType" 
geo="false" distErrPct="0.025" maxDetailDist="0.1" />.  But when I do, I get 
the following error:

SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=MV0005] 
Error adding field 'rectangle'='45.634801234 1.5 45.634805667 2.5' msg=Y values 
[-1.7976931348623157E308 to Infinity] not in boundary 
Rect(minX=-1.7976931348623157E308,maxX=1.7976931348623157E308,minY=-1.7976931348623157E308,maxY=1.7976931348623157E308)

Any ideas? Eric.

PS: what does geo=true/false change?
 > Date: Fri, 21 Sep 2012 10:34:07 -0700
> From: dsmi...@mitre.org
> To: solr-user@lucene.apache.org
> Subject: Re: Using Solr-3304
> 
> http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
> Definitely needs some updating; I will try to get to that this weekend.
> 
> 
> 
> -
>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4009454.html
> Sent from the Solr - User mailing list archive at Nabble.com.
  

Re: Help with Query syntax. How to make a query that works in between AND and OR.

2012-09-21 Thread cleonard
Otis Gospodnetic-5 wrote
> Hi,
> 
> I'm curious... why do you issue multiple queries for autocomplete
> purposes?
> Have you tried using Suggester?  May also want
> http://sematext.com/products/autocomplete/index.html which works
> nicely with Solr.
> 
> Otis
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html

I'm not doing multiple queries.  I'm trying to add some functionality and
not have to do multiple queries.  I've worked with the Suggester and so far
I've not been able to get that to do what I need.  I'll revisit the
suggester component and see if I can make it do what I need.

I have a kind of specialized search that I'm implementing.  It's actually
working great.  I just need this one more piece to make it just about
perfect.  




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-with-Query-syntax-How-to-make-a-query-that-works-in-between-AND-and-OR-tp4009451p4009470.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Backup strategy for SolrCloud

2012-09-21 Thread Chris Hostetter

: The ReplicationHandler still works when you use SolrCloud, right? can't you
: just replicate from one (or N, depending on the number of shards) of the
: nodes in the cluster? That way you could keep a Solr instance that's only
: used to replicate the indexes, and you could have it somewhere else (other

if you only replicated from one node in the cluster, you would only get 
backups of the shards that exist on that node -- not any shards that 
only exist on other machines.

I think that's what Tommaso was suggesting: a tool/client that could ask 
ZK about the cluster state, and then use that to generate a list of 
collection => shards+nodes so that it could ensure it SnapPulled from some 
node a copy of every shard for every collection.

Of course: if your collections are big enough that you are sharding, 
trying to have a single backup server probably wouldn't be viable anyway, 
so a tool like that would need options to split the work up.

An alternate strategy might be to leverage the existing backup 
functionality of the ReplicationHandler, but add logic to make it zk/cloud 
aware, so that a single request to "backup" a collection would 
propagate to all of the shard leaders to (delegate to a node to) back up 
that shard -- then you just need to configure the backup location for the 
ReplicationHandler to be a directory that is on your NAS.


-Hoss
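The ZK-aware backup tool sketched above would start from the parsed cluster state and pick one live node per shard of every collection. A minimal, hypothetical sketch of just that planning step (the data shape below is invented for illustration; the real ZK clusterstate.json is more deeply nested and carries state flags):

```python
def plan_backup(clusterstate):
    """Choose one node per shard so every shard of every collection
    gets SnapPulled exactly once (naive choice: first listed node)."""
    return {
        collection: {shard: nodes[0] for shard, nodes in shards.items()}
        for collection, shards in clusterstate.items()
    }

# Made-up cluster layout, only to show the shape of the result.
state = {
    "products": {"shard1": ["host1:8983", "host2:8983"],
                 "shard2": ["host3:8983"]},
}
print(plan_backup(state))
# {'products': {'shard1': 'host1:8983', 'shard2': 'host3:8983'}}
```

A real tool would also skip nodes that ZK reports as down, and (per the note above) offer options to spread the pulls across more than one backup target.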


Re: Solr Swap Function doesn't work when using Solr Cloud Beta

2012-09-21 Thread Chris Hostetter

: Below is my solr.xml configuration, and already set persistent to true.
...
: Then publish 1 record to test1, and query. it's ok now.

Ok, first off -- please provide more details on how exactly you are 
running Solr.  Your initial email said...

>>> In Solr 3.6, core swap function works good. After switch to use Solr 4.0
>>> Beta, and found it doesn't work well.

...but based on your solr.xml file and your logs, it appears you are now 
trying to use some of the ZooKeeper/SolrCloud features that didn't even 
exist in Solr 3.6, so it's kind of an apples to oranges comparison.  i'm 
pretty sure that for a simple multicore setup, SWAP still works exactly as 
it did in Solr 3.6.

Whether SWAP works with ZooKeeper/SolrCloud is something i'm not really 
clear on -- mainly because i'm not sure what it should mean conceptually.  
Should the two SolrCores swap which collections they are a part of? what 
happens if the doc->shard assignment for the two collections means the 
same docs wouldn't wind up in those SolrCores? what if the SolrCores are 
two different shards of the same collection before the SWAP?

FWIW: It wasn't clear from your message *how* you had your SolrCloud 
system set up, but it appears from your pre-swap log messages that you are 
running a single node, so I tried to reproduce the behavior you were 
seeing by setting up a solr home dir like you described, and then 
running...

java -Dsolr.solr.home=swap-test/ -DzkRun -Dbootstrap_conf=true -DnumShards=1 
-jar start.jar

...because that was my best guess as to what you were running. But even 
then i couldn't get the behavior you described after the swap...

: And found the shardurl is different with the log which search before swap.
: It’s shard.url=host1:18000/solr/test1-ondeck/| host1:18000/solr/test1/.

...what i observed after the swap, is that it appeared as if the SWAP had 
no effect on client requests, because a client request to 
"/solr/test1/select?q=*:*" was distributed under the covers to 
"/solr/test1-ondeck/..." which is the new name for the core where the doc 
had been indexed...

Sep 21, 2012 12:03:32 PM org.apache.solr.core.SolrCore execute
INFO: [test1-ondeck] webapp=/solr path=/select 
params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=frisbee:8983/solr/test1-ondeck/&NOW=1348254212677&start=0&q=*:*&isShard=true&fsv=true}
 hits=1 status=0 QTime=0 
Sep 21, 2012 12:03:32 PM org.apache.solr.core.SolrCore execute
INFO: [test1-ondeck] webapp=/solr path=/select 
params={df=text&shard.url=frisbee:8983/solr/test1-ondeck/&NOW=1348254212677&q=*:*&ids=SOLR1000&distrib=false&isShard=true&wt=javabin&rows=10&version=2}
 status=0 QTime=0 
Sep 21, 2012 12:03:32 PM org.apache.solr.core.SolrCore execute
INFO: [test1] webapp=/solr path=/select params={q=*:*} status=0 QTime=10 

...I'm guessing this behavior is because nothing in the SWAP call bothered 
to tell ZK that these two SolrCores swapped names, so when asking to query 
the "test1" collection, ZK says "ok, well the name of the core that's part 
of that collection is test1-ondeck" and that's where the query was routed.

I only saw behavior similar to what you described (of the "shard.url" 
referring to both SolrCores) after i restarted solr -- presumably because 
as the solrCores started up, they both notified ZK about their existence, 
and said what collection they (thought they) are a part of, but ZK also 
already thinks they are each a part of a different collection as well 
(because nothing bothered to tell ZK otherwise).

So the long and short of it seems to be...

* CoreAdminHandler's SWAP is poorly defined if you are using SolrCloud 
(most likely: so is RENAME and ALIAS) - i've opened SOLR-3866.

* This doesn't seem like a regression bug from Solr 3.6, because as 
far as i can tell SWAP still works as well as it did in 3.6.



-Hoss
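For reference, the plain multicore SWAP that Hoss says still works as in 3.6 is a CoreAdmin call; a sketch of constructing it (host, port, and core names are placeholders taken from the thread):

```python
from urllib.parse import urlencode

# Core names from the thread; host/port are whatever your install uses.
params = urlencode({"action": "SWAP", "core": "test1", "other": "test1-ondeck"})
url = "http://localhost:8983/solr/admin/cores?" + params
print(url)
# http://localhost:8983/solr/admin/cores?action=SWAP&core=test1&other=test1-ondeck
```

As the analysis above notes, issuing this against a cloud setup swaps only the local core names; nothing in the call updates ZK's view of which core belongs to which collection.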

Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

2012-09-21 Thread sabman
I am having the same problem after upgrading from 3.2 to 4.0. I have the
sharedLib="lib" attribute added in the tag and I still get the same error. I
deleted all the files from the SOLR home directory and copied the files from
the 4.0 package. I still see this error. Where else could the old lib files be
referenced?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-to-start-solr-4-0-0-BETA-with-tomcat-6-0-20-tp4002646p4009466.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to boost date field while boosting a text field?

2012-09-21 Thread Otis Gospodnetic
Hi,

If you want to get results sorted by published_date field, then you
need to sort by it &sort=published_date+ASC. And then you don't really
need field boosting/weights.

Otis
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
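If the goal is instead to let recency influence the score rather than strictly sort, the function-query approach documented on the Solr wiki -- a reciprocal of document age -- is worth a look. A hedged sketch of the request parameters, with the field name taken from the question and the usual wiki example constants (not a drop-in answer):

```python
from urllib.parse import urlencode

# 3.16e-11 is roughly 1 / (milliseconds in a year): fresh documents
# multiply their score by a value near 1.0, older ones by less.
params = {
    "defType": "edismax",
    "q": 'content_type:video AND (title:"obama budget")',
    "boost": "recip(ms(NOW,published_date),3.16e-11,1,1)",
}
print(urlencode(params))
```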


On Fri, Sep 21, 2012 at 2:12 PM, srinalluri  wrote:
> The following is the working query, with more weight given to title. I am
> using the default parser. But the published_date of the results of this query
> is not in order. I want the dates to be in order.
>
> ?q=content_type:video AND (title:("obama budget")^2 OR
> export_headline:("obama budget"))&fl=title,score,export_headline
>
> I want to give weight to the date field also. I can't use 'sort' as I used the
> '^2'. What changes are needed to the above query?
>
> thanks
> Srini
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-boost-date-field-while-boosting-a-text-field-tp4009455.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help with Query syntax. How to make a query that works in between AND and OR.

2012-09-21 Thread Otis Gospodnetic
Hi,

I'm curious... why do you issue multiple queries for autocomplete purposes?
Have you tried using Suggester?  May also want
http://sematext.com/products/autocomplete/index.html which works
nicely with Solr.

Otis
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Fri, Sep 21, 2012 at 2:50 PM, cleonard  wrote:
> I've played with the mm parameter quite a bit.  It does sort of do what I
> need if I do multiple queries decreasing the mm parameter with each call.
> However, I'm doing this for a web form auto complete or suggester so I
> really want to make this happen in a single request if at all possible.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Help-with-Query-syntax-How-to-make-a-query-that-works-in-between-AND-and-OR-tp4009451p4009460.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help with Query syntax. How to make a query that works in between AND and OR.

2012-09-21 Thread cleonard
I've played with the mm parameter quite a bit.  It does sort of do what I
need if I do multiple queries decreasing the mm parameter with each call. 
However, I'm doing this for a web form auto complete or suggester so I
really want to make this happen in a single request if at all possible.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-with-Query-syntax-How-to-make-a-query-that-works-in-between-AND-and-OR-tp4009451p4009460.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help with Query syntax. How to make a query that works in between AND and OR.

2012-09-21 Thread Shawn Heisey

On 9/21/2012 11:22 AM, cleonard wrote:

Now a mistyped term is no problem.  I still get results.  The issue now is
that I get too many results back.  What I want is something that effectively
does an AND if a term is matched, but does an OR when a term is not found.
To say it a different way -- if a term is found, I only want results that
contain that term.


It won't be precisely what you describe, but you may want to switch to 
edismax and employ the mm parameter.


http://wiki.apache.org/solr/ExtendedDisMax

http://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29

Thanks,
Shawn
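A sketch of what such a request could look like; the mm value "2<-1" reads "with up to two clauses, all must match; with more, all but one must match", which approximates the AND-unless-one-term-is-mistyped behavior asked about (field name and terms are placeholders):

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "qf": "search_text",
    "q": "term1 term2 term3",
    # Up to 2 terms: all required; 3 or more: all but one must match.
    "mm": "2<-1",
}
print(urlencode(params))
```

The mm syntax accepts several such conditional clauses, so the "forgiveness" can grow with the number of terms typed.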



RE: Using Solr-3304

2012-09-21 Thread Eric Khoury

Thanks David, that's exactly what I needed.  One thing: from my experiments, 
the order seems to be Xmin Ymin Xmax Ymax for both the indexing and the query.

Eric.

> Date: Fri, 21 Sep 2012 10:34:07 -0700
> From: dsmi...@mitre.org
> To: solr-user@lucene.apache.org
> Subject: Re: Using Solr-3304
> 
> http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
> Definitely needs some updating; I will try to get to that this weekend.
> 
> 
> 
> -
>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4009454.html
> Sent from the Solr - User mailing list archive at Nabble.com.
  

How to boost date field while boosting a text field?

2012-09-21 Thread srinalluri
The following is the working query, with more weight given to title. I am
using the default parser. But the published_date of the results of this query
is not in order. I want the dates to be in order.

?q=content_type:video AND (title:("obama budget")^2 OR
export_headline:("obama budget"))&fl=title,score,export_headline

I want to give weight to the date field also. I can't use 'sort' as I used the
'^2'. What changes are needed to the above query?

thanks
Srini



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-boost-date-field-while-boosting-a-text-field-tp4009455.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using Solr-3304

2012-09-21 Thread David Smiley (@MITRE.org)
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
Definitely needs some updating; I will try to get to that this weekend.



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4009454.html
Sent from the Solr - User mailing list archive at Nabble.com.


Help with Query syntax. How to make a query that works in between AND and OR.

2012-09-21 Thread cleonard
I have a search text field that contains all the search terms.  I'm taking
user input, breaking it up into tokens of term1, term2, term3, etc., and
then submitting to Dismax.  

q=search_text:term1* AND search_text:term2* AND search_text:term3*

This works great.  The problem is when a user mistypes a term.  Then nothing
is found.  This is not good. 

To address this I'm now doing OR instead

q=search_text:term1* OR search_text:term2* OR search_text:term3*

Now a mistyped term is no problem.  I still get results.  The issue now is
that I get too many results back.  What I want is something that effectively
does an AND if a term is matched, but does an OR when a term is not found. 
To say it a different way -- if a term is found, I only want results that
contain that term.  

I've tried many different boolean expressions to no success.  I've spent
hours searching for a solution, but so far I've not found the answer.  I
feel that there is a simple solution, so I'm hoping that someone can
enlighten me here.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-with-Query-syntax-How-to-make-a-query-that-works-in-between-AND-and-OR-tp4009451.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 3.6 observe connections in CLOSE_WAIT state

2012-09-21 Thread Shawn Heisey

On 9/21/2012 11:20 AM, Shawn Heisey wrote:

On 9/20/2012 1:02 AM, Alok Bhandari wrote:
I am using solr 3.6.0 , I have observed many connection in CLOSE_WAIT 
state

This is a "me too" email.


One difference - I am running 3.5.0.

Thanks,
Shawn



Re: Solr 3.6 observe connections in CLOSE_WAIT state

2012-09-21 Thread Shawn Heisey

On 9/20/2012 1:02 AM, Alok Bhandari wrote:

Hello,

I am using solr 3.6.0 , I have observed many connection in CLOSE_WAIT state
after using solr server for some time. On further analysis and googling
found that I need to close the idle connections from the client which is
connecting to solr to query data and it does reduce the number of CLOSE_WAIT
connections but still some connection remain in that state.

I am using 2 shards and one observation is that if I don't use shards then I
am getting 0 CLOSE_WAIT connections. Need help of this as we need to use
distributed search using shards.


This is a "me too" email.  We've got haproxy sitting in front of Solr.  
A little more than half of the connections in the various WAIT states 
are from the webserver SolrJ clients to haproxy listening on ports 8983 
and 8984.  This is SolrJ to haproxy, so there is a non-Solr component.


The other connections in WAIT states are connections to Solr listening 
on port 8981.  Because haproxy runs on one of the Solr servers, the 
netstat table can't tell me which of those connections are from haproxy 
and which are distributed search, but when I check the lsof output for 
WAIT states on port 8981, the connections owned by haproxy are never in 
a WAIT state.  All of the connections in WAIT states are owned by java.  
One of the java processes is Solr, the other is a SolrJ application that 
handles index updates and deletes.


This is looking like a problem in SolrJ, Solr, or perhaps both.  My 
setup is not large enough for these connections to really be a problem, 
but in a big enough installation, you might find the machine starved for 
TCP ports.


netstat output:
http://dpaste.com/hold/804126/

lsof and ps output:
http://dpaste.com/hold/804127/

I've got four servers here, all running Solr -- idxa1, idxa2, idxb1, and 
idxb2.  The haproxy load balancer and the SolrJ indexing app are 
currently running on idxa1, which is also where I obtained the netstat, 
lsof, and ps output.  The load balancer talks to a special broker core 
which contains no index and has the shards parameter in the request 
handler definition.  The requests made by the SolrJ clients in the 
webservers do not include a shards parameter.


[root@idxa1 solr]# java -version
java version "1.6.0_29"
Java(TM) SE Runtime Environment (build 1.6.0_29-b11)
Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02, mixed mode)

[root@idxa1 solr]# uname -a
Linux idxa1 2.6.32-131.17.1.el6.x86_64 #1 SMP Thu Oct 6 19:24:09 BST 
2011 x86_64 x86_64 x86_64 GNU/Linux


Thanks,
Shawn
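For anyone comparing against output like the dpaste links above, connection states can be tallied from saved netstat output with a few lines; a small parsing sketch (the sample lines are made up, in `netstat -ant`'s general shape):

```python
from collections import Counter

def tally_states(netstat_lines):
    """Count TCP states: the last whitespace-separated column of
    `netstat -ant`-style lines."""
    return Counter(line.split()[-1]
                   for line in netstat_lines if line.startswith("tcp"))

sample = [  # invented lines for illustration
    "tcp 0 0 10.0.0.1:8983 10.0.0.2:51234 CLOSE_WAIT",
    "tcp 0 0 10.0.0.1:8983 10.0.0.3:51235 ESTABLISHED",
    "tcp 0 0 10.0.0.1:8981 10.0.0.2:51236 CLOSE_WAIT",
]
print(tally_states(sample))  # CLOSE_WAIT: 2, ESTABLISHED: 1
```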



Using Solr-3304

2012-09-21 Thread Eric Khoury

Hi David, I've installed the latest nightly, and am trying to use the spatial 
queries. I've defined a field called Rectangle. Can you provide some guidance 
on how to index such a field and how to query it? Indexing: X1,Y1,X2,Y2? 
Querying:? Thanks! Eric.
 

Re: DIH problem

2012-09-21 Thread Mikhail Khludnev
Gian,

The only way to handle it is to provide a test case and attach to jira.

Thanks

On Fri, Sep 21, 2012 at 6:03 PM, Gian Marco Tagliani
wrote:

> Hi,
> I'm updating my Solr from version 3.4 to version 3.6.1 and I'm facing a
> little problem with the DIH.
>
> In the delta-import I'm using the /parentDeltaQuery/ feature of the DIH to
> update the parent entity.
> I don't think this is working properly.
>
> I realized that it's just executing the /parentDeltaQuery/ with the first
> record of the /deltaQuery /result.
> Comparing the code with the previous versions I noticed that the
> rowIterator was never set to null.
>
> To solve this I wrote a simple patch:
>
> -
> Index: solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/EntityProcessorBase.java
> ===
> --- solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/EntityProcessorBase.java (revision 31454)
> +++ solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/EntityProcessorBase.java (working copy)
> @@ -121,6 +121,7 @@
>  if (rowIterator.hasNext())
>return rowIterator.next();
>  query = null;
> +rowIterator = null;
>  return null;
>} catch (Exception e) {
>  SolrException.log(log, "getNext() failed for query '" + query +
> "'", e);
> -
>
>
> Do you think this is correct?
>
> Thanks for your help
>
> --
> Gian Marco Tagliani
>
>
>
>


-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: what happends with slave during repliacation?

2012-09-21 Thread yangqian_nj
Hi Bernd,

You mentioned: "Only one slave is online the other is for backup. The backup
gets replicated first.
After that the servers will be switched and the online becomes backup. "

Could you please let us know how you do the switch? We use SWAP to switch
in Solr Cloud. After SWAP, when we query, from the tomcat log we could see
the query actually go to both cores for some reason.

Thanks,
Amanda





--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-happends-with-slave-during-repliacation-tp4009100p4009417.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How does Solr handle overloads so well?

2012-09-21 Thread Mike Gagnon
Thanks. If Solr doesn't have any special logic for dealing with
algorithmic-complexity attack-like overloads, then it sounds like Jetty and
Tomcat are responsible for Solr's unusually good performance in my
experiments (unusual compared to other non-Java web applications).

Cheers,
Mike

On Wed, Sep 19, 2012 at 8:30 AM, Walter Underwood wrote:

> The front-end code protection that I mentioned was outside of Solr. At
> that time, requests with very large start values were slow, so we put code
> in the front end to never request those. Even if the user wanted page 5000
> of the results, they would get page 100.
>
> Now, those requests are fast, so that external protection is not needed.
>
> I was running overload tests this summer and could not get Solr to behave
> badly. The throughput would drop off with overload, but not too bad. This
> was all with simple queries on a 1.2M doc index.
>
> wunder
> Walter Underwood
> Search Guy, Chegg
>
> On Sep 19, 2012, at 8:20 AM, Erik Hatcher wrote:
>
> > How are you triggering an infinite loop in your requests to Solr?
> >
> >   Erik
> >
> > On Sep 19, 2012, at 11:12 , Mike Gagnon wrote:
> >
> >> [ I am sorry for breaking the thread, but my inbox has neither received my
> >> original post to the mailing list, nor Otis's response (so I can't reply to
> >> his response) ]
> >>
> >> Thanks a bunch for your response Otis.  Let me more thoroughly explain my
> >> experimental workload and why I am surprised Solr works so well.
> >>
> >> The most important characteristic of my workload is that many of the
> >> requests (60 per second) cause infinite loops within Solr. That is, each of
> >> those requests causes a separate infinite loop within its request context.
> >>
> >> This workload is similar to an algorithmic-complexity attack --- a type of
> >> DoS.  In every web-app stack I've tested (except Solr/Jetty and
> >> Solr/Tomcat) such workloads cause an immediate and complete denial of
> >> service. What happens for these vulnerable applications is that the thread
> >> pool fills up with infinite loops, and incoming requests become rejected.
> >>
> >> But Solr manages to survive such an attack. My best guess is that Solr has
> >> an especially good overload strategy that quickly kicks out the infinite
> >> loop requests -- which lowers CPU contention and allows other requests to
> >> be admitted.
> >>
> >> My first guess would be that Tomcat or Jetty is responsible for the good
> >> response to overload. However,
> >> there was a good discussion in 2008 on this mailing list about Solr
> >> Security:
> >>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200811.mbox/browser
> >>
> >> In this discuss Walter Underwood commented: "We have protected against
> >> several different DoS problems in our front-end code."
> >>
> >> Perhaps it is these front-end defenses that help Solr survive my
> >> workloads?
> >>
> >> Thanks!
> >> Mike Gagnon
> >>
> >>
> >>> Hm, I'm not sure how to approach this. Solr is not alone here - there's a
> >>> container like jetty, solr inside it, and lucene inside solr.
> >>> Next, that index is really small, so there is no disk IO. The request
> >>> rate is also not super high, and if you did this over a fast connection then
> >>> there are also no issues with slow response writing or with having lots of
> >>> concurrent connections or running out of threads ...
> >>>
> >>> ...so it's not really that surprising solr keeps working :)
> >>>
> >>> But...tell us more.
> >>>
> >>> Otis
> >>> --
> >>> Performance Monitoring - http://sematext.com/spm
> >>>
> >>>
> >>>
> >>> On Sep 12, 2012 8:51 PM, "Mike Gagnon"  wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have been studying how server software responds to requests that cause
> >>> CPU overloads (such as infinite loops).
> >>>
> >>> In my experiments I have observed that Solr performs unusually well when
> >>> subjected to such loads. Every other piece of web software I've
> >>> experimented with drops to zero service under such loads. Do you know how
> >>> Solr achieves such good performance? I am guessing that when Solr is
> >>> overloaded it sheds load to make room for incoming requests, but I could
> >>> not find any documentation that describes Solr's overload strategy.
> >>>
> >>> Experimental setup: I ran Solr 3.1 on a 12-core machine with 12 GB ram,
> >>> using it to index and search about 10,000 pages on MediaWiki. I tested both
> >>> Solr+Jetty and Solr+Tomcat. I submitted a variety of Solr queries at a rate
> >>> of 300 requests per second. At the same time, I submitted "overload
> >>> requests" at a rate of 60 requests per second. Each overload request caused
> >>> an infinite loop in Solr via
> >>> https://issues.apache.org/jira/browse/SOLR-2631.
> >>>
> >>> With Jetty about 70% of non-overload requests completed --- 95% of requests
> >>> completing within 0.6 seconds.
> >>> With Tomcat about 34% of non-overload requests completed --- 95% of
> >>> req

DIH problem

2012-09-21 Thread Gian Marco Tagliani

Hi,
I'm updating my Solr from version 3.4 to version 3.6.1 and I'm facing a 
little problem with the DIH.


In the delta-import I'm using the /parentDeltaQuery/ feature of the DIH 
to update the parent entity.

I don't think this is working properly.

I realized that it's just executing the /parentDeltaQuery/ with the 
first record of the /deltaQuery /result.
Comparing the code with the previous versions I noticed that the 
rowIterator was never set to null.


To solve this I wrote a simple patch:

-
Index: 
solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/EntityProcessorBase.java

===
--- 
solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/EntityProcessorBase.java 
(revision 31454)
+++ 
solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/EntityProcessorBase.java 
(working copy)

@@ -121,6 +121,7 @@
 if (rowIterator.hasNext())
   return rowIterator.next();
 query = null;
+rowIterator = null;
 return null;
   } catch (Exception e) {
 SolrException.log(log, "getNext() failed for query '" + query 
+ "'", e);

-


Do you think this is correct?

Thanks for your help

--
Gian Marco Tagliani
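The bug the patch above addresses can be shown in miniature: if an exhausted iterator is cached and only the query is reset, the next parent row sees an empty iterator instead of triggering a fresh query. A language-neutral sketch of that caching pattern (not the DIH code itself; names are invented):

```python
class RowSource:
    """Miniature of EntityProcessorBase.getNext(): caches a row
    iterator; reset_iterator toggles the patch's extra null-out."""
    def __init__(self, reset_iterator):
        self.reset_iterator = reset_iterator
        self.query = None
        self.row_iterator = None

    def next_row(self, query, fetch):
        if self.row_iterator is None:
            self.query = query
            self.row_iterator = iter(fetch(query))
        try:
            return next(self.row_iterator)
        except StopIteration:
            self.query = None
            if self.reset_iterator:
                self.row_iterator = None  # the patch's added line
            return None

def fetch(q):                       # stands in for the SQL query
    return [q + "-row1"]

buggy, fixed = RowSource(False), RowSource(True)
for src in (buggy, fixed):
    src.next_row("parent1", fetch)  # consume the only row
    src.next_row("parent1", fetch)  # exhaust the iterator

print(buggy.next_row("parent2", fetch))  # None: stale iterator blocks parent2
print(fixed.next_row("parent2", fetch))  # parent2-row1
```

This mirrors the symptom described: only the first deltaQuery record gets its parentDeltaQuery run, because every later query hits the stale, exhausted iterator.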





RE: dih groovy script question

2012-09-21 Thread Moore, Gary
Looks like some sort of foul-up with Groovy versions and Solr 3.6.1, as I had 
to roll back to Groovy 1.7.10 to get this to work.  Started with Groovy 2 and 
then 1.8 before 1.7.10.  What's odd is that I implemented the same calls made 
in ScriptTransformer.java in a test program and they worked fine with all 
Groovy versions.  Can't imagine what the root cause might be -- does Groovy 
implement jsr223 differently in later versions?  I suppose to find out I could 
compile Solr with my jdk, but time to march on. ;)
Gary

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, September 15, 2012 9:01 AM
To: solr-user@lucene.apache.org
Subject: Re: dih groovy script question

Stab in the dark... This looks like you're somehow getting the wrong Groovy 
jars. Can you print out the Groovy version as a test? Perhaps you have one 
groovy version in your command-line and copied a different version into the 
libraries Solr knows about?

Because this looks like a pure Groovy error

Best
Erick

On Thu, Sep 13, 2012 at 9:03 PM, Moore, Gary  wrote:
> I'm a bit stumped as to why I can't get a groovy script to run from the DIH.  
>  I'm sure it's something braindead I'm missing.   The script looks like this 
> in data-config.xml:
>
>  import java.security.MessageDigest
> import java.util.HashMap
> def createHashId(HashMaprow, 
> org.apache.solr.handler.dataimport.ContextImpl context )  {
>   // do groovy stuff
> return row } ]]> 
>
> When I run the import, I get the following error:
>
>
> Caused by: java.lang.NoSuchMethodException: No signature of method: 
> org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.createHashId() is 
> applicable for argument types: (java.util.HashMap, 
> org.apache.solr.handler.dataimport.ContextImpl) values: [[Format:Reports, 
> Credits:, EnteredBy:Corey Holland, ...], ...]
> at 
> org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.invokeImpl(GroovyScriptEngineImpl.java:364)
> at 
> org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.invokeFunction(GroovyScriptEngineImpl.java:160)
> ... 13 more
>
> The script runs fine from the shell so I don't believe there are any groovy 
> errors.  Thanks in advance for any tips.
> Gary
>
>
>
>
> This electronic message contains information generated by the USDA solely for 
> the intended recipients. Any unauthorized interception of this message or the 
> use or disclosure of the information it contains may violate the law and 
> subject the violator to civil or criminal penalties. If you believe you have 
> received this message in error, please notify the sender and delete the email 
> immediately.




Re: Backup strategy for SolrCloud

2012-09-21 Thread Tomás Fernández Löbbe
The ReplicationHandler still works when you use SolrCloud, right? can't you
just replicate from one (or N, depending on the number of shards) of the
nodes in the cluster? That way you could keep a Solr instance that's only
used to replicate the indexes, and you could have it somewhere else (other
datacenter for example). You would only be getting the "hard committed"
data, so it would not be the exact same index that you see in your cluster,
but if you commit frequently that shouldn't be a problem. You could inspect
ZK znodes to see exactly which nodes you need to replicate to get a full
backup of the index (one up node per shard).

Tomás
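The per-node replication Tomás describes can also be driven with the ReplicationHandler's backup command; a sketch of issuing it to each node identified from ZK (hosts, core names, and the location path below are hypothetical):

```python
from urllib.parse import urlencode

def backup_url(node, core, location):
    """Build a ReplicationHandler backup request for one node/core."""
    qs = urlencode({"command": "backup", "location": location})
    return "http://%s/solr/%s/replication?%s" % (node, core, qs)

# One (node, core) pair per shard, as identified from the ZK state.
for node, core in [("host1:8983", "shard1_core"), ("host3:8983", "shard2_core")]:
    print(backup_url(node, core, "/mnt/nas/solr-backups"))
```

As noted above, this captures only hard-committed data, so it is a point-in-time snapshot rather than an exact mirror of the live cluster.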


On Fri, Sep 21, 2012 at 3:56 AM, Tommaso Teofili
wrote:

> I also think that's a good question and currently without a "use this"
> answer :-)
> I think it shouldn't be hard to write a Solr service querying ZK and
> replicate both conf and indexes (via SnapPuller or ZK itself) so that such
> a node is responsible to back up the whole cluster in a secure storage
> (NAS, EBS, etc.).
>
> just my 2 cents,
> Tommaso
>
> 2012/9/21 Otis Gospodnetic 
>
> > Sounds good.
> >
> > But I think this was still a good question: Is there a way to back up
> > an index that lives in SolrCloud and if so, how?
> >
> > Otis
> > Search Analytics - http://sematext.com/search-analytics/index.html
> > Performance Monitoring - http://sematext.com/spm/index.html
> >
> >
> > On Thu, Sep 20, 2012 at 7:35 PM, Upayavira  wrote:
> > > What sorts of failures are you thinking of? Power loss? Index
> > > corruption? Server overload?
> > >
> > > Could you keep somewhat remote replicas of each shard, but not behind
> > > your load balancer?
> > >
> > > Then, should all your customer facing nodes go down, those replicas
> > > would be elected leaders. When you bring the customer facing ones back,
> > > they would just pull their indexes from your remote replicas, and you'd
> > > be good to go once more.
> > >
> > > Upayavira
> > >
> > > On Thu, Sep 20, 2012, at 10:30 PM, jimtronic wrote:
> > >> I'm thinking about catastrophic failure and recovery. If, for some reason,
> > >> the cluster should go down or become unusable and I simply want to bring it
> > >> back up as quickly as possible, what's the best way to accomplish that?
> > >>
> > >> Maybe I'm thinking about this incorrectly? Is this not a concern?
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> View this message in context:
> > >>
> >
> http://lucene.472066.n3.nabble.com/Backup-strategy-for-SolrCloud-tp4009291p4009297.html
> > >> Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Help with new Join Functionallity in Solr 4.0

2012-09-21 Thread Milen.Tilev
Dear Solr community,

I am rather new to Solr; however, I already find it kind of attractive. We are 
developing a research application which contains a Solr index with three 
different kinds of documents. Here is the basic idea:


-  A document of type "doc" consisting of fields id, docid, doctitle 
and some other metadata

-  A document of type "module" consisting of fields id, modid and text

-  A document of type "docmodule" consisting of fields id, docrefid, 
modrefid and some metadata about the relation between a document and a module; 
field docrefid refers to the id of a "doc" document, while field modrefid 
contains the id of a "module" document

In other words, in our model there are documents (type "doc") consisting of 
several modules and there is some characterization of each link between a 
document and a module.

Almost all fields of a "doc" document are searchable, as well as the text of a 
module and the metadata of the "docmodule" entries.

We are looking for a fast way to retrieve all modules containing a certain text 
and associated with a given document, preferably with a single query. This 
means we want to query the text from a "module" document while we set a 
restriction on the docrefid from a "docmodule" or the id from a "doc" document. 
Is this possible by means of the new pseudo joins? Any ideas are highly 
appreciated!
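To make the question concrete, here is a sketch of the kind of request we are imagining. The field names (id, modrefid, docrefid) are the ones from our schema above, but the doc id is made up and the {!join} syntax is our reading of the new join query parser, so treat it as a guess rather than a working query:

```python
from urllib.parse import urlencode

def module_search_params(text_query, doc_id):
    # Find "module" documents whose text matches text_query, restricted
    # via a join through the "docmodule" documents to a single "doc" id.
    # {!join from=modrefid to=id} would run docrefid:<doc_id> against the
    # "docmodule" documents, collect their modrefid values, and match
    # those values against the id field of the returned documents.
    q = ('text:(%s) AND _query_:"{!join from=modrefid to=id}docrefid:%s"'
         % (text_query, doc_id))
    return urlencode({"q": q})

# "doc42" is a hypothetical document id, purely for illustration.
params = module_search_params("solr", "doc42")
```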

Thanks in advance!

Milen Tilev
Master of Science
Softwareentwickler
Business Unit Information


MATERNA GmbH
Information & Communications

Voßkuhle 37
44141 Dortmund
Deutschland

Telefon: +49 231 5599-8257
Fax: +49 231 5599-98257
E-Mail: milen.ti...@materna.de

www.materna.de | Newsletter | Twitter | XING | Facebook


Sitz der MATERNA GmbH: Voßkuhle 37, 44141 Dortmund
Geschäftsführer: Dr. Winfried Materna, Helmut an de Meulen, Ralph Hartwig
Amtsgericht Dortmund HRB 5839



RE: poor language detection

2012-09-21 Thread tomtom
Hi Markus,

Thank you very much, that helped. After many attempts and reorderings in the
config files, it now works.

Greetings, tom








Re: Problems with SolrEnitityProcessor + frange filterQuery

2012-09-21 Thread Dirceu Vieira
Hi Jack,

Your suggestion works perfectly!
Thank you very much!!

it ended up being something like this:

query="_query_:'status:1 AND NOT priority:\-1' AND _query_:'{!frange l=3000
u=5000}max(sum(suser_count), sum(user_count))' "
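In general terms, the trick is to drop the fq attribute entirely and fold the main query and every filter into the query attribute as nested _query_ clauses, so the comma split never sees the commas inside the function. A small sketch (the helper name is just for illustration):

```python
def fold_filters_into_query(query, filters):
    # SolrEntityProcessor splits its fq attribute on commas, so a filter
    # that itself contains commas (like the frange/max function above)
    # gets broken apart. Workaround: express the main query and each
    # filter as nested _query_ clauses joined with AND -- commas inside
    # a nested query are left untouched.
    clauses = ["_query_:'%s'" % query]
    clauses += ["_query_:'%s'" % f for f in filters]
    return " AND ".join(clauses)

q = fold_filters_into_query(
    "status:1 AND NOT priority:\\-1",
    ["{!frange l=3000 u=5000}max(sum(suser_count), sum(user_count))"])
```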

Regards,

Dirceu

On Thu, Sep 20, 2012 at 10:46 PM, Jack Krupansky wrote:

> Sorry, but it looks like the SolrEntityProcessor does a raw split on
> commas of its "fq" parameter, with no provision for escaping.
>
> You should be able to combine the fq into the query parameter as a nested
> query which does not have the split issue.
>
> -- Jack Krupansky
>
> -Original Message- From: Dirceu Vieira
> Sent: Thursday, September 20, 2012 4:16 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problems with SolrEnitityProcessor + frange filterQuery
>
>
> Hi guys,
>
> Has anybody got any idea about that?
> I'm really open for any suggestions
>
> Thanks!
>
> Dirceu
>
> On Thu, Sep 20, 2012 at 11:58 AM, Dirceu Vieira 
> wrote:
>
>> Hi,
>>
>> I'm attempting to write a filter query for my SolrEntityProcessor using
>> {!frange} over a function.
>> It works fine when I test it in the admin UI, but once I move it into my
>> data-config.xml the query blows up because of the commas in the function.
>> The problem is that the fq parameter can be a comma-separated list, which
>> means that if I have commas within my query, it will try to split it into
>> multiple filter queries.
>>
>> Does anybody know a way of escaping the comma or another way I can work
>> around that?
>>
>> I've been using SolrEntityProcessor to import filtered data from one core to
>> another; here are the queries:
>>
>> query="status:1 AND NOT priority:\-1"
>> fq="{!frange l=3000 u=5000}max(sum(suser_count), sum(user_count))"
>>
>> I'm using Solr-4.0.0-BETA.
>>
>>
>>
>> Best regards,
>>
>> --
>> Dirceu Vieira Júnior
>> ---
>> +47 9753 2473
>> dirceuvjr.blogspot.com
>> twitter.com/dirceuvjr
>>
>>
>>
>
> --
> Dirceu Vieira Júnior
> ---
> +47 9753 2473
> dirceuvjr.blogspot.com
> twitter.com/dirceuvjr
>



-- 
Dirceu Vieira Júnior
---
+47 9753 2473
dirceuvjr.blogspot.com
twitter.com/dirceuvjr


Grouping

2012-09-21 Thread Peter Kirk
Hi

It appears from the Solr documentation that it is not possible to group by 
multi-valued fields. Is this correct?

Also, grouping only works on text fields - not, for example, on int fields. I was 
wondering what the basis for this decision was, and whether it is actually possible 
to group by an int field. I know, of course, that it is easy enough to re-feed 
an int value into a string field for grouping, but I was just curious.
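(For example, the re-feed could be done with a copyField in schema.xml - the 
field names here are just illustrative, not from a real schema:)

```xml
<!-- Hypothetical schema.xml fragment: mirror an int field into a
     single-valued string field solely so results can be grouped on it. -->
<field name="price"     type="int"    indexed="true" stored="true"/>
<field name="price_str" type="string" indexed="true" stored="false"/>
<copyField source="price" dest="price_str"/>
<!-- then group with: &group=true&group.field=price_str -->
```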

Thanks,
Peter