missing a directory, can not process pdf files

2012-09-19 Thread xxxx xxxx
It seems the /update directory is missing? I use Solr 4.0.0 beta and
cannot process PDF files because of it.

Also, when will the final version be released? I thought it was 30 days after the beta?

How can we get the files which contain the searched queries / content?




Embedded Server Issue : SOLRJ : No Such Core Found

2012-09-19 Thread Senthil Kk Mani

Hi,

I am facing an issue while trying to use the solrj EmbeddedSolrServer to load a
core. I am trying to load the example/solr packaged with solr-3.6.1.
It works perfectly fine through CommonsHttpSolrServer. I am able to query and
fetch the document.

I used the following jar files to compile and run.
apache-solr-solrj-3.6.1
apache-solr-core-3.6.1
slf4j-api-1.6.1
commons-io-2.1
lucene-core-3.6.1
commons-fileupload-1.2.2
servlet-api-2.5
commons-httpclient-3.1
commons-logging.jar
commons-codec-1.6

However, if I try to load the same 'solr' through the embedded server, I get a No
Such Core error.

1) I tried passing the core name - 'collection1' - but got the same error.

From solr.xml:

  <solr persistent="false">
    <cores adminPath="/admin/cores" defaultCoreName="collection1">
      <core name="collection1" instanceDir="." />
    </cores>
  </solr>

2) I also checked my solrconfig.xml, and the requestHandler for "/update" is set
as follows

From solrconfig.xml:

  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  </requestHandler>

--Senthil



Re: Embedded Server Issue : SOLRJ : No Such Core Found

2012-09-19 Thread Tommaso Teofili
Hi Senthil,

try using the following:

 CoreContainer coreContainer = new CoreContainer.Initializer().initialize();
 SolrServer solrServer = new EmbeddedSolrServer(coreContainer,
"collection1");

Hope it helps,
Tommaso
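
[Illustrative sketch, not from the original mail: a fuller self-contained version of the
same approach, assuming the stock example/solr layout shipped with 3.6.1. The path,
query and class name below are placeholders.]

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.core.CoreContainer;

  public class EmbeddedExample {
    public static void main(String[] args) throws Exception {
      // CoreContainer.Initializer reads solr.solr.home, which must point at the
      // directory that contains solr.xml (e.g. example/solr in the 3.6.1 download).
      System.setProperty("solr.solr.home", "/path/to/apache-solr-3.6.1/example/solr");

      CoreContainer coreContainer = new CoreContainer.Initializer().initialize();
      SolrServer solrServer = new EmbeddedSolrServer(coreContainer, "collection1");

      // Smoke test: match-all query against the embedded 'collection1' core.
      System.out.println(solrServer.query(new SolrQuery("*:*")).getResults().getNumFound());

      coreContainer.shutdown();
    }
  }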

2012/9/19 Senthil Kk Mani 

>
> Hi,
>
> I am facing an issue while trying to use the solrj EmbeddedServer to load a
> core. I am trying to load the example/solr packaged with solr-3.6.1.
> It works perfectly fine through CommonHTTPSolrSever. I am able to query and
> fetch the document.
>
> I used the following jar files to compile and run.
> apache-solr-solrj-3.6.1
> apache-solr-core-3.6.1
> slf4j-api-1.6.1
> commons-io-2.1
> lucene-core-3.6.1
> commons-fileupload-1.2.2
> servlet-api-2.5
> commons-httpclient-3.1
> commons-logging.jar
> commons-code-1.6
>
> However if I try to load the same 'solr' through Embedded Server - i get No
> Such Core Error.
>
> 1) I tried passing the core name - 'collection1' - but same error
>
> From solr.xml
> 
>   
>   
> 
>   
> 
>
> 2) I also checked my solrconfig.xml and requesthandler for "/update" is set
> as follows
>
> From solrconfig.xml
> class="solr.XmlUpdateRequestHandler">
> 
> 
> 
>
> --Senthil
>
>


Re: SOLR memory usage jump in JVM

2012-09-19 Thread Lance Norskog
The "sawtooth curve" is normal. It means that memory use slowly climbs until it 
triggers a garbage collection pass, which frees the memory very quickly.

You can also turn off parallel garbage collection. This is slower, but will not 
trigger the Sun bug (if that really is the problem).
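
[Illustrative sketch, not from the original mail: example JVM flags for the two options
Lance describes; the start.jar invocation is just the stock Jetty example launcher.]

  # Fall back to the serial (non-parallel) collector:
  java -XX:+UseSerialGC -jar start.jar

  # ...or keep the concurrent collector but restrict it to a single GC thread,
  # the workaround mentioned later in this thread:
  java -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=1 -jar start.jar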

- Original Message -
| From: "Bernd Fehling" 
| To: solr-user@lucene.apache.org
| Sent: Tuesday, September 18, 2012 11:29:56 PM
| Subject: Re: SOLR memory usage jump in JVM
| 
| Hi Lance,
| 
| thanks for this hint. Something I also see, a sawtooth. This is
| coming from Eden space together with Survivor 0 and 1.
| I should switch to Java 7 release to get rid of this and see how
| heap usage looks there. May be something else is also fixed.
| 
| Regards
| Bernd
| 
| 
| Am 19.09.2012 05:29, schrieb Lance Norskog:
| > There is a known JVM garbage collection bug that causes this. It
| > has to do with reclaiming Weak references, I think in WeakHashMap.
| > Concurrent garbage collection collides with this bug and the
| > result is that old field cache data is retained after closing the
| > index. The bug is more common with more processors doing GC
| > simultaneously.
| > 
| > The symptom is that when you run a monitor, the memory usage rises
| > to a peak, drops to a floor, rises again in the classic sawtooth
| > pattern. When the GC bug happens, the ceiling becomes the floor,
| > and the sawtooth goes from the new floor to a new ceiling. The two
| > sizes are the same. So, 2G to 5G, over and over, suddenly it is 5G
| > to 8G, over and over.
| > 
| > The bug is fixed in recent Java 7 releases. I'm sorry, but I cannot
| > find the bug number.
| > 
| > - Original Message -
| > | From: "Yonik Seeley" 
| > | To: solr-user@lucene.apache.org
| > | Sent: Tuesday, September 18, 2012 7:38:41 AM
| > | Subject: Re: SOLR memory usage jump in JVM
| > | 
| > | On Tue, Sep 18, 2012 at 7:45 AM, Bernd Fehling
| > |  wrote:
| > | > I used GC in different situations and tried back and forth.
| > | > Yes, it reduces the used heap memory, but not by 5GB.
| > | > Even so that GC from jconsole (or jvisualvm) is "Full GC".
| > | 
| > | Whatever "Full GC" means ;-)
| > | In the past at least, I've found that I had to hit "Full GC" from
| > | jconsole many times in a row until heap usage stabilizes at it's
| > | lowest point.
| > | 
| > | You could check fieldCache and fieldValueCache to see how many
| > | entries
| > | there are before and after the memory bump.
| > | If that doesn't show anything different, I guess you may need to
| > | resort to a heap dump before and after.
| > | 
| > | > But while you bring GC into this, there is another interesting
| > | > thing.
| > | > - I have one slave running for a week which ends up around 18
| > | > to
| > | > 20GB of heap memory.
| > | > - the slave goes offline for replication (no user queries on
| > | > this
| > | > slave)
| > | > - the slave gets replicated and starts a new searcher
| > | > - the heap memory of the slave is still around 11 to 12GB
| > | > - then I initiate a Full GC from jconsole which brings it down
| > | > to
| > | > about 8GB
| > | > - then I call optimize (on a optimized index) and it then drops
| > | > to
| > | > 6.5GB like a fresh started system
| > | >
| > | >
| > | > I have already looked through Uwe's blog but he says "...As a
| > | > rule
| > | > of thumb: Don’t use more
| > | > than 1/4 of your physical memory as heap space for Java running
| > | > Lucene/Solr,..."
| > | > That would be on my server 8GB for JVM heap, can't believe that
| > | > the
| > | > system
| > | > will run for longer than 10 minutes with 8GB heap.
| > | 
| > | As you probably know, it depends hugely on the usecases/queries:
| > | some
| > | configurations would be fine with a small amount of heap, other
| > | configurations that facet and sort on tons of different fields
| > | would
| > | not be.
| > | 
| > | 
| > | -Yonik
| > | http://lucidworks.com
| > | 
| > 
| 


ramBufferSizeMB

2012-09-19 Thread Trym R. Møller

Hi

Using SolrCloud I have added the following to solrconfig.xml (actually 
the node in zookeeper)

<ramBufferSizeMB>512</ramBufferSizeMB>

After that I expected that my Lucene index segment files would be a bit 
bigger than 1KB, as I'm indexing very small documents.
Enabling the infoStream I see a lot of "flush at getReader" (one segment 
of the infoStream file is pasted below).


1. Where can I look for why documents are flushed so frequently?
2. Does it have anything to do with "getReader" and can I do anything so 
Solr doesn't need to get a new reader so often?


Any comments are most welcome.

Best regards Trym

Furthermore I have specified
   
 18
   
   
 1000
   


IW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flush at getReader
DW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: pool-12-thread-1 
startFullFlush
DW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: anyChanges? 
numDocsInRam=7 deletes=false hasTickets:false pendingChangesInFullFlush: 
false
DWFC 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: 
addFlushableState DocumentsWriterPerThread [pendingDeletes=gen=0, 
segment=_kc, aborting=false, numDocsInRAM=7, deleteQueue=DWDQ: [ 
generation: 1 ]]
DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flush postings 
as segment _kc numDocs=7
DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: new segment 
has 0 deleted docs
DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: new segment 
has no vectors; norms; no docValues; prox; freqs
DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: 
flushedFiles=[_kc_Lucene40_0.frq, _kc.fnm, _kc_Lucene40_0.tim, 
_kc_nrm.cfs, _kc.fdx, _kc.fdt, _kc_Lucene40_0.prx, _kc_nrm.cfe, 
_kc_Lucene40_0.tip]
DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushed 
codec=Lucene40
DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushed: 
segment=_kc ramUsed=0,095 MB newFlushedSize(includes docstores)=0,003 MB 
docs/MB=2.283,058




Re: Personalized Boosting

2012-09-19 Thread Tom Mortimer
I'm still not sure I understand what it is you're trying to do. Index-time or 
query-time boosts would probably be neater and more predictable than multiple 
field instances, though.

http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_change_the_score_of_a_document_based_on_the_.2Avalue.2A_of_a_field_.28say.2C_.22popularity.22.29
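
[Illustrative sketch, not from the original mail: a minimal update-XML index-time boost,
per the first link above. The field names and boost value are placeholders, and the
boosted field must keep norms (omitNorms="false") for the boost to affect scoring.]

  <add>
    <doc>
      <field name="id">user-42</field>
      <!-- boost this document's location value at index time -->
      <field name="location" boost="2.0">Moscow</field>
    </doc>
  </add>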

Tom


On 19 Sep 2012, at 02:49, deniz  wrote:

> Hello Tom
> 
> Thank you for your link, but after overviewing it, I dont think it will
> help... In my case, it will be dynamic, rather than setting a config file
> and as you think of a big country like Russia or China, i will need to add
> all cities manually to the elevator.xml file, and also the boosted users,
> which is not something i desire...
> 
> it seems like duplicating the values in the location field is the best (at
> least quickest) solution for this case..
> 
> 
> 
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Personalized-Boosting-tp4008495p4008783.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Wildcard searches don't work

2012-09-19 Thread Ahmet Arslan
> We're having difficulty with some wildcard searches in Solr
> 4.0Beta. We're using a copyField to write a "tdate" to a
> "text_general" field. We are using the default definition
> for the "text_general" field type. 
> 
>      indexed="true" stored="true" />
>      type="text_general" indexed="true" stored="true" />
> 
>      dest="date_text"/>
> 
> Here's the sample data it holds:
> 
>     2010-01-27T00:00:00Z
>     2010-01-28T00:00:00Z
>     2010-01-31T00:00:00Z
> 
> We run these queries and they return the expected results:
> 
>     date_text:"2010*"
>     date_text:"2010-*"
>     date_text:"2010-01*"
>     date_text:"2010-01-*"
> 
> However, when we run these, they return nothing. What are we
> doing wrong? 
> 
>     date_text:"*-01-27"
>     date_text:"2010-*-27"
>     date_text:"2010-01-27*"

I think in your case you need to use string type instead of text_general.
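
[Illustrative sketch, not from the original mail: an untokenized string copy of the date,
so leading and embedded wildcards match against the whole value. Only "date_text"
appears in the thread; the source field name is a placeholder, and "string" is assumed
to be the stock solr.StrField type.]

  <field name="date_text" type="string" indexed="true" stored="true" />
  <copyField source="my_tdate_field" dest="date_text" />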




Re: SOLR memory usage jump in JVM

2012-09-19 Thread Shawn Heisey

On 9/18/2012 9:29 PM, Lance Norskog wrote:

There is a known JVM garbage collection bug that causes this. It has to do with 
reclaiming Weak references, I think in WeakHashMap. Concurrent garbage 
collection collides with this bug and the result is that old field cache data 
is retained after closing the index. The bug is more common with more 
processors doing GC simultaneously.

The symptom is that when you run a monitor, the memory usage rises to a peak, 
drops to a floor, rises again in the classic sawtooth pattern. When the GC bug 
happens, the ceiling becomes the floor, and the sawtooth goes from the new 
floor to a new ceiling. The two sizes are the same. So, 2G to 5G, over and 
over, suddenly it is 5G to 8G, over and over.

The bug is fixed in recent Java 7 releases. I'm sorry, but I cannot find the 
bug number.


I think I ran into this when I was looking at memory usage on my SolrJ 
indexing program.  Under Java6, memory usage in jconsole (remotely via 
JMX) was fairly constant long-term (aside from the unavoidable 
sawtooth).  When I ran it under Java 7u3, it would continually grow, 
slowly ... but if I measured it with jstat on the Linux commandline 
rather than remotely via jconsole under windows, memory usage was 
consistent over time, just like under java6 with the remote jconsole.  
After looking at heap dumps and scratching my head a lot, I finally 
concluded that I did not have a memory leak; there was a problem with 
remote JMX monitoring in Java 7.  Glad to hear I was not imagining it, 
and that it's fixed now.
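
[Illustrative sketch, not from the original mail: the kind of command-line sampling
described above; the pid and interval are placeholders.]

  # Print heap-occupancy percentages for the process every 5 seconds
  jstat -gcutil 12345 5000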


Thanks,
Shawn



Re: missing a directory, can not process pdf files

2012-09-19 Thread Erick Erickson
Please review:

http://wiki.apache.org/solr/UsingMailingLists

There's nothing in your problem statement that's diagnosable. What did
you try? What
were the results? Details matter.

4.0 is in the process of being prepped for release. The 30 days was a
straw-man proposal.

Best
Erick

On Wed, Sep 19, 2012 at 3:46 AM,    wrote:
> seems the /update directory is missing? I use solr 4.0.0 beta
> can not process pdf files because of it
>
> also when will the final version be released? thought it it 30 days after 
> beta?
>
> how can we get the files which contain the searched queries / content?
>
>


Re: FilterCache Memory consumption high

2012-09-19 Thread Erick Erickson
I think this is the weak reference bug maybe?

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034

Best
Erick

On Tue, Sep 18, 2012 at 11:29 PM, Lance Norskog  wrote:
> The same answer as in another thread:
>
> There is a known JVM garbage collection bug that causes this. It has to do 
> with reclaiming Weak references, I think in WeakHashMap. Concurrent garbage 
> collection collides with this bug and the result is that old field cache data 
> is retained after closing the index. The bug is more common with more 
> processors doing GC simultaneously.
>
> The symptom is that when you run a monitor, the memory usage rises to a peak, 
> drops to a floor, rises again in the classic sawtooth pattern. When the GC 
> bug happens, the ceiling becomes the floor, and the sawtooth goes from the 
> new floor to a new ceiling. The two sizes are the same. So, 2G to 5G, over 
> and over, suddenly it is 5G to 8G, over and over.
>
> The bug is fixed in recent Java 7 releases. I'm sorry, but I cannot find the 
> bug number.
>
> - Original Message -
> | From: "Yonik Seeley" 
> | To: solr-user@lucene.apache.org
> | Sent: Monday, September 17, 2012 1:37:49 PM
> | Subject: Re: FilterCache Memory consumption high
> |
> | On Mon, Sep 17, 2012 at 3:44 PM, Mike Schultz
> |  wrote:
> | > So I'm figuring 3MB per entry.  With CacheSize=512 I expect
> | > something like
> | > 1.5GB of RAM, but with the server in steady state after 1/2 hour,
> | > it is 7GB
> | > larger than without the cache.
> |
> | Heap size and memory use aren't quite the same thing.
> | Try running jconsole (it comes with every JDK), attaching to the
> | process, and then make it run multiple garbage collections to see
> | what
> | the heap shrinks down to.
> |
> | -Yonik
> | http://lucidworks.com
> |


Re: Personalized Boosting

2012-09-19 Thread Erick Erickson
Would boosting (or sorting) by geodist work? See:
http://wiki.apache.org/solr/SpatialSearch#geodist_-_The_distance_function

Which you can use for a "boost query" as well as sorting.

Of course you need to get the lat/lon of your users to make this work,
but there are a number of services that can get you, say, the city
center in lat/lon.
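
[Illustrative sketch, not from the original mail: an edismax request using geodist() as a
multiplicative boost. The field name, point and recip() constants are placeholders, and
"location" is assumed to be a LatLonType field.]

  http://localhost:8983/solr/select?q=restaurants&defType=edismax
      &sfield=location&pt=55.75,37.61
      &boost=recip(geodist(),2,200,20)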

Best
Erick

On Wed, Sep 19, 2012 at 6:43 AM, Tom Mortimer  wrote:
> I'm still not sure I understand what it is you're trying to do. Index-time or 
> query-time boosts would probably be neater and more predictable than multiple 
> field instances, though.
>
> http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22
> http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_change_the_score_of_a_document_based_on_the_.2Avalue.2A_of_a_field_.28say.2C_.22popularity.22.29
>
> Tom
>
>
> On 19 Sep 2012, at 02:49, deniz  wrote:
>
>> Hello Tom
>>
>> Thank you for your link, but after overviewing it, I dont think it will
>> help... In my case, it will be dynamic, rather than setting a config file
>> and as you think of a big country like Russia or China, i will need to add
>> all cities manually to the elevator.xml file, and also the boosted users,
>> which is not something i desire...
>>
>> it seems like duplicating the values in the location field is the best (at
>> least quickest) solution for this case..
>>
>>
>>
>> -
>> Zeki ama calismiyor... Calissa yapar...
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Personalized-Boosting-tp4008495p4008783.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: SOLR memory usage jump in JVM

2012-09-19 Thread Erick Erickson
Two in one morning

The JVM bug I'm familiar with is here:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034

FWIW,
Erick

On Wed, Sep 19, 2012 at 8:20 AM, Shawn Heisey  wrote:
> On 9/18/2012 9:29 PM, Lance Norskog wrote:
>>
>> There is a known JVM garbage collection bug that causes this. It has to do
>> with reclaiming Weak references, I think in WeakHashMap. Concurrent garbage
>> collection collides with this bug and the result is that old field cache
>> data is retained after closing the index. The bug is more common with more
>> processors doing GC simultaneously.
>>
>> The symptom is that when you run a monitor, the memory usage rises to a
>> peak, drops to a floor, rises again in the classic sawtooth pattern. When
>> the GC bug happens, the ceiling becomes the floor, and the sawtooth goes
>> from the new floor to a new ceiling. The two sizes are the same. So, 2G to
>> 5G, over and over, suddenly it is 5G to 8G, over and over.
>>
>> The bug is fixed in recent Java 7 releases. I'm sorry, but I cannot find
>> the bug number.
>
>
> I think I ran into this when I was looking at memory usage on my SolrJ
> indexing program.  Under Java6, memory usage in jconsole (remotely via JMX)
> was fairly constant long-term (aside from the unavoidable sawtooth).  When I
> ran it under Java 7u3, it would continually grow, slowly ... but if I
> measured it with jstat on the Linux commandline rather than remotely via
> jconsole under windows, memory usage was consistent over time, just like
> under java6 with the remote jconsole.  After looking at heap dumps and
> scratching my head a lot, I finally concluded that I did not have a memory
> leak, there was a problem with remote JMX monitoring in java7.  Glad to hear
> I was not imagining it, and that it's fixed now.
>
> Thanks,
> Shawn
>


Re: ramBufferSizeMB

2012-09-19 Thread Erick Erickson
I _think_ the getReader calls are being triggered by the autoSoftCommit being
at one second. If so, this is probably OK. But bumping that up would nail
whether that's the case...

About RamBufferSizeMB. This has nothing to do with the size of the segments!
It's just how much memory is consumed before the RAMBuffer is flushed to
the _currently open_ segment. So until a hard commit happens, the currently
open segment will continue to grow as successive RAMBuffers are flushed.
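
[Illustrative sketch, not from the original mail: the two solrconfig.xml settings in play,
with placeholder values rather than the ones from the original post.]

  <!-- soft commit: opens a new searcher ("flush at getReader"); raising this
       interval reduces how often tiny flushes happen -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>

  <!-- hard commit: closes the currently open segment and fsyncs it -->
  <autoCommit>
    <maxTime>600000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>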

bq:  I expected that my Lucene index segment files would be a bit
bigger than 1KB

Is this a typo? The 512 is specifying MB..

Best
Erick

On Wed, Sep 19, 2012 at 6:01 AM, "Trym R. Møller"  wrote:
> Hi
>
> Using SolrCloud I have added the following to solrconfig.xml (actually the
> node in zookeeper)
> 512
>
> After that I expected that my Lucene index segment files would be a bit
> bigger than 1KB as I'm indexing very small documents
> Enabling the infoStream I see a lot of "flush at getReader" (one segment of
> the infoStream file pasted below)
>
> 1. Where can I look for why documents are flushed so frequently?
> 2. Does it have anything to do with "getReader" and can I do anything so
> Solr doesn't need to get a new reader so often?
>
> Any comments are most welcome.
>
> Best regards Trym
>
> Furthermore I have specified
>
>  18
>
>
>  1000
>
>
>
> IW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flush at getReader
> DW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: pool-12-thread-1
> startFullFlush
> DW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: anyChanges?
> numDocsInRam=7 deletes=false hasTickets:false pendingChangesInFullFlush:
> false
> DWFC 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: addFlushableState
> DocumentsWriterPerThread [pendingDeletes=gen=0, segment=_kc, aborting=false,
> numDocsInRAM=7, deleteQueue=DWDQ: [ generation: 1 ]]
> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flush postings as
> segment _kc numDocs=7
> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: new segment has 0
> deleted docs
> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: new segment has no
> vectors; norms; no docValues; prox; freqs
> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]:
> flushedFiles=[_kc_Lucene40_0.frq, _kc.fnm, _kc_Lucene40_0.tim, _kc_nrm.cfs,
> _kc.fdx, _kc.fdt, _kc_Lucene40_0.prx, _kc_nrm.cfe, _kc_Lucene40_0.tip]
> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushed
> codec=Lucene40
> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushed:
> segment=_kc ramUsed=0,095 MB newFlushedSize(includes docstores)=0,003 MB
> docs/MB=2.283,058
>


Re: Wildcard searches don't work

2012-09-19 Thread Erick Erickson
Take a look at admin/analysis on the text_general type You'll see that
StandardTokenizer is breaking the input strings up into individual tokens
on the colons and hyphens, so
2010-01-27T00:00:00Z
becomes the tokens
2010 01 27T00 00 00Z

admin/analysis should be your first reflex when you encounter things like
this ...

Best
Erick


On Wed, Sep 19, 2012 at 7:00 AM, Ahmet Arslan  wrote:
>> We're having difficulty with some wildcard searches in Solr
>> 4.0Beta. We're using a copyField to write a "tdate" to a
>> "text_general" field. We are using the default definition
>> for the "text_general" field type.
>>
>> > indexed="true" stored="true" />
>> > type="text_general" indexed="true" stored="true" />
>>
>> > dest="date_text"/>
>>
>> Here's the sample data it holds:
>>
>> 2010-01-27T00:00:00Z
>> 2010-01-28T00:00:00Z
>> 2010-01-31T00:00:00Z
>>
>> We run these queries and they return the expected results:
>>
>> date_text:"2010*"
>> date_text:"2010-*"
>> date_text:"2010-01*"
>> date_text:"2010-01-*"
>>
>> However, when we run these, they return nothing. What are we
>> doing wrong?
>>
>> date_text:"*-01-27"
>> date_text:"2010-*-27"
>> date_text:"2010-01-27*"
>
> I think in your case you need to use string type instead of text_general.
>
> 


Highlighting without URL condition

2012-09-19 Thread Spadez
Hi, 

I was wondering if it is possible to set up highlighting so it is on by
default, and doesn't need to be added to the URL. For example: 

http://localhost:8080/solr/select?q=book&hl=true

I would like to have it so highlighting is on even if the URL is this: 

http://localhost:8080/solr/select?q=book

Is this possible, and if so, how can it be achieved? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-without-URL-condition-tp4008899.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlighting without URL condition

2012-09-19 Thread Savvas Andreas Moysidis
Hello,

You can add this request parameter in the "defaults" section of your
request handler named "/select" in solrconfig.xml, like this:

<requestHandler name="/select" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="hl">true</str>
   </lst>
</requestHandler>

and as long as you use this request handler you won't need to
explicitly specify this parameter in your request.

On 19 September 2012 14:27, Spadez  wrote:
> Hi,
>
> I was wondering if it is possible to set up highlighting so it is on by
> default, and doesnt need to add to the URL. For example:
>
> http://localhost:8080/solr/select?q=book&hl=true
>
> I would like to have it so highlighting is on even if the URL is this:
>
> http://localhost:8080/solr/select?q=book
>
> Is this possible, and if so, how can it be achieved?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Highlighting-without-URL-condition-tp4008899.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Nodes cannot recover and become unavailable

2012-09-19 Thread Markus Jelsma
Hi,

Since the 2012-09-17 11:10:41 build, shards start to have trouble coming back 
online. When I restart one node, the slices on the other nodes throw 
exceptions and cannot be queried. I'm not sure how to remedy the problem, but 
stopping a node or restarting it a few times seems to help. The problem is 
that when I restart a node and it happens, I must not restart another node, because 
that may trigger other slices becoming unavailable.

Here are some parts of the log:

2012-09-19 14:13:18,149 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] 
- : Recovery failed - trying again... core=oi_i
2012-09-19 14:13:25,818 WARN [solr.cloud.RecoveryStrategy] - [main-EventThread] 
- : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
2012-09-19 14:13:44,497 WARN [solr.cloud.RecoveryStrategy] - [Thread-4] - : 
Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_jcore=oi_j
2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] 
- : Error while trying to recover. 
core=oi_i:org.apache.solr.common.SolrException: We are not the leader
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
at 
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)

2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] 
- : Recovery failed - trying again... core=oi_i
2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] 
- : Recovery failed - max retries exceeded. core=oi_i
2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] 
- : Recovery failed - I give up. core=oi_i
2012-09-19 14:14:00,333 WARN [solr.cloud.RecoveryStrategy] - [RecoveryThread] - 
: Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
 ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : Sync request error: 
java.lang.NullPointerException
 ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : 
http://nl10.host:8080/solr/oi_i/: Could not tell a replica to 
recover:java.lang.NullPointerException
at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
at 
org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:102)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.<init>(HttpSolrServer.java:155)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.<init>(HttpSolrServer.java:128)
at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:262)
at 
org.apache.solr.cloud.SyncStrategy.requestRecovery(SyncStrategy.java:272)
at org.apache.solr.cloud.SyncStrategy.syncToMe(SyncStrategy.java:203)
at 
org.apache.solr.cloud.SyncStrategy.syncReplicas(SyncStrategy.java:125)
at org.apache.solr.cloud.SyncStrategy.sync(SyncStrategy.java:87)
at 
org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:169)
at 
org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
at 
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
at 
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:275)
at 
org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:326)
at 
org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:159)
at 
org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
at 
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
at org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:56)
at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:131)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)

 ERROR [apache.zookeeper.ClientCnxn] - [main-EventThread] - : Error while 
calling watcher 
java.lang.NullPointerException
at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:139)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
 ERROR [apache.zookeeper.ClientCnxn] - [main-EventThread] - : Error while 
calling watcher 
java.lang.NullPointerException
at 
org.apache.solr.common.cloud.ZkStateReader$3.process(ZkStateReader.java:238)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
 ERROR [apache.zookeeper.ClientCnxn] - [main-

Re: missing a directory, can not process pdf files

2012-09-19 Thread xxxx xxxx
I want to process a PDF file; see "Indexing Data" from 
http://lucene.apache.org/solr/api-4_0_0-BETA/doc-files/tutorial.html

The directory "update" doesn't even exist:
SimplePostTool: POSTing files to http://localhost:8983/solr/update..

It fails because the /update directory is not there and also has no contents (and 
is missing in the repos on GitHub and so on).

How can we retrieve the files when we do a query which contains the searched 
query?
 Original-Nachricht 
> Datum: Wed, 19 Sep 2012 08:33:57 -0400
> Von: Erick Erickson 
> An: solr-user@lucene.apache.org
> Betreff: Re: missing a directory, can not process pdf files

> Please review:
> 
> http://wiki.apache.org/solr/UsingMailingLists
> 
> There's nothing in your problem statement that's diagnosable. What did
> you try? What
> were the results? Details matter.
> 
> 4.0 is in process of being prepped for release. 30 days was a
> straw-man proposal.
> 
> Best
> Erick
> 
> On Wed, Sep 19, 2012 at 3:46 AM,    wrote:
> > seems the /update directory is missing? I use solr 4.0.0 beta
> > can not process pdf files because of it
> >
> > also when will the final version be released? thought it it 30 days
> after beta?
> >
> > how can we get the files which contain the searched queries / content?
> >
> >


Re: missing a directory, can not process pdf files

2012-09-19 Thread Erik Hatcher
There's nothing in that tutorial that mentions an update "directory".  /update 
is a URL endpoint that requires Solr be up and running.

Please post the entire set of steps that you're trying and the exact 
(copy/pasted) error messages you're receiving.

And once you index a PDF file, you don't retrieve the file back from Solr, you 
retrieve search results.  The original file is where it was indexed from, not 
inside Solr.  What you'll get back is the file name (if you stored it, that is).
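
[Illustrative sketch, not from the original mail: indexing a PDF against a running Solr
via the ExtractingRequestHandler that the 4.0 example config maps to /update/extract.
The file path and literal.id value are placeholders.]

  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
       -F "myfile=@/path/to/some.pdf"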

Erik

On Sep 19, 2012, at 10:40 ,   wrote:

> I want to process a pdf file see "Indexing Data" from 
> http://lucene.apache.org/solr/api-4_0_0-BETA/doc-files/tutorial.html
> 
> the directory "update" doesnt even exist:
> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> 
> fails because the /update directory is not there and also has no contents 
> (and is missing in the repos on github and so on)
> 
> how can we retrieve the files when we do a query which contain the searched 
> query?
>  Original-Nachricht 
>> Datum: Wed, 19 Sep 2012 08:33:57 -0400
>> Von: Erick Erickson 
>> An: solr-user@lucene.apache.org
>> Betreff: Re: missing a directory, can not process pdf files
> 
>> Please review:
>> 
>> http://wiki.apache.org/solr/UsingMailingLists
>> 
>> There's nothing in your problem statement that's diagnosable. What did
>> you try? What
>> were the results? Details matter.
>> 
>> 4.0 is in process of being prepped for release. 30 days was a
>> straw-man proposal.
>> 
>> Best
>> Erick
>> 
>> On Wed, Sep 19, 2012 at 3:46 AM,    wrote:
>>> seems the /update directory is missing? I use solr 4.0.0 beta
>>> can not process pdf files because of it
>>> 
>>> also when will the final version be released? thought it it 30 days
>> after beta?
>>> 
>>> how can we get the files which contain the searched queries / content?
>>> 
>>> 



Re: missing a directory, can not process pdf files

2012-09-19 Thread Gora Mohanty
On 19 September 2012 20:10,    wrote:

> I want to process a pdf file see "Indexing Data" from
> http://lucene.apache.org/solr/api-4_0_0-BETA/doc-files/tutorial.html
>
> the directory "update" doesnt even exist:
> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
>

Sorry, what directory are you referring to? The update above is
a URL component, and there is a handler that responds to that.

Are you by any chance looking for a PHP-style file layout?
That is not how things work here.

Otherwise, please expand further on how exactly you are
trying to index the PDF files, and what errors you see in
the logs.

Regards,
Gora


Re: How does Solr handle overloads so well?

2012-09-19 Thread Mike Gagnon
[ I am sorry for breaking the thread, but my inbox has neither received my
original post to the mailing list, nor Otis's response (so I can't reply to
his response) ]

Thanks a bunch for your response Otis.  Let me more thoroughly explain my
experimental workload and why I am surprised Solr works so well.

The most important characteristic of my workload is that many of the
requests (60 per second) cause infinite loops within Solr. That is, each of
those requests causes a separate infinite loop within its request context.

This workload is similar to an algorithmic-complexity attack --- a type of
DoS.  In every web-app stack I've tested (except Solr/Jetty and
Solr/Tomcat) such workloads cause an immediate and complete denial of
service. What happens for these vulnerable applications is that the thread
pool fills up with infinite loops, and incoming requests are rejected.

But Solr manages to survive such an attack. My best guess is that Solr has
an especially good overload strategy that quickly kicks out the infinite
loop requests -- which lowers CPU contention, and allows other requests to
be admitted.

My first guess would be that Tomcat or Jetty is responsible for the good
response to overload. However,
there was a good discussion in 2008 on this mailing list about Solr
Security:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200811.mbox/browser

In this discussion Walter Underwood commented: "We have protected against
several different DoS problems in our front-end code."

Perhaps it is these front-end defenses that help Solr survive my workloads?

Thanks!
Mike Gagnon


> Hm, I'm not sure how to approach this. Solr is not alone here - there's
> container like jetty, solr inside it and lucene inside solr.
> Next, that index is rally small, so there is no disk IO. The request
> rate is also not super high and if you did this over a fast connection
then
> there are also no issues with slow response writing or with having lots of
> concurrent connections or running out of threads ...
>
> ...so it's not really that surprising solr keeps working :)
>
> But...tell us more.
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm
>
>
>
> On Sep 12, 2012 8:51 PM, "Mike Gagnon"  wrote:
>
> Hi,
>
> I have been studying how server software responds to requests that cause
> CPU overloads (such as infinite loops).
>
> In my experiments I have observed that Solr performs unusually well when
> subjected to such loads. Every other piece of web software I've
> experimented with drops to zero service under such loads. Do you know how
> Solr achieves such good performance? I am guessing that when Solr is
> overload sheds load to make room for incoming requests, but I could not
> find any documentation that describes Solr's overload strategy.
>
> Experimental setup: I ran Solr 3.1 on a 12-core machine with 12 GB ram,
> using it index and search about 10,000 pages on MediaWiki. I test both
> Solr+Jetty and Solr+Tomcat. I submitted a variety of Solr queries at a
rate
> of 300 requests per second. At the same time, I submitted "overload
> requests" at a rate of 60 requests per second. Each overload request
caused
> an infinite loop in Solr via
> https://issues.apache.org/jira/browse/SOLR-2631.
>
> With Jetty about 70% of non-overload requests completed --- 95% of
requests
> completing within 0.6 seconds.
> With Tomcat about 34% of non-overload requests completed --- 95% of
> requests completing within 0.6 seconds.
>
> I also ran Solr+Jetty with non-overload requests coming in 65 requests per
> second (overload requests remain at 60 requests per second). In this
> workload, the completion rate drops to 15% and the 95th percentile latency
> increases to 25.
>
> Cheers,
> Mike Gagnon
>


Re: How does Solr handle overloads so well?

2012-09-19 Thread Erik Hatcher
How are you triggering an infinite loop in your requests to Solr?

Erik

On Sep 19, 2012, at 11:12 , Mike Gagnon wrote:

> [ I am sorry for breaking the thread, but my inbox has neither received my
> original post to the mailing list, nor Otis's response (so I can't reply to
> his response) ]
> 
> Thanks a bunch for your response Otis.  Let me more thoroughly explain my
> experimental workload and why I am surprised Solr works so well.
> 
> The most important characteristic of my workload is that many of the
> requests (60 per second) cause infinite loops within Solr. That is, each of
> those requests causes a separate infinite loop within it's request context.
> 
> This workload is similar to an algorithmic-complexity attack --- a type of
> DoS.  In every web-app stack I've tested (except Solr/Jetty and
> Solr/Tomcat) such workloads cause an immediate and complete denial of
> service. What happens for these vulnerable applications, is that the thread
> pool fills up with infinite loops, and incoming requests become rejected.
> 
> But Solr manages to survive such an attack. My best guess is that Solr has
> an especially good overload strategy that quickly kicks out the infinite
> loop requests -- which lowers CPU contention, and allows other requests to
> be admitted.
> 
> My first guess would be that Tomcat or Jetty is responsible for the good
> response to overload. However,
> there was a good discussion in 2008 on this mailing list about Solr
> Security:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200811.mbox/browser
> 
> In this discuss Walter Underwood commented: "We have protected against
> several different DoS problems in our front-end code."
> 
> Perhaps it is these front-end defenses that help Solr survive my workloads?
> 
> Thanks!
> Mike Gagnon
> 
> 
>> Hm, I'm not sure how to approach this. Solr is not alone here - there's
>> container like jetty, solr inside it and lucene inside solr.
>> Next, that index is rally small, so there is no disk IO. The request
>> rate is also not super high and if you did this over a fast connection
> then
>> there are also no issues with slow response writing or with having lots of
>> concurrent connections or running out of threads ...
>> 
>> ...so it's not really that surprising solr keeps working :)
>> 
>> But...tell us more.
>> 
>> Otis
>> --
>> Performance Monitoring - http://sematext.com/spm
>> 
>> 
>> 
>> On Sep 12, 2012 8:51 PM, "Mike Gagnon"  wrote:
>> 
>> Hi,
>> 
>> I have been studying how server software responds to requests that cause
>> CPU overloads (such as infinite loops).
>> 
>> In my experiments I have observed that Solr performs unusually well when
>> subjected to such loads. Every other piece of web software I've
>> experimented with drops to zero service under such loads. Do you know how
>> Solr achieves such good performance? I am guessing that when Solr is
>> overload sheds load to make room for incoming requests, but I could not
>> find any documentation that describes Solr's overload strategy.
>> 
>> Experimental setup: I ran Solr 3.1 on a 12-core machine with 12 GB ram,
>> using it index and search about 10,000 pages on MediaWiki. I test both
>> Solr+Jetty and Solr+Tomcat. I submitted a variety of Solr queries at a
> rate
>> of 300 requests per second. At the same time, I submitted "overload
>> requests" at a rate of 60 requests per second. Each overload request
> caused
>> an infinite loop in Solr via
>> https://issues.apache.org/jira/browse/SOLR-2631.
>> 
>> With Jetty about 70% of non-overload requests completed --- 95% of
> requests
>> completing within 0.6 seconds.
>> With Tomcat about 34% of non-overload requests completed --- 95% of
>> requests completing within 0.6 seconds.
>> 
>> I also ran Solr+Jetty with non-overload requests coming in 65 requests per
>> second (overload requests remain at 60 requests per second). In this
>> workload, the completion rate drops to 15% and the 95th percentile latency
>> increases to 25.
>> 
>> Cheers,
>> Mike Gagnon
>> 



Re: Nodes cannot recover and become unavailable

2012-09-19 Thread Sami Siren
Hi,

I am having trouble understanding the reason for that NPE.

First you could try removing line #102 in HttpClientUtil so
that logging does not prevent creation of the http client in
SyncStrategy.

--
 Sami Siren

On Wed, Sep 19, 2012 at 5:29 PM, Markus Jelsma
 wrote:
> Hi,
>
> Since the 2012-09-17 11:10:41 build shards start to have trouble coming back 
> online. When i restart one node the slices on the other nodes are throwing 
> exceptions and cannot be queried. I'm not sure how to remedy the problem but 
> stopping a node or restarting it a few times seems to help it. The problem is 
> when i restart a node, and it happens, i must not restart another node 
> because that may trigger other slices becoming unavailable.
>
> Here are some parts of the log:
>
> 2012-09-19 14:13:18,149 ERROR [solr.cloud.RecoveryStrategy] - 
> [RecoveryThread] - : Recovery failed - trying again... core=oi_i
> 2012-09-19 14:13:25,818 WARN [solr.cloud.RecoveryStrategy] - 
> [main-EventThread] - : Stopping recovery for 
> zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
> 2012-09-19 14:13:44,497 WARN [solr.cloud.RecoveryStrategy] - [Thread-4] - : 
> Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_jcore=oi_j
> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
> [RecoveryThread] - : Error while trying to recover. 
> core=oi_i:org.apache.solr.common.SolrException: We are not the leader
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
> at 
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
> at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
> at 
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
>
> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
> [RecoveryThread] - : Recovery failed - trying again... core=oi_i
> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
> [RecoveryThread] - : Recovery failed - max retries exceeded. core=oi_i
> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
> [RecoveryThread] - : Recovery failed - I give up. core=oi_i
> 2012-09-19 14:14:00,333 WARN [solr.cloud.RecoveryStrategy] - [RecoveryThread] 
> - : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
>  ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : Sync request error: 
> java.lang.NullPointerException
>  ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : 
> http://nl10.host:8080/solr/oi_i/: Could not tell a replica to 
> recover:java.lang.NullPointerException
> at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
> at 
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:102)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.(HttpSolrServer.java:155)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.(HttpSolrServer.java:128)
> at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:262)
> at 
> org.apache.solr.cloud.SyncStrategy.requestRecovery(SyncStrategy.java:272)
> at org.apache.solr.cloud.SyncStrategy.syncToMe(SyncStrategy.java:203)
> at 
> org.apache.solr.cloud.SyncStrategy.syncReplicas(SyncStrategy.java:125)
> at org.apache.solr.cloud.SyncStrategy.sync(SyncStrategy.java:87)
> at 
> org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:169)
> at 
> org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
> at 
> org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
> at 
> org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:275)
> at 
> org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:326)
> at 
> org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:159)
> at 
> org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
> at 
> org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
> at 
> org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:56)
> at 
> org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:131)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
>
>  ERROR [apache.zookeeper.ClientCnxn] - [main-EventThread] - : Error while 
> calling watcher
> java.lang.NullPointerException
> at 
> org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:139)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
> at 
> org.apache.zookeeper.ClientCn

Re: How does Solr handle overloads so well?

2012-09-19 Thread Mike Gagnon
Via this bug: https://issues.apache.org/jira/browse/SOLR-2631


> ... Solr can infinite loop, use 100% CPU and stack overflow, if you

> execute the following HTTP request:
> - http://localhost:8983/solr/select?qt=/admin/ping
> - http://localhost:8983/solr/admin/ping?qt=/admin/ping

I am running Solr 3.1, which has that bug.

Thanks,
Mike

On Wed, Sep 19, 2012 at 8:20 AM, Erik Hatcher wrote:

> How are you triggering an infinite loop in your requests to Solr?
>
> Erik
>
> On Sep 19, 2012, at 11:12 , Mike Gagnon wrote:
>
> > [ I am sorry for breaking the thread, but my inbox has neither received
> my
> > original post to the mailing list, nor Otis's response (so I can't reply
> to
> > his response) ]
> >
> > Thanks a bunch for your response Otis.  Let me more thoroughly explain my
> > experimental workload and why I am surprised Solr works so well.
> >
> > The most important characteristic of my workload is that many of the
> > requests (60 per second) cause infinite loops within Solr. That is, each
> of
> > those requests causes a separate infinite loop within it's request
> context.
> >
> > This workload is similar to an algorithmic-complexity attack --- a type
> of
> > DoS.  In every web-app stack I've tested (except Solr/Jetty and
> > Solr/Tomcat) such workloads cause an immediate and complete denial of
> > service. What happens for these vulnerable applications, is that the
> thread
> > pool fills up with infinite loops, and incoming requests become rejected.
> >
> > But Solr manages to survive such an attack. My best guess is that Solr
> has
> > an especially good overload strategy that quickly kicks out the infinite
> > loop requests -- which lowers CPU contention, and allows other requests
> to
> > be admitted.
> >
> > My first guess would be that Tomcat or Jetty is responsible for the good
> > response to overload. However,
> > there was a good discussion in 2008 on this mailing list about Solr
> > Security:
> >
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200811.mbox/browser
> >
> > In this discuss Walter Underwood commented: "We have protected against
> > several different DoS problems in our front-end code."
> >
> > Perhaps it is these front-end defenses that help Solr survive my
> workloads?
> >
> > Thanks!
> > Mike Gagnon
> >
> >
> >> Hm, I'm not sure how to approach this. Solr is not alone here - there's
> >> container like jetty, solr inside it and lucene inside solr.
> >> Next, that index is rally small, so there is no disk IO. The request
> >> rate is also not super high and if you did this over a fast connection
> > then
> >> there are also no issues with slow response writing or with having lots
> of
> >> concurrent connections or running out of threads ...
> >>
> >> ...so it's not really that surprising solr keeps working :)
> >>
> >> But...tell us more.
> >>
> >> Otis
> >> --
> >> Performance Monitoring - http://sematext.com/spm
> >>
> >>
> >>
> >> On Sep 12, 2012 8:51 PM, "Mike Gagnon"  wrote:
> >>
> >> Hi,
> >>
> >> I have been studying how server software responds to requests that cause
> >> CPU overloads (such as infinite loops).
> >>
> >> In my experiments I have observed that Solr performs unusually well when
> >> subjected to such loads. Every other piece of web software I've
> >> experimented with drops to zero service under such loads. Do you know
> how
> >> Solr achieves such good performance? I am guessing that when Solr is
> >> overload sheds load to make room for incoming requests, but I could not
> >> find any documentation that describes Solr's overload strategy.
> >>
> >> Experimental setup: I ran Solr 3.1 on a 12-core machine with 12 GB ram,
> >> using it index and search about 10,000 pages on MediaWiki. I test both
> >> Solr+Jetty and Solr+Tomcat. I submitted a variety of Solr queries at a
> > rate
> >> of 300 requests per second. At the same time, I submitted "overload
> >> requests" at a rate of 60 requests per second. Each overload request
> > caused
> >> an infinite loop in Solr via
> >> https://issues.apache.org/jira/browse/SOLR-2631.
> >>
> >> With Jetty about 70% of non-overload requests completed --- 95% of
> > requests
> >> completing within 0.6 seconds.
> >> With Tomcat about 34% of non-overload requests completed --- 95% of
> >> requests completing within 0.6 seconds.
> >>
> >> I also ran Solr+Jetty with non-overload requests coming in 65 requests
> per
> >> second (overload requests remain at 60 requests per second). In this
> >> workload, the completion rate drops to 15% and the 95th percentile
> latency
> >> increases to 25.
> >>
> >> Cheers,
> >> Mike Gagnon
> >>
>
>


Re: How does Solr handle overloads so well?

2012-09-19 Thread Walter Underwood
The front-end code protection that I mentioned was outside of Solr. At that 
time, requests with very large start values were slow, so we put code in the 
front end to never request those. Even if the user wanted page 5000 of the 
results, they would get page 100.

Now, those requests are fast, so that external protection is not needed.

I was running overload tests this summer and could not get Solr to behave 
badly. The throughput would drop off with overload, but not too bad. This was 
all with simple queries on a 1.2M doc index.

wunder
Walter Underwood
Search Guy, Chegg

On Sep 19, 2012, at 8:20 AM, Erik Hatcher wrote:

> How are you triggering an infinite loop in your requests to Solr?
> 
>   Erik
> 
> On Sep 19, 2012, at 11:12 , Mike Gagnon wrote:
> 
>> [ I am sorry for breaking the thread, but my inbox has neither received my
>> original post to the mailing list, nor Otis's response (so I can't reply to
>> his response) ]
>> 
>> Thanks a bunch for your response Otis.  Let me more thoroughly explain my
>> experimental workload and why I am surprised Solr works so well.
>> 
>> The most important characteristic of my workload is that many of the
>> requests (60 per second) cause infinite loops within Solr. That is, each of
>> those requests causes a separate infinite loop within it's request context.
>> 
>> This workload is similar to an algorithmic-complexity attack --- a type of
>> DoS.  In every web-app stack I've tested (except Solr/Jetty and
>> Solr/Tomcat) such workloads cause an immediate and complete denial of
>> service. What happens for these vulnerable applications, is that the thread
>> pool fills up with infinite loops, and incoming requests become rejected.
>> 
>> But Solr manages to survive such an attack. My best guess is that Solr has
>> an especially good overload strategy that quickly kicks out the infinite
>> loop requests -- which lowers CPU contention, and allows other requests to
>> be admitted.
>> 
>> My first guess would be that Tomcat or Jetty is responsible for the good
>> response to overload. However,
>> there was a good discussion in 2008 on this mailing list about Solr
>> Security:
>> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200811.mbox/browser
>> 
>> In this discuss Walter Underwood commented: "We have protected against
>> several different DoS problems in our front-end code."
>> 
>> Perhaps it is these front-end defenses that help Solr survive my workloads?
>> 
>> Thanks!
>> Mike Gagnon
>> 
>> 
>>> Hm, I'm not sure how to approach this. Solr is not alone here - there's
>>> container like jetty, solr inside it and lucene inside solr.
>>> Next, that index is rally small, so there is no disk IO. The request
>>> rate is also not super high and if you did this over a fast connection
>> then
>>> there are also no issues with slow response writing or with having lots of
>>> concurrent connections or running out of threads ...
>>> 
>>> ...so it's not really that surprising solr keeps working :)
>>> 
>>> But...tell us more.
>>> 
>>> Otis
>>> --
>>> Performance Monitoring - http://sematext.com/spm
>>> 
>>> 
>>> 
>>> On Sep 12, 2012 8:51 PM, "Mike Gagnon"  wrote:
>>> 
>>> Hi,
>>> 
>>> I have been studying how server software responds to requests that cause
>>> CPU overloads (such as infinite loops).
>>> 
>>> In my experiments I have observed that Solr performs unusually well when
>>> subjected to such loads. Every other piece of web software I've
>>> experimented with drops to zero service under such loads. Do you know how
>>> Solr achieves such good performance? I am guessing that when Solr is
>>> overload sheds load to make room for incoming requests, but I could not
>>> find any documentation that describes Solr's overload strategy.
>>> 
>>> Experimental setup: I ran Solr 3.1 on a 12-core machine with 12 GB ram,
>>> using it index and search about 10,000 pages on MediaWiki. I test both
>>> Solr+Jetty and Solr+Tomcat. I submitted a variety of Solr queries at a
>> rate
>>> of 300 requests per second. At the same time, I submitted "overload
>>> requests" at a rate of 60 requests per second. Each overload request
>> caused
>>> an infinite loop in Solr via
>>> https://issues.apache.org/jira/browse/SOLR-2631.
>>> 
>>> With Jetty about 70% of non-overload requests completed --- 95% of
>> requests
>>> completing within 0.6 seconds.
>>> With Tomcat about 34% of non-overload requests completed --- 95% of
>>> requests completing within 0.6 seconds.
>>> 
>>> I also ran Solr+Jetty with non-overload requests coming in 65 requests per
>>> second (overload requests remain at 60 requests per second). In this
>>> workload, the completion rate drops to 15% and the 95th percentile latency
>>> increases to 25.
>>> 
>>> Cheers,
>>> Mike Gagnon
>>> 
> 







Re: Nodes cannot recover and become unavailable

2012-09-19 Thread Sami Siren
Also, did you re-create the cluster after upgrading to a newer
version? I believe there were some changes made to the
clusterstate.json recently that are not backwards compatible.

--
 Sami Siren



On Wed, Sep 19, 2012 at 6:21 PM, Sami Siren  wrote:
> Hi,
>
> I am having troubles understanding the reason for that NPE.
>
> First you could try removing the line #102 in HttpClientUtility so
> that logging does not prevent creation of the http client in
> SyncStrategy.
>
> --
>  Sami Siren
>
> On Wed, Sep 19, 2012 at 5:29 PM, Markus Jelsma
>  wrote:
>> Hi,
>>
>> Since the 2012-09-17 11:10:41 build shards start to have trouble coming back 
>> online. When i restart one node the slices on the other nodes are throwing 
>> exceptions and cannot be queried. I'm not sure how to remedy the problem but 
>> stopping a node or restarting it a few times seems to help it. The problem 
>> is when i restart a node, and it happens, i must not restart another node 
>> because that may trigger other slices becoming unavailable.
>>
>> Here are some parts of the log:
>>
>> 2012-09-19 14:13:18,149 ERROR [solr.cloud.RecoveryStrategy] - 
>> [RecoveryThread] - : Recovery failed - trying again... core=oi_i
>> 2012-09-19 14:13:25,818 WARN [solr.cloud.RecoveryStrategy] - 
>> [main-EventThread] - : Stopping recovery for 
>> zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
>> 2012-09-19 14:13:44,497 WARN [solr.cloud.RecoveryStrategy] - [Thread-4] - : 
>> Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_jcore=oi_j
>> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
>> [RecoveryThread] - : Error while trying to recover. 
>> core=oi_i:org.apache.solr.common.SolrException: We are not the leader
>> at 
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
>> at 
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
>> at 
>> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
>> at 
>> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
>> at 
>> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
>>
>> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
>> [RecoveryThread] - : Recovery failed - trying again... core=oi_i
>> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
>> [RecoveryThread] - : Recovery failed - max retries exceeded. core=oi_i
>> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
>> [RecoveryThread] - : Recovery failed - I give up. core=oi_i
>> 2012-09-19 14:14:00,333 WARN [solr.cloud.RecoveryStrategy] - 
>> [RecoveryThread] - : Stopping recovery for 
>> zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
>>  ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : Sync request 
>> error: java.lang.NullPointerException
>>  ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : 
>> http://nl10.host:8080/solr/oi_i/: Could not tell a replica to 
>> recover:java.lang.NullPointerException
>> at 
>> org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
>> at 
>> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:102)
>> at 
>> org.apache.solr.client.solrj.impl.HttpSolrServer.(HttpSolrServer.java:155)
>> at 
>> org.apache.solr.client.solrj.impl.HttpSolrServer.(HttpSolrServer.java:128)
>> at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:262)
>> at 
>> org.apache.solr.cloud.SyncStrategy.requestRecovery(SyncStrategy.java:272)
>> at org.apache.solr.cloud.SyncStrategy.syncToMe(SyncStrategy.java:203)
>> at 
>> org.apache.solr.cloud.SyncStrategy.syncReplicas(SyncStrategy.java:125)
>> at org.apache.solr.cloud.SyncStrategy.sync(SyncStrategy.java:87)
>> at 
>> org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:169)
>> at 
>> org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
>> at 
>> org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
>> at 
>> org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:275)
>> at 
>> org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:326)
>> at 
>> org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:159)
>> at 
>> org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
>> at 
>> org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
>> at 
>> org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:56)
>> at 
>> org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:131)
>> at 
>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
>> at 
>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.j

Re: SOLR memory usage jump in JVM

2012-09-19 Thread Walter Underwood
Ooh, that is a nasty one. Is this JDK 7 only or also in 6?

It looks like the "-XX:ConcGCThreads=1" option is a workaround, is that right?

We've had some 1.6 JVMs behave in the same way that bug describes, but I 
haven't verified it is because of finalizer problems.

wunder

On Sep 19, 2012, at 5:43 AM, Erick Erickson wrote:

> Two in one morning
> 
> The JVM bug I'm familiar with is here:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034
> 
> FWIW,
> Erick
> 
> On Wed, Sep 19, 2012 at 8:20 AM, Shawn Heisey  wrote:
>> On 9/18/2012 9:29 PM, Lance Norskog wrote:
>>> 
>>> There is a known JVM garbage collection bug that causes this. It has to do
>>> with reclaiming Weak references, I think in WeakHashMap. Concurrent garbage
>>> collection collides with this bug and the result is that old field cache
>>> data is retained after closing the index. The bug is more common with more
>>> processors doing GC simultaneously.
>>> 
>>> The symptom is that when you run a monitor, the memory usage rises to a
>>> peak, drops to a floor, rises again in the classic sawtooth pattern. When
>>> the GC bug happens, the ceiling becomes the floor, and the sawtooth goes
>>> from the new floor to a new ceiling. The two sizes are the same. So, 2G to
>>> 5G, over and over, suddenly it is 5G to 8G, over and over.
>>> 
>>> The bug is fixed in recent Java 7 releases. I'm sorry, but I cannot find
>>> the bug number.
>> 
>> 
>> I think I ran into this when I was looking at memory usage on my SolrJ
>> indexing program.  Under Java6, memory usage in jconsole (remotely via JMX)
>> was fairly constant long-term (aside from the unavoidable sawtooth).  When I
>> ran it under Java 7u3, it would continually grow, slowly ... but if I
>> measured it with jstat on the Linux commandline rather than remotely via
>> jconsole under windows, memory usage was consistent over time, just like
>> under java6 with the remote jconsole.  After looking at heap dumps and
>> scratching my head a lot, I finally concluded that I did not have a memory
>> leak, there was a problem with remote JMX monitoring in java7.  Glad to hear
>> I was not imagining it, and that it's fixed now.
>> 
>> Thanks,
>> Shawn
>> 

--
Walter Underwood
wun...@wunderwood.org





Re: SOLR memory usage jump in JVM

2012-09-19 Thread Rozdev29
I have used this setting to reduce gc pauses with CMS - java 6 u23

-XX:+ParallelRefProcEnabled

With this setting, the JVM does GC of weak references with multiple threads and pauses are
low.

Please use this option only when you have multiple cores.

For me, CMS gives better results
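
For reference, a minimal sketch of a CMS-based option set (the heap sizes here
are placeholders to tune for your own index, and start.jar assumes the bundled
Jetty example):

  java -Xms4g -Xmx4g \
       -XX:+UseConcMarkSweepGC \
       -XX:+UseParNewGC \
       -XX:+ParallelRefProcEnabled \
       -jar start.jar

UseConcMarkSweepGC enables CMS for the old generation, UseParNewGC collects the
young generation with multiple threads, and ParallelRefProcEnabled parallelizes
the weak/soft reference processing discussed above.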

Sent from my iPhone

On Sep 19, 2012, at 8:50 AM, Walter Underwood  wrote:

> Ooh, that is a nasty one. Is this JDK 7 only or also in 6?
> 
> It looks like the "-XX:ConcGCThreads=1" option is a workaround, is that right?
> 
> We've had some 1.6 JVMs behave in the same way that bug describes, but I 
> haven't verified it is because of finalizer problems.
> 
> wunder
> 
> On Sep 19, 2012, at 5:43 AM, Erick Erickson wrote:
> 
>> Two in one morning
>> 
>> The JVM bug I'm familiar with is here:
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034
>> 
>> FWIW,
>> Erick
>> 
>> On Wed, Sep 19, 2012 at 8:20 AM, Shawn Heisey  wrote:
>>> On 9/18/2012 9:29 PM, Lance Norskog wrote:
 
 There is a known JVM garbage collection bug that causes this. It has to do
 with reclaiming Weak references, I think in WeakHashMap. Concurrent garbage
 collection collides with this bug and the result is that old field cache
 data is retained after closing the index. The bug is more common with more
 processors doing GC simultaneously.
 
 The symptom is that when you run a monitor, the memory usage rises to a
 peak, drops to a floor, rises again in the classic sawtooth pattern. When
 the GC bug happens, the ceiling becomes the floor, and the sawtooth goes
 from the new floor to a new ceiling. The two sizes are the same. So, 2G to
 5G, over and over, suddenly it is 5G to 8G, over and over.
 
 The bug is fixed in recent Java 7 releases. I'm sorry, but I cannot find
 the bug number.
>>> 
>>> 
>>> I think I ran into this when I was looking at memory usage on my SolrJ
>>> indexing program.  Under Java6, memory usage in jconsole (remotely via JMX)
>>> was fairly constant long-term (aside from the unavoidable sawtooth).  When I
>>> ran it under Java 7u3, it would continually grow, slowly ... but if I
>>> measured it with jstat on the Linux commandline rather than remotely via
>>> jconsole under windows, memory usage was consistent over time, just like
>>> under java6 with the remote jconsole.  After looking at heap dumps and
>>> scratching my head a lot, I finally concluded that I did not have a memory
>>> leak, there was a problem with remote JMX monitoring in java7.  Glad to hear
>>> I was not imagining it, and that it's fixed now.
>>> 
>>> Thanks,
>>> Shawn
>>> 
> 
> --
> Walter Underwood
> wun...@wunderwood.org
> 
> 
> 


Journey to enable highlighting

2012-09-19 Thread Spadez
Hi,

So I want to enable highlighting on my results. When I run the query like
this:

http://localhost:8080/solr/select?q=book&hl=true

I don't get any highlighted results. I am assuming that more is needed to
actually enable highlighting. Commented out at the bottom of my
solrconfig.xml is this:

Default configuration in a requestHandler would look like:
   <arr name="components">
     <str>query</str>
     <str>facet</str>
     <str>mlt</str>
     <str>highlight</str>
     <str>stats</str>
     <str>debug</str>
   </arr>

Is it right that to enable highlighting, I would need to uncomment out this
section, to leave just this shown?:

Default configuration in a requestHandler would look like:
   <arr name="components">
     <str>highlight</str>
   </arr>





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Journey-to-enable-highlighting-tp4008952.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Journey to enable highlighting

2012-09-19 Thread Ahmet Arslan
> So I want to enable highlighting on my results. When I run
> the query like
> this:
> 
> http://localhost:8080/solr/select?q=book&hl=true

Try explicitly setting the field(s) that you want to highlight.
To enable highlighting, your field must be stored="true" 

See : http://wiki.apache.org/solr/FieldOptionsByUseCase
 
Examine other parameters : http://wiki.apache.org/solr/HighlightingParameters

For example, sometimes you may need to increase hl.maxAnalyzedChars .
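
For instance, assuming the stored field you want snippets from is called
"description" (substitute your own field name), a request might look like:

  http://localhost:8080/solr/select?q=book&hl=true&hl.fl=description&hl.snippets=2

which should add a separate highlighting section to the response for each
matching document.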

> I don't get any highlighted results. I am assuming that more
> is needed to
> actually enable highlighting. Commented out at the bottom of
> my
> solrconfig.xml is this:
> 
> Default configuration in a requestHandler would look like:
>    <arr name="components">
>      <str>query</str>
>      <str>facet</str>
>      <str>mlt</str>
>      <str>highlight</str>
>      <str>stats</str>
>      <str>debug</str>
>    </arr>
> 
> Is it right that to enable highlighting, I would need to
> uncomment out this
> section, to leave just this shown?:
> 
> Default configuration in a requestHandler would look like:
>    <arr name="components">
>      <str>highlight</str>
>    </arr>

You do not need to change/edit that section. The highlighting 
component (and query, facet, mlt, etc.) is registered by default.


Re: missing a directory, can not process pdf files

2012-09-19 Thread xxxx xxxx
user:~/solr/example/exampledocs$ java -jar post.jar test.pdf doesn't work

Index binary documents such as Word and PDF with Solr Cell 
(ExtractingRequestHandler).

how do I do this?

http://lucene.apache.org/solr/api-4_0_0-BETA/doc-files/tutorial.html


http://wiki.apache.org/solr/ExtractingRequestHandler

it says solr 1.4?

curl is not normally installed, so how do we do this, e.g. with post.jar?
also the docs dir does not exist, seems very outdated?

"using "curl" or other command line tools to post documents to Solr is nice for 
testing, but not the recommended update method for best performance."

what then?


far below there:

java -Durl=http://localhost:8983/solr/update/extract -Dparams=literal.id=doc5 
-Dtype=text/html -jar post.jar tutorial.html


is this right?

java -Dauto -jar post.jar tutorial.html
java -Dauto -Drecursive -jar post.jar .

"NOTE: The post.jar utility is not meant for production use"
so how do we normally do this, or how should we do this?
 Original-Nachricht 
> Datum: Wed, 19 Sep 2012 10:51:29 -0400
> Von: Erik Hatcher 
> An: solr-user@lucene.apache.org
> Betreff: Re: missing a directory, can not process pdf files

> There's nothing in that tutorial that mentions an update "directory". 
> /update is a URL endpoint that requires Solr be up and running.
> 
> Please post the entire set of steps that you're trying and the exact
> (copy/pasted) error messages you're receiving.
> 
> And once you index a PDF file, you don't retrieve the file back from Solr,
> you retrieve search results.  The original file is where it was indexed
> from, not inside Solr.  What you'll get back is the file name (if you stored
> it, that is).
> 
>   Erik
> 
> On Sep 19, 2012, at 10:40 ,   wrote:
> 
> > I want to process a pdf file see "Indexing Data" from
> http://lucene.apache.org/solr/api-4_0_0-BETA/doc-files/tutorial.html
> > 
> > the directory "update" doesnt even exist:
> > SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> > 
> > fails because the /update directory is not there and also has no
> contents (and is missing in the repos on github and so on)
> > 
> > how can we retrieve the files when we do a query which contain the
> searched query?
> >  Original-Nachricht 
> >> Datum: Wed, 19 Sep 2012 08:33:57 -0400
> >> Von: Erick Erickson 
> >> An: solr-user@lucene.apache.org
> >> Betreff: Re: missing a directory, can not process pdf files
> > 
> >> Please review:
> >> 
> >> http://wiki.apache.org/solr/UsingMailingLists
> >> 
> >> There's nothing in your problem statement that's diagnosable. What did
> >> you try? What
> >> were the results? Details matter.
> >> 
> >> 4.0 is in process of being prepped for release. 30 days was a
> >> straw-man proposal.
> >> 
> >> Best
> >> Erick
> >> 
> >> On Wed, Sep 19, 2012 at 3:46 AM,    wrote:
> >>> seems the /update directory is missing? I use solr 4.0.0 beta
> >>> can not process pdf files because of it
> >>> 
> >>> also when will the final version be released? thought it it 30 days
> >> after beta?
> >>> 
> >>> how can we get the files which contain the searched queries / content?
> >>> 
> >>> 
> 


Search by field with the space in it

2012-09-19 Thread Aleksey Vorona

Hi,

I have a field with space in its name (that is a dynamic field). How can 
I execute search on it?


I tried "q=aattr_box%20%type_sc:super" and it did not work

The field name is "aattr_box type"

-- Aleksey


Re: Search by field with the space in it

2012-09-19 Thread Ahmet Arslan

> I have a field with space in its name (that is a dynamic
> field). How can I execute search on it?
> 
> I tried "q=aattr_box%20%type_sc:super" and it did not work
> 
> The field name is "aattr_box type"

How about q=aattr_box\ type_sc:super
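
(Note that in the raw URL the backslash and the space themselves need to be
percent-encoded, i.e. something like q=aattr_box%5C%20type_sc:super.)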


Re: Search by field with the space in it

2012-09-19 Thread Aleksey Vorona

On 12-09-19 11:04 AM, Ahmet Arslan wrote:

I have a field with space in its name (that is a dynamic
field). How can I execute search on it?

I tried "q=aattr_box%20%type_sc:super" and it did not work

The field name is "aattr_box type"

How about q=aattr_box\ type_sc:super


That works! Thank you!

Sidenote: of course I urlencode space.

-- Aleksey


Understanding fieldCache SUBREADER "insanity"

2012-09-19 Thread Aaron Daubman
Hi all,

In reviewing a solr instance with somewhat variable performance, I
noticed that its fieldCache stats show an insanity_count of 1 with the
insanity type SUBREADER:

---snip---
insanity_count : 1
insanity#0 : SUBREADER: Found caches for descendants of
ReadOnlyDirectoryReader(segments_k
_6h9(3.3):C17198463)+tf_normalizedTotalHotttnesss
'ReadOnlyDirectoryReader(segments_k
_6h9(3.3):C17198463)'=>'tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=>[F#1965982057
'ReadOnlyDirectoryReader(segments_k
_6h9(3.3):C17198463)'=>'tf_normalizedTotalHotttnesss',float,null=>[F#1965982057
'MMapIndexInput(path="/io01/p/solr/playlist/a/playlist/index/_6h9.frq")'=>'tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=>[F#1308116426
---snip---

How can I decipher what this means and what, if anything, I should do
to fix/improve the "insanity"?

Thanks,
 Aaron


Re: missing a directory, can not process pdf files

2012-09-19 Thread Ahmet Arslan
> user:~/solr/example/exampledocs$ java
> -jar post.jar test.pdf doesnt work
> 
> Index binary documents such as Word and PDF with Solr Cell
> (ExtractingRequestHandler).

> how do i do his?
> 
> http://lucene.apache.org/solr/api-4_0_0-BETA/doc-files/tutorial.html
> 
> 
> http://wiki.apache.org/solr/ExtractingRequestHandler
> 
> it says solr 1.4?
> 
> curl is not installed normally so how do we do this like
> with post.jar?
> also the docs dir is not existing, seems very outdated?
> 
> "using "curl" or other command line tools to post documents
> to Solr is nice for testing, but not the recommended update
> method for best performance."
> 
> what then?
> 
> 
> far below there:
> 
> java -Durl=http://localhost:8983/solr/update/extract
> -Dparams=literal.id=doc5 -Dtype=text/html -jar post.jar
> tutorial.html
> 
> 
> is this the right?
> 
> java -Dauto -jar post.jar tutorial.html
> java -Dauto -Drecursive -jar post.jar .
> 
> "NOTE: The post.jar utility is not meant for production
> use"
> so how do we normally do this or  should do this?

I haven't used post.jar to index rich documents. This is new feature of solr 
4.0. To index rich documents you can use one of these : 

http://wiki.apache.org/solr/ContentStreamUpdateRequestExample
http://wiki.apache.org/solr/TikaEntityProcessor
http://searchhub.org/dev/2012/02/14/indexing-with-solrj/




SpanNearQuery distance issue

2012-09-19 Thread vempap
Hello All,

I've a issue with respect to the distance measure of SpanNearQuery in
Lucene. Let's say I've following two documents:

DocID: 6, content:"1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1001
1002 1003 1004 1005 1006 1007 1008 1009 1100", 
DocID: 7, content:"a b c d e a b c f g h i j k l m l k j z z z"

If my span query is :
a) "3n(a,e)" - It matches doc 7
But, if it is:
b) "3n(1,5)" - It does not match doc 6
If query is:
c) "4n(1,5)" - it matches doc 6

I have no clue why a) works but not b). I tried to debug the code, but
couldn't figure it out.

Any help ?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SpanNearQuery-distance-issue-tp4008973.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SpanNearQuery distance issue

2012-09-19 Thread Ahmet Arslan

> I've a issue with respect to the distance measure of
> SpanNearQuery in
> Lucene. Let's say I've following two documents:
> 
> DocID: 6, cotent:"1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
> 18 19 20 1001
> 1002 1003 1004 1005 1006 1007 1008 1009 1100", 
> DocID: 7, content:"a b c d e a b c f g h i j k l m l k j z z
> z"
> 
> If my span query is :
> a) "3n(a,e)" - It matches doc 7
> But, if it is:
> b) "3n(1,5)" - It does not match doc 6
> If query is:
> c) "4n(1,5)" - it matches doc 6
> 
> I have no clue why a) works rather not b). I tried to debug
> the code, but
> couldn't figure it out.

a) works because doc7 has ".. e a .." and n is an unordered operator.
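
For comparison, the ordered counterpart of n in the surround syntax is w, so an
expression along the lines of 3w(a, e) (same distance, used here only for
illustration) would only match when a occurs before e.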

http://searchhub.org/dev/2009/02/22/exploring-query-parsers/ 


Solr - Keep Punctuation in Parsed Phrase Query

2012-09-19 Thread Daisy
Hi;

I am working with apache-solr-3.6.0 on windows machine. I would like to
search for phrases which contain punctuation marks. Example:

"He said: Hi"
I tried to escape the punctuation marks using \ so my url was:

http://localhost:8983/solr/select/?q="He%20said\:%20Hi"&version=2.2&start=0&rows=10&indent=on&debugQuery=true
But I discovered that Solr trims the punctuation in the parsed query and the
result was:

"He said\: Hi"
"He said\: Hi"
PhraseQuery(text:"he said hi")
text:"he said hi"
So How could I query a phrase without trimming the punctuation marks?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Keep-Punctuation-in-Parsed-Phrase-Query-tp4008977.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SpanNearQuery distance issue

2012-09-19 Thread vempap
Shoot me. Thanks, I did not notice that the doc has ".. e a .." in the
content. Thanks again for immediate reply :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SpanNearQuery-distance-issue-tp4008973p4008978.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr - Keep Punctuation in Parsed Phrase Query

2012-09-19 Thread Ahmet Arslan
> I am working with apache-solr-3.6.0 on windows machine. I
> would like to
> search for phrases which contain punctuation marks.
> Example:
> 
> "He said: Hi"
> I tried to escape the punctuation marks using \ so my url
> was:
> 
> http://localhost:8983/solr/select/?q="He%20said\:%20Hi"&version=2.2&start=0&rows=10&indent=on&debugQuery=true
> But I discovered that solr trim the punctuation in the
> parsed query and the
> result was:
> 
> "He said\: Hi"
> "He said\: Hi"
> PhraseQuery(text:"he said
> hi")
> text:"he said
> hi"
> So How could I query a phrase without trimming the
> punctuation marks?

The fieldType of 'text' is doing this. You need a fieldType (analyzer) that does not 
strip punctuation, for example text_ws.
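
A minimal sketch of such a field, assuming your field is called "content"
(adjust the name to your schema), would be:

  <field name="content" type="text_ws" indexed="true" stored="true"/>

Remember to reindex after the change, since analysis changes only affect
documents indexed afterwards.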


Re: missing a directory, can not process pdf files

2012-09-19 Thread xxxx xxxx
So I have to create a Java file and compile it just for this purpose? Like 
http://wiki.apache.org/solr/ContentStreamUpdateRequestExample?
Is there no way to do this via post.jar (and without curl, or another already existing 
command line implementation ...)? Also there is no way mentioned how it can 
be done without curl, even though they say we should not use curl?
 Original-Nachricht 
> Datum: Wed, 19 Sep 2012 11:23:25 -0700 (PDT)
> Von: Ahmet Arslan 
> An: solr-user@lucene.apache.org
> Betreff: Re: missing a directory, can not process pdf files

> > user:~/solr/example/exampledocs$ java
> > -jar post.jar test.pdf doesnt work
> > 
> > Index binary documents such as Word and PDF with Solr Cell
> > (ExtractingRequestHandler).
> 
> > how do i do his?
> > 
> > http://lucene.apache.org/solr/api-4_0_0-BETA/doc-files/tutorial.html
> > 
> > 
> > http://wiki.apache.org/solr/ExtractingRequestHandler
> > 
> > it says solr 1.4?
> > 
> > curl is not installed normally so how do we do this like
> > with post.jar?
> > also the docs dir is not existing, seems very outdated?
> > 
> > "using "curl" or other command line tools to post documents
> > to Solr is nice for testing, but not the recommended update
> > method for best performance."
> > 
> > what then?
> > 
> > 
> > far below there:
> > 
> > java -Durl=http://localhost:8983/solr/update/extract
> > -Dparams=literal.id=doc5 -Dtype=text/html -jar post.jar
> > tutorial.html
> > 
> > 
> > is this the right?
> > 
> > java -Dauto -jar post.jar tutorial.html
> > java -Dauto -Drecursive -jar post.jar .
> > 
> > "NOTE: The post.jar utility is not meant for production
> > use"
> > so how do we normally do this or  should do this?
> 
> I haven't used post.jar to index rich documents. This is new feature of
> solr 4.0. To index rich documents you can use one of these : 
> 
> http://wiki.apache.org/solr/ContentStreamUpdateRequestExample
> http://wiki.apache.org/solr/TikaEntityProcessor
> http://searchhub.org/dev/2012/02/14/indexing-with-solrj/
> 
> 


Re: Search by field with the space in it

2012-09-19 Thread Erick Erickson
I would _really_ recommend that you re-do your schema and
take spaces out of your field names. That may require that
you change your indexing program to not send spaces in dynamic
field names

This is the kind of thing that causes endless headaches as time
goes forward.

You don't _have_ to, but I predict you'll regret if if you don't .

Best
Erick

On Wed, Sep 19, 2012 at 2:11 PM, Aleksey Vorona  wrote:
> On 12-09-19 11:04 AM, Ahmet Arslan wrote:
>>>
>>> I have a field with space in its name (that is a dynamic
>>> field). How can I execute search on it?
>>>
>>> I tried "q=aattr_box%20%type_sc:super" and it did not work
>>>
>>> The field name is "aattr_box type"
>>
>> How about q=aattr_box\ type_sc:super
>>
> That works! Thank you!
>
> Sidenote: of course I urlencode space.
>
> -- Aleksey


Re: SpanNearQuery distance issue

2012-09-19 Thread Erick Erickson
BANG!

On Wed, Sep 19, 2012 at 2:38 PM, vempap  wrote:
> Shoot me. Thanks, I did not notice that the doc has ".. e a .." in the
> content. Thanks again for immediate reply :)
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SpanNearQuery-distance-issue-tp4008973p4008978.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: missing a directory, can not process pdf files

2012-09-19 Thread Chris Hostetter

: user:~/solr/example/exampledocs$ java -jar post.jar test.pdf doesnt work

1) you can use post.jar to send PDFs, but you have to use the option to 
tell solr you are sending a PDF file - because by default it assumes you 
are posting XML.  you can see the problem by looking at the output from 
post.jar and the solr logs...

hossman@frisbee:~/tmp/solr-4.0-BETA/bin-zip/apache-solr-4.0.0-BETA/example/exampledocs$
 java -jar post.jar /tmp/test.pdf 
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update using content-type 
application/xml..
...

And in the Solr logs...

...
SEVERE: org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 
0xe3 (at char #10, byte #-1)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:159)
...

...if you specify the type things should work fine on the client side.

As for the Server side...

2) by default Solr's "/update" handler supports Solr Documents in XML, 
JSON, CSV, and JavaBin.  If you want to use the "ExtractingRequestHandler" 
to parse rich documents you just have to change the URL exactly as noted 
in the wiki you mentioned ("-Durl=http://localhost:8983/solr/update/extract")
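
Putting the two together, a complete invocation for a PDF might look like this
(literal.id=doc1 is just a placeholder unique id):

  java -Durl=http://localhost:8983/solr/update/extract \
       -Dparams=literal.id=doc1 -Dtype=application/pdf \
       -jar post.jar test.pdf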


-Hoss


Re: Solr - Keep Punctuation in Parsed Phrase Query

2012-09-19 Thread Daisy
Thanks, it worked after editing my schema line to:



But I wonder what "text_ws" means?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Keep-Punctuation-in-Parsed-Phrase-Query-tp4008977p4008984.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr - Keep Punctuation in Parsed Phrase Query

2012-09-19 Thread Ahmet Arslan

> But I wonder what "text_ws" means?

It means text tokenized on whitespace. You can find its definition in schema.xml: search for 
'text_ws'. It uses WhitespaceTokenizer.
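
For reference, the definition in the example schema.xml is essentially:

  <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>

so terms are split only on whitespace and punctuation is kept. Note that with
only the whitespace tokenizer, matching is also case-sensitive.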


Re: missing a directory, can not process pdf files

2012-09-19 Thread xxxx xxxx
1) So what does this look like, for example?
2) And without curl? What does that look like? I am very confused because they use 
curl in the example but say at the same time that we should not use curl. Also 
I have not installed curl.
 Original-Nachricht 
> Datum: Wed, 19 Sep 2012 11:47:54 -0700 (PDT)
> Von: Chris Hostetter 
> An: solr-user@lucene.apache.org
> Betreff: Re: missing a directory, can not process pdf files

> 
> : user:~/solr/example/exampledocs$ java -jar post.jar test.pdf doesnt work
> 
> 1) you can use post.jar to send PDFs, but you have to use the option to 
> tell solr you are sending a PDF file - because by default it assumes you 
> are posting XML.  you can see the problem by looking at the output from 
> post.jar and the solr logs...
> 
> hossman@frisbee:~/tmp/solr-4.0-BETA/bin-zip/apache-solr-4.0.0-BETA/example/exampledocs$
> java -jar post.jar /tmp/test.pdf 
> SimplePostTool version 1.5
> Posting files to base url http://localhost:8983/solr/update using
> content-type application/xml..
> ...
> 
> And in the Solr logs...
> 
> ...
> SEVERE: org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 
> 0xe3 (at char #10, byte #-1)
>   at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:159)
> ...
> 
> ...if you specify the type things should work fine on the clinet side.
> 
> As for the Server side...
> 
> 2) by default Solr's "/update" handler supports Solr Documents in XML, 
> JSON, CSV, and JavaBin.  If you wnat to use the "ExtractingRequestHandler"
> to parse rich documents you just have to change the URL exactly as noted 
> in the wiki you mentioned
> ("-Durl=http://localhost:8983/solr/update/extract";)
> 
> 
> -Hoss


solr4 and dataimporthandler

2012-09-19 Thread Ramo Karahasan
Hi,

 

I've set up a Solr 4 application and wanted to do a data import from a DB, so
I'm using the DIH for the Solr 4 beta.  If I go to the dataimport handler in the
admin panel and click on import, I get the following messages in the log:

 

SEVERE: Full Import failed:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
load driver: com.mysql.jdbc.Driver Processing Document # 1

at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)

at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.ja
va:382)

at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448
)

at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)

Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
load driver: com.mysql.jdbc.Driver Processing Document # 1

at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:
413)

at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326
)

at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)

... 3 more

Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
Could not load driver: com.mysql.jdbc.Driver Processing Document # 1

at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(D
ataImportHandlerException.java:71)

at
org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(Jd
bcDataSource.java:114)

at
org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:6
2)

at
org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataIm
porter.java:354)

at
org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.jav
a:99)

at
org.apache.solr.handler.dataimport.SqlEntityProcessor.init(SqlEntityProcesso
r.java:53)

at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcess
orWrapper.java:74)

at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:
430)

at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:
411)

... 5 more

Caused by: java.lang.ClassNotFoundException: Unable to load
com.mysql.jdbc.Driver or
org.apache.solr.handler.dataimport.com.mysql.jdbc.Driver

at
org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:899)

at
org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(Jd
bcDataSource.java:112)

... 12 more

Caused by: org.apache.solr.common.SolrException: Error loading class
'com.mysql.jdbc.Driver'

at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:43
8)

at
org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:889)

... 13 more

Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver

at java.net.URLClassLoader$1.run(URLClassLoader.java:217)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:205)

at java.lang.ClassLoader.loadClass(ClassLoader.java:321)

at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:615)

at java.lang.ClassLoader.loadClass(ClassLoader.java:266)

at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:334)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:264)

at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:42
2)

... 14 more

 

Sep 19, 2012 8:58:59 PM org.apache.solr.update.DirectUpdateHandler2 rollback

INFO: start rollback{flags=0,_version_=0}

Sep 19, 2012 8:58:59 PM org.apache.solr.handler.dataimport.SolrWriter
rollback

SEVERE: Exception while solr rollback.

java.lang.NullPointerException

at
org.apache.solr.update.DefaultSolrCoreState.rollbackIndexWriter(DefaultSolrC
oreState.java:173)

at
org.apache.solr.update.DirectUpdateHandler2.rollbackWriter(DirectUpdateHandl
er2.java:150)

at
org.apache.solr.update.DirectUpdateHandler2.rollback(DirectUpdateHandler2.ja
va:625)

at
org.apache.solr.update.processor.RunUpdateProcessor.processRollback(RunUpdat
eProcessorFactory.java:98)

at
org.apache.solr.update.processor.UpdateRequestProcessor.processRollback(Upda
teRequestProcessor.java:72)

at
org.apache.solr.update.processor.LogUpdateProcessor.processRollback(LogUpdat
eProcessorFactory.java:170)

at
org.apache.solr.handler.dataimport.SolrWriter.rollback(SolrWriter.java:117)

at
org.apache.solr.handler.dataimport.DocBuilder.rollback(DocBuilder.java:319)

at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImpo

Re: solr4 and dataimporthandler

2012-09-19 Thread Ahmet Arslan
> SEVERE: Full Import failed:java.lang.RuntimeException:
> java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException:
> Could not
> load driver: com.mysql.jdbc.Driver Processing Document # 1

You need to put mysql-connector-java-5.1.*.jar into a lib folder that Solr loads.
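
A minimal sketch (the dir path is an assumption; adjust it to wherever you keep
the jar relative to your core's conf directory) is to reference the driver from
solrconfig.xml with a lib directive:

  <lib dir="../../lib/" regex="mysql-connector-java-.*\.jar" />

Alternatively, drop the jar into the core's instanceDir/lib directory. Either
way, restart Solr so the class loader picks it up.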


Conditionally apply synonyms?

2012-09-19 Thread Carrie Coy
Is there an existing TokenFilterFactory that can conditionally insert 
synonyms based on a given document attribute, say category? Some 
synonyms only make sense in context: "bats" in Sports is different from 
"bats" in "Party and Novelty".


It seems the synonyms.txt file would need an additional column that 
could be checked against the document attribute prior to appending synonyms:


#synonyms            category
post,pole            sports
wheel,caster         furniture
pat,paddy,patrick    holiday

Is anything like this possible without writing a custom TokenFilterFactory?


Re: Understanding fieldCache SUBREADER "insanity"

2012-09-19 Thread Tomás Fernández Löbbe
Hi Aaron, here there is some information about the "insanity count":
http://wiki.apache.org/solr/SolrCaching#The_Lucene_FieldCache

As for the SUBREADER type, the javadocs say:
"Indicates an overlap in cache usage on a given field in sub/super readers."

This probably means that you are using the same field for faceting and for
sorting (tf_normalizedTotalHotttnesss), sorting uses the segment level
cache and faceting uses by default the global field cache. This can be a
problem because the field is duplicated in cache, and then it uses twice
the memory.

One way to solve this would be to change the faceting method on that field
to 'fcs', which uses segment level cache (but may be a little bit slower).
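
For example, as a request parameter this would be facet.method=fcs, or per field
(field name taken from the cache entry above):

  f.tf_normalizedTotalHotttnesss.facet.method=fcs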

Tomás


On Wed, Sep 19, 2012 at 3:16 PM, Aaron Daubman  wrote:

> Hi all,
>
> In reviewing a solr instance with somewhat variable performance, I
> noticed that its fieldCache stats show an insanity_count of 1 with the
> insanity type SUBREADER:
>
> ---snip---
> insanity_count : 1
> insanity#0 : SUBREADER: Found caches for descendants of
> ReadOnlyDirectoryReader(segments_k
> _6h9(3.3):C17198463)+tf_normalizedTotalHotttnesss
> 'ReadOnlyDirectoryReader(segments_k
>
> _6h9(3.3):C17198463)'=>'tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=>[F#1965982057
> 'ReadOnlyDirectoryReader(segments_k
>
> _6h9(3.3):C17198463)'=>'tf_normalizedTotalHotttnesss',float,null=>[F#1965982057
>
> 'MMapIndexInput(path="/io01/p/solr/playlist/a/playlist/index/_6h9.frq")'=>'tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=>[F#1308116426
> ---snip---
>
> How can I decipher what this means and what, if anything, I should do
> to fix/improve the "insanity"?
>
> Thanks,
>  Aaron
>


Re: Understanding fieldCache SUBREADER "insanity"

2012-09-19 Thread Aaron Daubman
Hi Tomás,

> This probably means that you are using the same field for faceting and for
> sorting (tf_normalizedTotalHotttnesss), sorting uses the segment level
> cache and faceting uses by default the global field cache. This can be a
> problem because the field is duplicated in cache, and then it uses twice
> the memory.
>
> One way to solve this would be to change the faceting method on that field
> to 'fcs', which uses segment level cache (but may be a little bit slower).

Thanks for explaining what the sparse wiki and javadoc mean - I had
read them but had no idea what the implications were ;-)

We are not doing any explicit faceting, and this index is also
supposed to be a read-only, already-optimized, single-segment index -
both of these seem to indicate to (very unknowledgeable about this) me
that this could be more of a problem - e.g. what am I doing to cause
this since I don't think I need to be using segment-level anything
(should be a single segment if I understand optimization and RO
indices) and I am not leveraging faceting?

Any pointers on where else to look for what might be causing this (one
issue I am currently troubleshooting is too-many-pauses caused by
too-frequent GC, so preventing this double-allocation could help)?

Thanks again,
 Aaron


Re: Understanding fieldCache SUBREADER "insanity"

2012-09-19 Thread Yonik Seeley
The other thing to realize is that it's only "insanity" if it's
unexpected or not-by-design (so the term is rather mis-named).
It's more for core developers - if you are just using Solr without
custom plugins, don't worry about it.

-Yonik
http://lucidworks.com


On Wed, Sep 19, 2012 at 3:27 PM, Tomás Fernández Löbbe
 wrote:
> Hi Aaron, here there is some information about the "insanity count":
> http://wiki.apache.org/solr/SolrCaching#The_Lucene_FieldCache
>
> As for the SUBREADER type, the javadocs say:
> "Indicates an overlap in cache usage on a given field in sub/super readers."
>
> This probably means that you are using the same field for faceting and for
> sorting (tf_normalizedTotalHotttnesss), sorting uses the segment level
> cache and faceting uses by default the global field cache. This can be a
> problem because the field is duplicated in cache, and then it uses twice
> the memory.
>
> One way to solve this would be to change the faceting method on that field
> to 'fcs', which uses segment level cache (but may be a little bit slower).
>
> Tomás
>
>
> On Wed, Sep 19, 2012 at 3:16 PM, Aaron Daubman  wrote:
>
>> Hi all,
>>
>> In reviewing a solr instance with somewhat variable performance, I
>> noticed that its fieldCache stats show an insanity_count of 1 with the
>> insanity type SUBREADER:
>>
>> ---snip---
>> insanity_count : 1
>> insanity#0 : SUBREADER: Found caches for descendants of
>> ReadOnlyDirectoryReader(segments_k
>> _6h9(3.3):C17198463)+tf_normalizedTotalHotttnesss
>> 'ReadOnlyDirectoryReader(segments_k
>>
>> _6h9(3.3):C17198463)'=>'tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=>[F#1965982057
>> 'ReadOnlyDirectoryReader(segments_k
>>
>> _6h9(3.3):C17198463)'=>'tf_normalizedTotalHotttnesss',float,null=>[F#1965982057
>>
>> 'MMapIndexInput(path="/io01/p/solr/playlist/a/playlist/index/_6h9.frq")'=>'tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=>[F#1308116426
>> ---snip---
>>
>> How can I decipher what this means and what, if anything, I should do
>> to fix/improve the "insanity"?
>>
>> Thanks,
>>  Aaron
>>


LotsOfCores : Any alternative approaches till it's ready

2012-09-19 Thread vybe3142
LotsOfCores ( http://wiki.apache.org/solr/LotsOfCores ) is intended to
dynamically juggle loading (and unloading) the required cores where the total
number of cores is very large.

We're approaching that situation, but it looks like LotsOfCores isn't quite
ready for prime time yet. Are there any other alternative approaches that we
can use (running Solr 4.0 BETA currently)?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/LotsOfCores-Any-alternative-approaches-till-it-s-ready-tp4009004.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Nodes cannot recover and become unavailable

2012-09-19 Thread Mark Miller
bq. I believe there were some changes made to the clusterstate.json
recently that are not backwards compatible.

Indeed - I think yonik committed something the other day - we prob
should send an email out about this. Not sure exactly how easy an
upgrade is or what steps to take - it may be something like stop your
whole cluster, delete clusterstate.json and then it works, or it may
take more or less than that - if that's the issue here, i don't know,
but it's likely an issue.

On Wed, Sep 19, 2012 at 8:41 AM, Sami Siren  wrote:
> also, did you re create the cluster after upgrading to a newer
> version? I believe there were some changes made to the
> clusterstate.json recently that are not backwards compatible.
>
> --
>  Sami Siren
>
>
>
> On Wed, Sep 19, 2012 at 6:21 PM, Sami Siren  wrote:
>> Hi,
>>
>> I am having troubles understanding the reason for that NPE.
>>
>> First you could try removing the line #102 in HttpClientUtility so
>> that logging does not prevent creation of the http client in
>> SyncStrategy.
>>
>> --
>>  Sami Siren
>>
>> On Wed, Sep 19, 2012 at 5:29 PM, Markus Jelsma
>>  wrote:
>>> Hi,
>>>
>>> Since the 2012-09-17 11:10:41 build shards start to have trouble coming 
>>> back online. When i restart one node the slices on the other nodes are 
>>> throwing exceptions and cannot be queried. I'm not sure how to remedy the 
>>> problem but stopping a node or restarting it a few times seems to help it. 
>>> The problem is when i restart a node, and it happens, i must not restart 
>>> another node because that may trigger other slices becoming unavailable.
>>>
>>> Here are some parts of the log:
>>>
>>> 2012-09-19 14:13:18,149 ERROR [solr.cloud.RecoveryStrategy] - 
>>> [RecoveryThread] - : Recovery failed - trying again... core=oi_i
>>> 2012-09-19 14:13:25,818 WARN [solr.cloud.RecoveryStrategy] - 
>>> [main-EventThread] - : Stopping recovery for 
>>> zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
>>> 2012-09-19 14:13:44,497 WARN [solr.cloud.RecoveryStrategy] - [Thread-4] - : 
>>> Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_jcore=oi_j
>>> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
>>> [RecoveryThread] - : Error while trying to recover. 
>>> core=oi_i:org.apache.solr.common.SolrException: We are not the leader
>>> at 
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
>>> at 
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
>>> at 
>>> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
>>> at 
>>> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
>>> at 
>>> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
>>>
>>> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
>>> [RecoveryThread] - : Recovery failed - trying again... core=oi_i
>>> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
>>> [RecoveryThread] - : Recovery failed - max retries exceeded. core=oi_i
>>> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
>>> [RecoveryThread] - : Recovery failed - I give up. core=oi_i
>>> 2012-09-19 14:14:00,333 WARN [solr.cloud.RecoveryStrategy] - 
>>> [RecoveryThread] - : Stopping recovery for 
>>> zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
>>>  ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : Sync request 
>>> error: java.lang.NullPointerException
>>>  ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : 
>>> http://nl10.host:8080/solr/oi_i/: Could not tell a replica to 
>>> recover:java.lang.NullPointerException
>>> at 
>>> org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
>>> at 
>>> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:102)
>>> at 
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.(HttpSolrServer.java:155)
>>> at 
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.(HttpSolrServer.java:128)
>>> at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:262)
>>> at 
>>> org.apache.solr.cloud.SyncStrategy.requestRecovery(SyncStrategy.java:272)
>>> at 
>>> org.apache.solr.cloud.SyncStrategy.syncToMe(SyncStrategy.java:203)
>>> at 
>>> org.apache.solr.cloud.SyncStrategy.syncReplicas(SyncStrategy.java:125)
>>> at org.apache.solr.cloud.SyncStrategy.sync(SyncStrategy.java:87)
>>> at 
>>> org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:169)
>>> at 
>>> org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
>>> at 
>>> org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
>>> at 
>>> org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:275)
>>> at 
>>> org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:326)
>>> at

solr failing on staartup

2012-09-19 Thread Harish Rawat
Hi Guys

The Solr server, which was running fine for the last few months, is now failing
during startup with the following error

Sep 19, 2012 12:53:25 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: java.io.FileNotFoundException:
/var/lib/solr/default/index/.cfs (No such file or directory)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1103)
at org.apache.solr.core.SolrCore.(SolrCore.java:587)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:463)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:133)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
at
org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
at
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
at
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)

Does anyone know what the possible reasons for this failure could be?

Regards
Harish


Re: Nodes cannot recover and become unavailable

2012-09-19 Thread Yonik Seeley
On Wed, Sep 19, 2012 at 4:25 PM, Mark Miller  wrote:
> bq. I believe there were some changes made to the clusterstate.json
> recently that are not backwards compatible.
>
> Indeed - I think yonik committed something the other day - we prob
> should send an email out about this.

Yeah, I was just in the process of committing another change, updating
CHANGES and sending a message.

-Yonik
http://lucidworks.com


Re: Understanding fieldCache SUBREADER "insanity"

2012-09-19 Thread Tomás Fernández Löbbe
Some function queries also use the field cache. I *think* those usually use
the segment level cache, but I'm not sure.

On Wed, Sep 19, 2012 at 4:36 PM, Yonik Seeley  wrote:

> The other thing to realize is that it's only "insanity" if it's
> unexpected or not-by-design (so the term is rather mis-named).
> It's more for core developers - if you are just using Solr without
> custom plugins, don't worry about it.
>
> -Yonik
> http://lucidworks.com
>
>
> On Wed, Sep 19, 2012 at 3:27 PM, Tomás Fernández Löbbe
>  wrote:
> > Hi Aaron, here there is some information about the "insanity count":
> > http://wiki.apache.org/solr/SolrCaching#The_Lucene_FieldCache
> >
> > As for the SUBREADER type, the javadocs say:
> > "Indicates an overlap in cache usage on a given field in sub/super
> readers."
> >
> > This probably means that you are using the same field for faceting and
> for
> > sorting (tf_normalizedTotalHotttnesss), sorting uses the segment level
> > cache and faceting uses by default the global field cache. This can be a
> > problem because the field is duplicated in cache, and then it uses twice
> > the memory.
> >
> > One way to solve this would be to change the faceting method on that
> field
> > to 'fcs', which uses segment level cache (but may be a little bit
> slower).
> >
> > Tomás
> >
> >
> > On Wed, Sep 19, 2012 at 3:16 PM, Aaron Daubman 
> wrote:
> >
> >> Hi all,
> >>
> >> In reviewing a solr instance with somewhat variable performance, I
> >> noticed that its fieldCache stats show an insanity_count of 1 with the
> >> insanity type SUBREADER:
> >>
> >> ---snip---
> >> insanity_count : 1
> >> insanity#0 : SUBREADER: Found caches for descendants of
> >> ReadOnlyDirectoryReader(segments_k
> >> _6h9(3.3):C17198463)+tf_normalizedTotalHotttnesss
> >> 'ReadOnlyDirectoryReader(segments_k
> >>
> >>
> _6h9(3.3):C17198463)'=>'tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=>[F#1965982057
> >> 'ReadOnlyDirectoryReader(segments_k
> >>
> >>
> _6h9(3.3):C17198463)'=>'tf_normalizedTotalHotttnesss',float,null=>[F#1965982057
> >>
> >>
> 'MMapIndexInput(path="/io01/p/solr/playlist/a/playlist/index/_6h9.frq")'=>'tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=>[F#1308116426
> >> ---snip---
> >>
> >> How can I decipher what this means and what, if anything, I should do
> >> to fix/improve the "insanity"?
> >>
> >> Thanks,
> >>  Aaron
> >>
>


Re: Understanding fieldCache SUBREADER "insanity"

2012-09-19 Thread Yonik Seeley
> already-optimized, single-segment index

That part is interesting... if true, then the type of "insanity" you
saw should be impossible, and either the insanity detection or
something else is broken.

-Yonik
http://lucidworks.com


Re: Minimum Match parameter in phrase queries

2012-09-19 Thread Erick Erickson
The mm parameter, as I understand it, doesn't really play with phrase.
So you're looking for the phrase
"this amazing sample query", "amazing" must be in the phrase.

and phrase slop reorders things, counting the reordering as "slop", so
the approach would not
do what you want anyway, i.e.
"this amazing query"
would match
"this query is amazing"
if ps were >= 2 (or maybe three, I always have to draw the picture again).

I don't even think the mm parameter applies at all to the pf clause,
it's just used to boost
relevance.

You may be able to use the SurroundQueryParser, which is available in 4.0 see:
http://wiki.apache.org/solr/SurroundQueryParser
but note its limitations.
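
As a rough sketch (the distance value, and whether you qualify the expression
with a field, are things you would need to adapt to your setup), a surround
query uses the ordered W operator, e.g. 4w(this, amazing, query), which
tolerates extra terms in between as long as the listed terms appear in that
order; the parser is selected with defType=surround (or a {!surround} local
param) in 4.0.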

Best
Erick

On Wed, Sep 19, 2012 at 12:52 PM, Jose Aguilar
 wrote:
> Hi all,
>
> We have a problem when using minimum match and a phrase query, and maybe 
> someone here has seen this and could give us a hand.
>
> We need to make a query where several terms need to be in order, but in the 
> query there might be some other terms in between them. So for example the 
> queries:
>
> "this amazing query"
>
> And
>
> "this amazing sample query"
>
> Should match:
>
> "this amazing query"
>
> But not:
>
> "this query is amazing"
>
> The way we approached the problem was to create a phrase query and add the 
> minimum match parameter (50% in this case), our reasoning was that with the 
> "mm" parameter it wouldn't need to match all the terms in the query. But it 
> doesn't seem to work (maybe we are doing something wrong). So something like:
>
> http://localhost:8080/SolrContext/select?q=this+amazing+sample+query&qf=MY_FIELD&pf=MY_FIELD&ps=5&debugQuery=true&defType=edismax&mm=50%
>
> Doesn't return any results. If we change the query to "this amazing query", 
> the exact phrase, it does match.
>
> Is this the right approach for this problem? Any pointers would be helpful.
>
> Thanks,
>
> Jose.
>


Re: Search by field with the space in it

2012-09-19 Thread Aleksey Vorona
Thank you for that insight. I, myself, would've liked to remove the 
spaces, but it is not possible in that particular project.


I see that I need to learn more about Lucene. Hopefully that will help 
me avoid some of those headaches to come.


-- Aleksey

On 12-09-19 11:42 AM, Erick Erickson wrote:

I would _really_ recommend that you re-do your schema and
take spaces out of your field names. That may require that
you change your indexing program to not send spaces in dynamic
field names

This is the kind of thing that causes endless headaches as time
goes forward.

You don't _have_ to, but I predict you'll regret if if you don't .

Best
Erick

On Wed, Sep 19, 2012 at 2:11 PM, Aleksey Vorona  wrote:

On 12-09-19 11:04 AM, Ahmet Arslan wrote:

I have a field with space in its name (that is a dynamic
field). How can I execute search on it?

I tried "q=aattr_box%20%type_sc:super" and it did not work

The field name is "aattr_box type"

How about q=aattr_box\ type_sc:super


That works! Thank you!

Sidenote: of course I urlencode space.

-- Aleksey




Re: Conditionally apply synonyms?

2012-09-19 Thread Erick Erickson
Not that I know of, synonyms are an all-or-nothing on a field.

But how would you indicate the context at index time as opposed to
query time? Especially at query time, there's very little in the way of
context to figure out what the category was.

Or were you thinking that the document had a category and applying
this only at index time?

Best
Erick

On Wed, Sep 19, 2012 at 3:23 PM, Carrie Coy  wrote:
> Is there an existing TokenFilterFactory that can conditionally insert
> synonyms based on a given document attribute, say category? Some synonyms
> only make sense in context: "bats" in Sports is different from "bats" in
> "Party and Novelty".
>
> It seems the synonyms.txt file would need an additional column that could be
> checked against the document attribute prior to appending synonyms:
>
> #synonyms            category
> post,pole            sports
> wheel,caster         furniture
> pat,paddy,patrick    holiday
>
> Is anything like this possible without writing a custom TokenFilterFactory?


Re: Search by field with the space in it

2012-09-19 Thread Erick Erickson
well, I've certainly been wrong before, so it may not be so bad. Time
will tell...

Erick

On Wed, Sep 19, 2012 at 5:08 PM, Aleksey Vorona  wrote:
> Thank you for that insight. I, myself, would've liked to remove the spaces,
> but it is not possible in that particular project.
>
> I see that I need to learn more about Lucene. Hopefully that will help me
> avoid some of those headaches to come.
>
> -- Aleksey
>
>
> On 12-09-19 11:42 AM, Erick Erickson wrote:
>>
>> I would _really_ recommend that you re-do your schema and
>> take spaces out of your field names. That may require that
>> you change your indexing program to not send spaces in dynamic
>> field names
>>
>> This is the kind of thing that causes endless headaches as time
>> goes forward.
>>
>> You don't _have_ to, but I predict you'll regret if if you don't .
>>
>> Best
>> Erick
>>
>> On Wed, Sep 19, 2012 at 2:11 PM, Aleksey Vorona  wrote:
>>>
>>> On 12-09-19 11:04 AM, Ahmet Arslan wrote:
>
> I have a field with space in its name (that is a dynamic
> field). How can I execute search on it?
>
> I tried "q=aattr_box%20%type_sc:super" and it did not work
>
> The field name is "aattr_box type"

 How about q=aattr_box\ type_sc:super

>>> That works! Thank you!
>>>
>>> Sidenote: of course I urlencode space.
>>>
>>> -- Aleksey
>
>


Re: Conditionally apply synonyms?

2012-09-19 Thread Carrie Coy
the latter:  the document (eg product)  has a category, and the synonyms 
would be applied at index time.  sports-related "bat" synonyms to 
baseball "bats", and halloween-themed "bat" synonyms to scary "bats", 
for example.



On 09/19/2012 05:08 PM, Erick Erickson wrote:

Not that I know of, synonyms are an all-or-nothing on a field.

But how would you indicate the context at index time as opposed to
query time? Especially at query time, there's very little in the way of
context to figure out what the category was.

Or were you thinking that the document had a category and applying
this only at index time?

Best
Erick

On Wed, Sep 19, 2012 at 3:23 PM, Carrie Coy  wrote:

Is there an existing TokenFilterFactory that can conditionally insert
synonyms based on a given document attribute, say category? Some synonyms
only make sense in context: "bats" in Sports is different from "bats" in
"Party and Novelty".

It seems the synonyms.txt file would need an additional column that could be
checked against the document attribute prior to appending synonyms:

#synonyms            category
post,pole            sports
wheel,caster         furniture
pat,paddy,patrick    holiday

Is anything like this possible without writing a custom TokenFilterFactory?


SolrCloud clusterstate.json layout changes

2012-09-19 Thread Yonik Seeley
Folks,

Some changes have been committed in the past few days related to
SOLR-3815 as part of the groundwork
for SOLR-3755 (shard splitting).

The resulting clusterstate.json now looks like the following:

{"collection1":{
"shard1":{
  "range":"8000-",
  "replicas":{"Rogue:8983_solr_collection1":{
  "shard":"shard1",
  "roles":null,
  "state":"active",
  "core":"collection1",
  "collection":"collection1",
  "node_name":"Rogue:8983_solr",
  "base_url":"http://Rogue:8983/solr";,
  "leader":"true"}}},
"shard2":{
  "range":"0-7fff",
  "replicas":{


Note the addition of the "replicas" level to make room for other
properties at the shard level, such as "range" (which defines which hash
range belongs in which shard).
Although "range" now exists, it is ignored by the current code (i.e.
indexing still uses hash MOD nShards to place documents).

-Yonik
http://lucidworks.com


Http 500 Error - Cant find the log file

2012-09-19 Thread Spadez
Hi,

I have been making changes to my schema and unfortunately I now get an error
when viewing my Solr tomcat admin page.

The error doesn't seem to explain the problem:
HTTP Status 500 - Severe errors in solr configuration. Check your log files
for more detailed information on what may be wrong. If you want solr to
continue after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml
-
org.apache.solr.common.SolrException: No cores were created, please check
the logs for errors at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:172)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
at
org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:115)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601) at
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
at
org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502) at
org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317) at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
at
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065) at 

It says to look in the log file, but I can't find one. I installed it in the
opt directory, but from what I have read, the log file should be here:

/opt/apache-solr-3.6.0/example/logs

However, the folder is empty. How can I find this log file, so I can
diagnose what is causing the Solr error?

Thank you for any help you can give.

James



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Http-500-Error-Cant-find-the-log-file-tp4009030.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Http 500 Error - Cant find the log file

2012-09-19 Thread Chris Hostetter

Logs are dependent on the servlet container you use -- ie: with the 
solr example, log messages are written to the console where you run 
"start.jar" and can be configured to point elsewhere based on how you 
configure jetty.

In your case it looks like you are using tomcat, so you'll want to check 
the tomcat logs directory -- probably for "catalina.out" if I remember 
correctly.


: I have been making changes to my schema and unfortauntly I now get a error
: when viewing my Solr tomcat admin page.
: 
: The error doesnt seem to explain the problem:
: HTTP Status 500 - Severe errors in solr configuration. Check your log files
: for more detailed information on what may be wrong. If you want solr to
: continue after configuration errors, change:
: false in solr.xml
: -
: org.apache.solr.common.SolrException: No cores were created, please check
: the logs for errors at
: 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:172)
: at
: org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
: at
: 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
: at
: 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
: at
: 
org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:115)
: at
: 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
: at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
: at
: 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
: at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
: at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601) at
: org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
: at
: org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
: at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502) at
: org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317) at
: org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
: at
: 
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
: at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065) at 
: 
: It says to look in the log file, but I cant find one. I installed it in the
: opt directory, but from what I have read, the log file should be here:
: 
: /opt/apache-solr-3.6.0/example/logs
: 
: However, the folder is empty. How can I find this log file, so I can
: diagnose what is causing the solr error.
: 
: Thank you for any help you can give.
: 
: James
: 
: 
: 
: --
: View this message in context: 
http://lucene.472066.n3.nabble.com/Http-500-Error-Cant-find-the-log-file-tp4009030.html
: Sent from the Solr - User mailing list archive at Nabble.com.
: 

-Hoss


Re: Search by field with the space in it

2012-09-19 Thread Jack Krupansky
Also note that some of the Solr request parameters are lists of fields where 
space is the delimiter and can NOT be escaped.


For example, the "fl" parameter uses both comma and space as delimiters, and 
the e/dismax field list parameters use space as the field delimiter.
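
For the "q" parameter itself, though, the backslash escaping mentioned earlier 
does work. If you build queries with SolrJ, here is a small untested sketch 
(field name taken from this thread) that does the escaping for you:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

// escapeQueryChars backslash-escapes whitespace and other query syntax
// characters, so a field name containing a space can be queried
String field = "aattr_box type_sc";
SolrQuery q = new SolrQuery(ClientUtils.escapeQueryChars(field) + ":super");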


-- Jack Krupansky

-Original Message- 
From: Aleksey Vorona

Sent: Wednesday, September 19, 2012 5:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Search by field with the space in it

Thank you for that insight. I, myself, would've liked to remove the
spaces, but it is not possible in that particular project.

I see that I need to learn more about Lucene. Hopefully that will help
me avoid some of those headaches to come.

-- Aleksey

On 12-09-19 11:42 AM, Erick Erickson wrote:

I would _really_ recommend that you re-do your schema and
take spaces out of your field names. That may require that
you change your indexing program to not send spaces in dynamic
field names

This is the kind of thing that causes endless headaches as time
goes forward.

You don't _have_ to, but I predict you'll regret if if you don't .

Best
Erick

On Wed, Sep 19, 2012 at 2:11 PM, Aleksey Vorona  wrote:

On 12-09-19 11:04 AM, Ahmet Arslan wrote:

I have a field with space in its name (that is a dynamic
field). How can I execute search on it?

I tried "q=aattr_box%20%type_sc:super" and it did not work

The field name is "aattr_box type"

How about q=aattr_box\ type_sc:super


That works! Thank you!

Sidenote: of course I urlencode space.

-- Aleksey 




Re: SolrCloud clusterstate.json layout changes

2012-09-19 Thread Yonik Seeley
On Wed, Sep 19, 2012 at 5:27 PM, Yonik Seeley  wrote:
> Folks,
>
> Some changes have been committed in the past few days related to
> SOLR-3815 as part of the groundwork
> for SOLR-3755 (shard splitting).
>
> The resulting clusterstate.json now looks like the following:
>
> {"collection1":{
> "shard1":{
>   "range":"8000-",
>   "replicas":{"Rogue:8983_solr_collection1":{
>   "shard":"shard1",
>   "roles":null,
>   "state":"active",
>   "core":"collection1",
>   "collection":"collection1",
>   "node_name":"Rogue:8983_solr",
>   "base_url":"http://Rogue:8983/solr";,
>   "leader":"true"}}},
> "shard2":{
>   "range":"0-7fff",
>   "replicas":{
>
>
> Note the addition of the "replicas" level to make room for other
> properties at the shard level such as "range" (which define what hash
> range belongs in what shard).
> Although "range" now exists, it is ignored by the current code (i.e.
> indexing still uses hash MOD nShards to place documents).

Correction - MOD was just one of the earliest methods, not the
previous one. The previous method split the hash range up equally between
all shards, and that should still be the same when we switch to paying
attention to the ranges.

-Yonik
http://lucidworks.com


Re: Explicit Optimize and Background Merge

2012-09-19 Thread Mark Miller
You don't have to worry about background merges when optimizing; it
won't error out.

Optimize is a little heavy-handed though.

You might just use expungeDeletes and/or try a low merge factor -
though with the latest tiered merge policy, I think you have to use a
different knob than merge factor, and I don't know it offhand...
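
For reference, an expungeDeletes commit is the <commit expungeDeletes="true"/> 
message sent to /update; a rough, untested SolrJ equivalent (the parameter name 
is assumed from that XML message) would be:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

// commit and merge away deleted docs without doing a full optimize
UpdateRequest req = new UpdateRequest();
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); // waitFlush, waitSearcher
req.setParam("expungeDeletes", "true");
req.process(server); // 'server' is an existing SolrServer instance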

Mark

On Wed, Sep 19, 2012 at 4:16 PM, roz dev  wrote:
> Hi All
>
> I need input from community about the idea of doing explicit optimization
> of Solr Index and benefit/negatives of doing this.
>
> I know that Background merge also happens and am wondering if background
> merge is going on and then i trigger explicit commit then will it error out?
>
> How do these 2 things work together?
>
> What is the preferred way to keep the index size smaller, in terms of
> number of documents. In my case, I want to keep the max doc to low number
> so that
> memory cost of field cache is not too high.
>
> Any thoughts?
>
> -Saroj



-- 
- Mark


Re: Conditionally apply synonyms?

2012-09-19 Thread Lance Norskog
Here is another way, without using synonyms: in data preparation, you can 
create a new token 'bats_sports' for all common words in different categories. 
You can do this in a separate field that you do not store, just index. Now, if 
you search with a category you would send in 'bats bats_sports' and boost 
results from this other field.
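
A minimal SolrJ sketch of that idea (the field and token names here are 
placeholders, not anything from the original schema):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

// index time: data prep adds a category-qualified token to an
// indexed-but-not-stored field (called "category_terms" here)
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "prod-123");
doc.addField("name", "Wooden baseball bat");
doc.addField("category", "Sports");
doc.addField("category_terms", "bats_sports");
server.add(doc);
server.commit();

// query time: when the category is known, add the qualified token
// and boost matches on the extra field
SolrQuery q = new SolrQuery("name:bats OR category_terms:bats_sports^5");
server.query(q);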

- Original Message -
| From: "Carrie Coy" 
| To: solr-user@lucene.apache.org
| Sent: Wednesday, September 19, 2012 2:20:49 PM
| Subject: Re: Conditionally apply synonyms?
| 
| the latter:  the document (eg product)  has a category, and the
| synonyms
| would be applied at index time.  sports-related "bat" synonyms to
| baseball "bats", and halloween-themed "bat" synonyms to scary "bats",
| for example.
| 
| 
| On 09/19/2012 05:08 PM, Erick Erickson wrote:
| > Not that I know of, synonyms are an all-or-nothing on a field.
| >
| > But how would you indicate the context at index time as opposed to
| > query time? Especially at query time, there's very little in the
| > way of
| > context to figure out what the category was.
| >
| > Or were you thinking that the document had a category and applying
| > this only at index time?
| >
| > Best
| > Erick
| >
| > On Wed, Sep 19, 2012 at 3:23 PM, Carrie Coy  wrote:
| >> Is there an existing TokenFilterFactory that can conditionally
| >> insert
| >> synonyms based on a given document attribute, say category? Some
| >> synonyms
| >> only make sense in context: "bats" in Sports is different from
| >> "bats" in
| >> "Party and Novelty".
| >>
| >> It seems the synonyms.txt file would need an additional column
| >> that could be
| >> checked against the document attribute prior to appending
| >> synonyms:
| >>
| >> #synonyms           category
| >> post,pole           sports
| >> wheel,caster        furniture
| >> pat,paddy,patrick   holiday
| >>
| >> Is anything like this possible without writing a custom
| >> TokenFilterFactory?
| 


Re: Compond File Format Advice needed - On migration to 3.6.1

2012-09-19 Thread Jack Krupansky
You may simply be encountering the situation where the merge size is greater 
than 10% of the index size, as per this comment in the code:


/** If a merged segment will be more than this percentage
*  of the total size of the index, leave the segment as
*  non-compound file even if compound file is enabled.
*  Set to 1.0 to always use CFS regardless of merge
*  size.  Default is 0.1. */
public void setNoCFSRatio(double noCFSRatio) {

Unfortunately there currently is no way for you to set the ratio higher in 
Solr.


LogMergePolicy has the same issue.

There should be some wiki doc for this, but I couldn't find any.
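
At the Lucene level the knob looks roughly like this (illustration only; Solr 
3.6.1 does not expose it in solrconfig.xml, so you would have to patch Solr 
itself to apply it):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.util.Version;

// always write compound files, regardless of how large the merged segment is
TieredMergePolicy mp = new TieredMergePolicy();
mp.setUseCompoundFile(true);
mp.setNoCFSRatio(1.0);
IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_36,
    new StandardAnalyzer(Version.LUCENE_36));
cfg.setMergePolicy(mp);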

-- Jack Krupansky

-Original Message- 
From: Sujatha Arun

Sent: Tuesday, September 18, 2012 10:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Compond File Format Advice needed - On migration to 3.6.1

anybody?

On Tue, Sep 18, 2012 at 10:42 PM, Sujatha Arun  wrote:


Hi ,

The default Index file creation format in 3.6.1 [migrating from 1.3],
in spite of setting the usecompoundfile to true, seems to be to create
non-compound files due to Lucene 2790
(https://issues.apache.org/jira/browse/LUCENE-2790).

I have tried the following, but everything seems to create non-compound
files:


   - set  compound file format to true
   - used the TieredMergePolicy, did not change maxMergeAtOnce and
   segmentsPerTier.
   - switched back to LogByteSizeMergePolicy but this also creates non
   compound files

We are in a situation where we have several cores and hence several
indexes, and do not want to run into a too-many-open-files error. What can
be done to switch to compound file format from the beginning, or will this
TieredMergePolicy lead us to too many open files eventually?

Regards
Sujatha





Re: Solr on https

2012-09-19 Thread Chris Hostetter

: We are trying to run solr on https, these are few of the issues or 
: problems that are coming up. Just wanted to understand if anyone else is 
: facing these problems,

There are currently some known issues using SolrCloud with https, Sami is 
working on it...

https://issues.apache.org/jira/browse/SOLR-3854


-Hoss


Re: Personalized Boosting

2012-09-19 Thread Chris Hostetter

: Is it possible to let user's define their position in search when location
: is queried? Let's say that I am UserA and when you make a search with
: Moscow, my default ranking is 258. By clicking a button, something like
: "Boost Me!", I would like to see UserA as the first user when search is done
: by Moscow query.

Please explain your use case more -- it's not clear what you mean by a user 
saying "Boost Me!" ... do the documents in your index model your users? 
ie: does each doc represent one user?

Let's assume for now that's what you meant -- if so, what should happen if 
multiple "users" ask you to boost them for the same location value?  Is 
the fact that you are referring to a field named "location" significant? 
Should the boost apply to other searches of locations that are 
geographically close by?

all of these kinds of things have a significant impact on the possible 
solutions for your goal, since we don't really understand your goal.

Assuming you don't really want geo-proximity boosting, and you just want 
an easy way to say "document X should be considered a more important 
match for word Z", then I would suggest using multiple fields and using 
query-time boosts. ie: put Z in both your location field and in some 
other boosted_location field, and when you get a query, search across both 
of them, with a higher boost on boosted_location, something like:

  defType=dismax & q=Z & qf=location+boosted_location^100
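
In SolrJ that would be roughly (field names are placeholders for whatever your 
schema uses):

import org.apache.solr.client.solrj.SolrQuery;

// dismax query that searches both fields, weighting the boosted copy higher
SolrQuery q = new SolrQuery("Z");
q.set("defType", "dismax");
q.set("qf", "location boosted_location^100");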

-Hoss


Re: Compond File Format Advice needed - On migration to 3.6.1

2012-09-19 Thread Sujatha Arun
Thanks Jack. Yes, this seems to be the case!

However, I would like to fix this at the code level by setting the noCFSRatio to
1.0. But in Solr 3.6.1 I am not able to find the build.xml file.
I suppose the build process has changed since 1.3; can you throw some
light on how I can build the source code after making this change?

In 1.3, I used to change the code in the src files and compile and build
from the same directory as the build.xml file; however, all files seem to
be jarred now. Any pointers?

Regards
Sujatha

On Thu, Sep 20, 2012 at 5:36 AM, Jack Krupansky wrote:

> You may simply be encountering the situation where the merge size is
> greater than 10% of the index size, as per this comment in the code:
>
> /** If a merged segment will be more than this percentage
> *  of the total size of the index, leave the segment as
> *  non-compound file even if compound file is enabled.
> *  Set to 1.0 to always use CFS regardless of merge
> *  size.  Default is 0.1. */
> public void setNoCFSRatio(double noCFSRatio) {
>
> Unfortunately there currently is no way for you to set the ratio higher in
> Solr.
>
> LogMergePolicy has the same issue.
>
> There should be some wiki doc for this, but I couldn't find any.
>
> -- Jack Krupansky
>
> -Original Message- From: Sujatha Arun
> Sent: Tuesday, September 18, 2012 10:00 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Compond File Format Advice needed - On migration to 3.6.1
>
>
> anybody?
>
> On Tue, Sep 18, 2012 at 10:42 PM, Sujatha Arun 
> wrote:
>
>  Hi ,
>>
>> The default Index file creation format in 3.6.1 [migrating from 1.3]
>>  in-spite of setting the usecompoundfile to true seems to be to create non
>> compound files due to Lucene 2790
>> (https://issues.apache.org/jira/browse/LUCENE-2790).
>>
>> I have tried the following ,but everything seems to create non compound
>> files ..
>>
>>
>>- set  compound file format to true
>>- used the TieredMergePolicy, did not change maxMergeAtOnce and
>>segmentsPerTier.
>>- switched back to LogByteSizeMergePolicy but this also creates non
>>
>>compound files
>>
>> We are in a situation where we have several cores and hence several
>> indexes ,and do not want to run into too many open files error. What can
>> be
>> done to switch to compound file format from the beginning or will this
>> TiredMergepolicy lead us to too many open files eventually?
>>
>> Regards
>> Sujatha
>>
>>
>