solr indexing exception

2011-08-26 Thread abhijit bashetti
Hi,

I am using DIH for indexing 50K documents.

I am using a 64-bit machine with 4GB RAM.

I got the following exception:

org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:664)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:617)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Unknown Source)
    at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
    at java.lang.AbstractStringBuilder.append(Unknown Source)
    at java.lang.StringBuffer.append(Unknown Source)
    at java.io.StringWriter.write(Unknown Source)
    at org.apache.tika.sax.WriteOutContentHandler.characters(WriteOutContentHandler.java:115)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:261)
    at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:132)
    at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
    ... 6 more



26-Aug-2011 08:18:35 org.apache.solr.common.SolrException log
SEVERE: Full Import failed: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
    [same stack trace as above]

how to differentiate multiple datasources when building solr query....

2011-08-26 Thread vighnesh
hi all

I have two data sources in my data-config file, and I need data from the first
datasource, from the second datasource, and from both. How can I achieve this
in a Solr query?

example like: first datasource:

http://localhost:8983/solr/db/select/?q=newthread&version=2.2&start=0&rows=200&indent=on&datasource=datasource-1

example like: second datasource:

http://localhost:8983/solr/db/select/?q=newthread&version=2.2&start=0&rows=200&indent=on&datasource=datasource-1

example like: both datasources:

http://localhost:8983/solr/db/select/?q=newthread&version=2.2&start=0&rows=200&indent=on&datasource=datasource-1&datasource=datasource-1

Are these queries correct? Please let me know.


thanks in advance.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-differentiate-multiple-datasources-when-building-solr-query-tp3286309p3286309.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to differentiate multiple datasources when building solr query....

2011-08-26 Thread Lance Norskog
Did you mean datasource-1 and datasource-2?

On Fri, Aug 26, 2011 at 2:41 AM, vighnesh  wrote:

> hi all
>
> I have a two data sources in data-config file and i need data from first
> datasource , second datasource and from both .how can acheive this in solr
> query.
>
> example like: first datasource:
>
>
> http://localhost:8983/solr/db/select/?q=newthread&version=2.2&start=0&rows=200&indent=on&datasource=datasource-1
>
> example like: second datasource:
>
>
> http://localhost:8983/solr/db/select/?q=newthread&version=2.2&start=0&rows=200&indent=on&datasource=datasource-1
>
> example like: both datasources:
>
>
> http://localhost:8983/solr/db/select/?q=newthread&version=2.2&start=0&rows=200&indent=on&datasource=datasource-1&datasource=datasource-1
>
> is this querys are correct or not ? plese let me know that.
>
>
> thanks in advance.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-differentiate-multiple-datasources-when-building-solr-query-tp3286309p3286309.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goks...@gmail.com


Re: how to differentiate multiple datasources when building solr query....

2011-08-26 Thread vighnesh
Yes, those are the names of the two data sources.
How can I get the data from only datasource-1, from only datasource-2, or from
both?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-differentiate-multiple-datasources-when-building-solr-query-tp3286309p3286325.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr indexing exception

2011-08-26 Thread Gora Mohanty
On Fri, Aug 26, 2011 at 1:47 PM, abhijit bashetti
 wrote:
> Hi,
>
> I am using DIH for indexing 50K documents .
>
> I am using 64-bit machine with 4GB RAM

How much memory is allocated to Solr? What is the approximate size
of the data being indexed into Solr?

Regards,
Gora
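
[If Solr is running with the default heap, a common first step is simply to
give the JVM more memory at startup. A hedged example, assuming the stock
Jetty start.jar setup; under Tomcat the same flags would go into
JAVA_OPTS/CATALINA_OPTS:]

    # allocate up to 2GB of heap to the JVM running Solr
    java -Xms512m -Xmx2048m -jar start.jar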


Re: missing field in schema browser on solr admin

2011-08-26 Thread Erik Hatcher
Is the field stored?  Do you see it on documents when you do a q=*:* search?

How is that field defined and populated?  (exact config/code needed here)

Erik

On Aug 25, 2011, at 23:07 , deniz wrote:

> hi all...
> 
> i have added a new field to index... but now when i check solr admin, i see
> some interesting stuff...
> 
> i can see the field in schema and also db config file but there is nothing
> about the field in schema browser... in addition i cant make a search in
> that field... all of the config files seem correct but still no change...
> 
> 
> any ideas or anyone who has ever had a similar problem?
> 
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/missing-field-in-schema-browser-on-solr-admin-tp3285739p3285739.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrServer instances

2011-08-26 Thread François Schiettecatte
Sounds to me like you are looking for HTTP persistent connections (connection
keep-alive as opposed to close) and a singleton object. This would be outside
Solr per se.

A few caveats, though: I am not sure if Tomcat supports keep-alive, I am not
sure how Solr deals with multiple requests coming down the pipe, and you will
need to deal with concurrency. I am also not sure what you are looking to gain
from this; opening an HTTP connection is pretty cheap.

François
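
[A minimal singleton holder along the lines suggested above — a sketch only,
assuming SolrJ 3.x's CommonsHttpSolrServer and Commons HttpClient 3.x; the
class name SolrServerHolder is hypothetical:]

    import java.net.MalformedURLException;
    import java.net.URL;
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public final class SolrServerHolder {
        // One thread-safe instance, created on class load and shared app-wide.
        private static final SolrServer INSTANCE = create();

        private SolrServerHolder() {}

        private static SolrServer create() {
            try {
                // The multi-threaded connection manager pools connections,
                // so this one instance can serve concurrent requests.
                HttpClient client =
                    new HttpClient(new MultiThreadedHttpConnectionManager());
                return new CommonsHttpSolrServer(
                    new URL("http://localhost:8983/solr"), client);
            } catch (MalformedURLException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        public static SolrServer get() {
            return INSTANCE;
        }
    }

[Any class can then call SolrServerHolder.get() instead of constructing its
own server, and there is no per-request connection to close; the pooled
connections are reused.]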

On Aug 26, 2011, at 2:09 AM, Jonty Rhods wrote:

> Am I also required to close the connection to the Solr server
> (CommonsHttpSolrServer)?
> 
> regards
> 
> On Fri, Aug 26, 2011 at 9:45 AM, Jonty Rhods  wrote:
> 
>> Dear all, please help; I am stuck here as I do not have much experience.
>> 
>> thanks
>> 
>> On Thu, Aug 25, 2011 at 6:51 PM, Jonty Rhods wrote:
>> 
>>> Hi All,
>>> 
>>> I am using SolrJ (3.1) and Tomcat 6.x. I want to open the Solr server once (20
>>> concurrent connections) and reuse it across the whole site, or something like a
>>> connection pool like we use for the DB (i.e. Apache DBCP). There is a way to
>>> use a static method, but I want a better solution from you people.
>>> 
>>> 
>>> 
>>> I read one thread where Ahmet suggested using something like this:
>>> 
>>> String serverPath = "http://localhost:8983/solr";
>>> HttpClient client = new HttpClient(new
>>> MultiThreadedHttpConnectionManager());
>>> URL url = new URL(serverPath);
>>> CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer(url, client);
>>> 
>>> But how do I use this instance across all classes?
>>> 
>>> Please suggest.
>>> 
>>> regards
>>> Jonty
>>> 
>> 
>> 



Re: Solr Implementations

2011-08-26 Thread Erick Erickson
See below

On Thu, Aug 25, 2011 at 4:22 PM, zarni aung  wrote:
> First, I would like to apologize if this is a repeat question but can't seem
> to get the right answer anywhere.
>
>   - What happens to pending documents when the server dies abruptly?  I
>   understand that when the server shuts down gracefully, it will commit the
>   pending documents and close the IndexWriter.  For the case where the server
>   just crashes,  I am assuming that the pending documents are lost but would
>   it also corrupt the index files?  If so, when the server comes back online
>   what is the state?  I would think that a full re-indexing is in order.
>
>

This is generally not a problem, your pending updates are simply lost. A lot
of work has gone into making sure that the indexes aren't corrupted in this
situation. You can use the checkindex utility if you're worried.
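
[For reference, the utility mentioned here is Lucene's CheckIndex; a typical
invocation, with the jar version and index path adjusted to your installation
(-ea enables the assertions CheckIndex relies on):]

    java -cp lucene-core-3.3.0.jar -ea:org.apache.lucene... \
         org.apache.lucene.index.CheckIndex /path/to/solr/data/index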

A brief outline here. Solr only writes new segments, it does NOT modify existing
segments. There is a file that lets Solr know what the current valid
segments are.
During indexing (including merging, optimization, etc), only NEW segments are
written and the file that tells Solr what's current is left alone
during the new segment
writes.

The very last thing that's done is the segments file (i.e. the file
that tells Solr what's current) is updated, and it's very small. I suppose
there's a vanishingly small chance that that file could be corrupted while
being written, and it may even be that a temp file is written first and the
files then renamed (but I don't know that for sure)...

So, the point of this long digression is that if your server gets
killed, upon restart it
should see a consistent picture of the index as of the last completed
commit, any
interim docs will be lost.

>   - What are the dangers of having n-number of ReadOnly Solr instances
>   pointing to the same data directory?  (Shared by a SAN)?  Will there be
>   issues with locking?  This is a scenario with replication.  The Read-Only
>   instances are pointing to the same data directory on a SAN.
>

This is not a problem. You should have only one *writer*
pointing to the index, but readers are OK. Applying the discussion above to
readers, note that the segments available to any reader are never changed. So
having N Solr instances reading from these unchanging files is no problem.

That said, this will be slower than using Solr's replication (which is
preferred) for
two reasons.
1> any networked filesystem will have some inherent speed issues.
2> all these read requests will have to be queued somehow.

But if your performance is acceptable with this setup it'll work.


Best
Erick

> Thank you very much.
>
> Z
>


Re: how to differentiate multiple datasources when building solr query....

2011-08-26 Thread Erik Hatcher
Vighnesh -

What you're looking for is DataImportHandler's TemplateTransformer.  Docs here: 


Basically just enable the TemplateTransformer in each of your DIH configs then 
set a literal field value like this differently for each source:

   <field column="datasource" template="datasource-1"/>

Be sure you have a datasource (string field type) field in your schema.  
Reindex.

Then you can fq=datasource:datasource-1 and facet on it and so on since it is a 
normal field on all documents.

Erik

On Aug 26, 2011, at 05:41 , vighnesh wrote:

> hi all
> 
> I have a two data sources in data-config file and i need data from first
> datasource , second datasource and from both .how can acheive this in solr
> query.
> 
> example like: first datasource:
> 
> http://localhost:8983/solr/db/select/?q=newthread&version=2.2&start=0&rows=200&indent=on&datasource=datasource-1
> 
> example like: second datasource:
> 
> http://localhost:8983/solr/db/select/?q=newthread&version=2.2&start=0&rows=200&indent=on&datasource=datasource-1
> 
> example like: both datasources:
> 
> http://localhost:8983/solr/db/select/?q=newthread&version=2.2&start=0&rows=200&indent=on&datasource=datasource-1&datasource=datasource-1
> 
> is this querys are correct or not ? plese let me know that.
> 
> 
> thanks in advance.
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-to-differentiate-multiple-datasources-when-building-solr-query-tp3286309p3286309.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Paging over mutlivalued field results?

2011-08-26 Thread Erick Erickson
OK, I think I have it.

It's a problem, indeed. And no, there's no way I know of to
make a doc fetch only bring back some range of values in
a multivalued field.

So you're stuck with either getting the whole book back
and peeling out the pages (how do you know which
sentences are on which page anyway?) or breaking
your book up somehow, say on a chapter or page basis
to reduce what's returned. You could possibly group
the results by book if you broke it up..

BTW, you can match within sentences by appropriate
proximity searches combined with an incrementgap.

Best
Erick

On Thu, Aug 25, 2011 at 10:01 PM, Darren Govoni  wrote:
> Hi Erick,
>   Sure thing.
>
> I have a document schema where I put the sentences of that document in a
> multivalued field "sentences".
> I search that field in a query but get back the document results, naturally.
>
> I then need to further find which exact sentences matched the query (for
> each document result)
> and then do my own paging since I am only returning pages of sentences and
> not the whole document.
> (i.e. I don't want to page the document results).
>
> Does this make sense? Or is there a better way Solr can accomodate this?
>
> Much appreciated.
>
> Darren
>
> On 08/25/2011 07:24 PM, Erick Erickson wrote:
>>
>> Hmm, I don't quite understand what you want. An example
>> or two would help.
>>
>> Best
>> Erick
>>
>> On Thu, Aug 25, 2011 at 12:11 PM, Darren Govoni
>>  wrote:
>>>
>>> Hi,
>>>  Is it possible to construct a query in Solr where the paged results are
>>> matching multivalued fields and not documents?
>>>
>>> thanks,
>>> Darren
>>>
>
>


Re: where should i keep the class files to perform scheduling?

2011-08-26 Thread Erick Erickson
Actually, the easiest thing to do would be to make a cron
job on *nix or use task scheduler on windows to fire
off a delta-import request to your solr server on a schedule
you'd like.

The code you reference appears to be for Solr 1.2, which is
way old...

Best
Erick
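
[For example, a crontab entry that fires a delta-import every 30 minutes
might look like this — the URL and DIH handler path are placeholders for
your setup:]

    */30 * * * * curl -s "http://localhost:8983/solr/dataimport?command=delta-import" > /dev/null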

On Fri, Aug 26, 2011 at 7:45 AM, nagarjuna  wrote:
> hi everybody...
>       i dont know about how to perform DIH scheduling for fullimport in
> solri got little bit information from
> http://stackoverflow.com/questions/3206171/how-can-i-schedule-data-imports-in-solr
> here
> but i dont where should i kepp those three class files
> (ApplicationListener,HTTPPostScheduler,solrDataImportProperties),whether i
> need to develop those 3 classes or solr guys gave those classes ,i dont know
> how to proceed ...
>
>
> can anybody pls help me
>
> Thanx in advance
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/where-should-i-keep-the-class-files-to-perform-scheduling-tp3286562p3286562.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: how to differentiate multiple datasources when building solr query....

2011-08-26 Thread Erick Erickson
Although I'd really recommend using underscore rather than hyphen,
since '-' is a query operator and it'll mess you up *sometime* ...

 Best
Erick

On Fri, Aug 26, 2011 at 8:43 AM, Erik Hatcher  wrote:
> Vighnesh -
>
> What you're looking for is DataImportHandler's TemplateTransformer.  Docs 
> here: 
>
> Basically just enable the TemplateTransformer in each of your DIH configs 
> then set a literal field value like this differently for each source:
>
>   <field column="datasource" template="datasource-1"/>
>
> Be sure you have a datasource (string field type) field in your schema.  
> Reindex.
>
> Then you can fq=datasource:datasource-1 and facet on it and so on since it is 
> a normal field on all documents.
>
>        Erik
>
> On Aug 26, 2011, at 05:41 , vighnesh wrote:
>
>> hi all
>>
>> I have a two data sources in data-config file and i need data from first
>> datasource , second datasource and from both .how can acheive this in solr
>> query.
>>
>> example like: first datasource:
>>
>> http://localhost:8983/solr/db/select/?q=newthread&version=2.2&start=0&rows=200&indent=on&datasource=datasource-1
>>
>> example like: second datasource:
>>
>> http://localhost:8983/solr/db/select/?q=newthread&version=2.2&start=0&rows=200&indent=on&datasource=datasource-1
>>
>> example like: both datasources:
>>
>> http://localhost:8983/solr/db/select/?q=newthread&version=2.2&start=0&rows=200&indent=on&datasource=datasource-1&datasource=datasource-1
>>
>> is this querys are correct or not ? plese let me know that.
>>
>>
>> thanks in advance.
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/how-to-differentiate-multiple-datasources-when-building-solr-query-tp3286309p3286309.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: how to differentiate multiple datasources when building solr query....

2011-08-26 Thread vighnesh
Thanks for the response.

I am unable to configure this. Please provide any sample code for how to use
the TemplateTransformer.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-differentiate-multiple-datasources-when-building-solr-query-tp3286309p3286816.html
Sent from the Solr - User mailing list archive at Nabble.com.
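
[A hedged sketch of the configuration Erik describes — the driver, URL, and
SQL are placeholders. Each source's DIH config declares the transformer on
the entity and stamps a literal value into a datasource field:]

    <!-- data-config.xml for the first source; the second source's config
         is identical except for template="datasource-2" -->
    <dataConfig>
      <dataSource name="ds1" driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/db1" user="u" password="p"/>
      <document>
        <entity name="item" dataSource="ds1" transformer="TemplateTransformer"
                query="SELECT id, name FROM item">
          <!-- literal value copied onto every document from this source -->
          <field column="datasource" template="datasource-1"/>
        </entity>
      </document>
    </dataConfig>

[After reindexing, filter with fq=datasource:datasource-1 or
fq=datasource:datasource-2, and simply omit the fq to search both.]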


Re: where should i keep the class files to perform scheduling?

2011-08-26 Thread Igor MILOVANOVIC
The easiest way is to do it via a cron job.

2011/8/26 nagarjuna 

> hi everybody...
>   i dont know about how to perform DIH scheduling for fullimport in
> solri got little bit information from
>
> http://stackoverflow.com/questions/3206171/how-can-i-schedule-data-imports-in-solr
> here
> but i dont where should i kepp those three class files
> (ApplicationListener,HTTPPostScheduler,solrDataImportProperties),whether i
> need to develop those 3 classes or solr guys gave those classes ,i dont
> know
> how to proceed ...
>
>
> can anybody pls help me
>
> Thanx in advance
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/where-should-i-keep-the-class-files-to-perform-scheduling-tp3286562p3286562.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Igor Milovanović
http://about.me/igor.milovanovic
http://umotvorine.com/


Re: where should i keep the class files to perform scheduling?

2011-08-26 Thread nagarjuna
Thank you very much for your reply, Erick Erickson.
I am using Solr version 3.3.0, and I have no idea about cron jobs; I thought
they were only for Unix, but I am using Windows. Also, I would like to
integrate my scheduling task with my Solr server.

Please give me a suggestion.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/where-should-i-keep-the-class-files-to-perform-scheduling-tp3286562p3286827.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Paging over mutlivalued field results?

2011-08-26 Thread darren

Many thanks Erick.

I think a good feature to add to Solr to address this is
to allow a query to return either the "document" as a result
or the matching (multivalued) fields of a document as individual results
(subject to paging too).
Because sometimes the field value (only) is the desired result list.

Since Solr already touches the index for the field match and paging,
all the logic is already there to return only the field results (and some
document metadata perhaps).

I will ponder a way to do this with a query handler maybe.

On Fri, 26 Aug 2011 09:09:06 -0400, Erick Erickson
 wrote:
> OK, I think I have it.
> 
> It's a problem, indeed. And no, there's no way I know of to
> make a doc fetch only bring back some range of values in
> a multivalued field.
> 
> So you're stuck with either getting the whole book back
> and peeling out the pages (how do you know which
> sentences are on which page anyway?) or breaking
> your book up somehow, say on a chapter or page basis
> to reduce what's returned. You could possibly group
> the results by book if you broke it up..
> 
> BTW, you can match within sentences by appropriate
> proximity searches combined with an incrementgap.
> 
> Best
> Erick
> 
> On Thu, Aug 25, 2011 at 10:01 PM, Darren Govoni 
> wrote:
>> Hi Erick,
>>   Sure thing.
>>
>> I have a document schema where I put the sentences of that document in
a
>> multivalued field "sentences".
>> I search that field in a query but get back the document results,
>> naturally.
>>
>> I then need to further find which exact sentences matched the query
(for
>> each document result)
>> and then do my own paging since I am only returning pages of sentences
>> and
>> not the whole document.
>> (i.e. I don't want to page the document results).
>>
>> Does this make sense? Or is there a better way Solr can accomodate
this?
>>
>> Much appreciated.
>>
>> Darren
>>
>> On 08/25/2011 07:24 PM, Erick Erickson wrote:
>>>
>>> Hmm, I don't quite understand what you want. An example
>>> or two would help.
>>>
>>> Best
>>> Erick
>>>
>>> On Thu, Aug 25, 2011 at 12:11 PM, Darren Govoni
>>>  wrote:

 Hi,
  Is it possible to construct a query in Solr where the paged results
 are
 matching multivalued fields and not documents?

 thanks,
 Darren

>>
>>


Re: Paging over mutlivalued field results?

2011-08-26 Thread Erik Hatcher
The way folks have addressed this situation to date is to model the 
"multivalued fields" as additional documents too.

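[A sketch of that modeling with SolrJ — the field names book_id, position,
and sentence are illustrative, not prescribed:]

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class SentenceIndexer {
        // Index one document per sentence, keyed back to its book, so that
        // sentence-level matches can be paged with ordinary start/rows.
        public static void indexSentences(SolrServer solr, String bookId,
                                          List<String> sentences) throws Exception {
            List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < sentences.size(); i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", bookId + "_" + i);  // unique key per sentence
                doc.addField("book_id", bookId);       // ties it back to the book
                doc.addField("position", i);           // original sentence order
                doc.addField("sentence", sentences.get(i));
                docs.add(doc);
            }
            solr.add(docs);
            solr.commit();
        }
    }

[A query such as q=sentence:foo&fq=book_id:42&start=0&rows=20 then pages
matching sentences directly.]
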

On Aug 26, 2011, at 09:32 ,   wrote:

> 
> Many thanks Erick.
> 
> I think a good feature to add to Solr to address this is
> to allow a query to return either the "document" as a result
> or the matching (multivalued) fields of a document as individual results
> (subject to paging too).
> Because sometimes the field value (only) is the desired result list.
> 
> Since Solr already touches the index for the field match and paging,
> all the logic is already there to return only the field results (and some
> document metadata perhaps).
> 
> I will ponder a way to do this with a query handler maybe.
> 
> On Fri, 26 Aug 2011 09:09:06 -0400, Erick Erickson
>  wrote:
>> OK, I think I have it.
>> 
>> It's a problem, indeed. And no, there's no way I know of to
>> make a doc fetch only bring back some range of values in
>> a multivalued field.
>> 
>> So you're stuck with either getting the whole book back
>> and peeling out the pages (how do you know which
>> sentences are on which page anyway?) or breaking
>> your book up somehow, say on a chapter or page basis
>> to reduce what's returned. You could possibly group
>> the results by book if you broke it up..
>> 
>> BTW, you can match within sentences by appropriate
>> proximity searches combined with an incrementgap.
>> 
>> Best
>> Erick
>> 
>> On Thu, Aug 25, 2011 at 10:01 PM, Darren Govoni 
>> wrote:
>>> Hi Erick,
>>>   Sure thing.
>>> 
>>> I have a document schema where I put the sentences of that document in
> a
>>> multivalued field "sentences".
>>> I search that field in a query but get back the document results,
>>> naturally.
>>> 
>>> I then need to further find which exact sentences matched the query
> (for
>>> each document result)
>>> and then do my own paging since I am only returning pages of sentences
>>> and
>>> not the whole document.
>>> (i.e. I don't want to page the document results).
>>> 
>>> Does this make sense? Or is there a better way Solr can accomodate
> this?
>>> 
>>> Much appreciated.
>>> 
>>> Darren
>>> 
>>> On 08/25/2011 07:24 PM, Erick Erickson wrote:
 
 Hmm, I don't quite understand what you want. An example
 or two would help.
 
 Best
 Erick
 
 On Thu, Aug 25, 2011 at 12:11 PM, Darren Govoni
  wrote:
> 
> Hi,
>  Is it possible to construct a query in Solr where the paged results
> are
> matching multivalued fields and not documents?
> 
> thanks,
> Darren
> 
>>> 
>>> 



commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary

I have a number of chemical names containing commas which I'm mapping in 
index_synonyms.txt thusly:

2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 
3,CCRIS 8562

According to the sample synonyms.txt, the comma above should be escaped, i.e.
a\,a=>b\,b.  The problem is that according to analysis.jsp the commas are not
being escaped.  If I paste in 2,4-D-butotyl, then no mappings.  If I paste in
2\,4-D-butotyl, the mappings are done.  This is verified by there being no
mappings in the index.  I assume there would be if 2\,4-D-butotyl actually
appeared in a document.

The filter I'm declaring in the index analyzer looks like this:

<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
        tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true"
        expand="true"/>

Doesn't seem to matter which tokenizer I use.  This must be something simple
that I'm not doing, but I am a bit stumped at the moment and would appreciate
any tips.
Thanks
Gary




Solr and client app on same Jetty?

2011-08-26 Thread Arcadius Ahouansou
Hello.

I have Solr running on Jetty and I also have a web client application
running on another jetty instance on the same box.

The question is: wouldn't it be better to run the client and solr on the
very same jetty instance?

I came across http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer as well.

The only drawback I can think of is that, if we would like to scale and have
1 web app against 2 or 3 Solr instances, a code change will be needed.

- Is there any other drawback in doing so?
- more importantly, any performance or scalability issue?


Thanks.

Arcadius.


Re: commas in synonyms.txt are not escaping

2011-08-26 Thread Alexei Martchenko
Gary, please post the entire field declaration so I can try to reproduce
here

2011/8/26 Moore, Gary 

>
> I have a number of chemical names containing commas which I'm mapping in
> index_synonyms.txt thusly:
>
> 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D
> 3,CCRIS 8562
>
> According to the sample synonyms.txt, the comma above should be. i.e.
> a\,a=>b\,b.The problem is that according to analysis.jsp the commas are
> not being escaped.  If I paste in 2,4-D-butotyl, then no mappings.  If I
> paste in 2\,4-D-butotyl, the mappings are done.  This is verified by there
> being no mappings in the index.  I assume there would be if 2\,4-D-butotyl
> actually appeared in a document.
>
> The filter I'm declaring in the index analyzer looks like this:
>
>   tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true"
> expand="true"/>
>
> Doesn't seem to matter which tokenizer I use.This must be something
> simple that I'm not doing but am a bit stumped at the moment and would
> appreciate any tips.
> Thanks
> Gary
>
>
>


-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


RE: commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary
Here you go -- I'm just hacking the text field at the moment.  Thanks,
Gary


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true"
            expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>


-Original Message-
From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] 
Sent: Friday, August 26, 2011 10:30 AM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

Gary, please post the entire field declaration so I can try to reproduce
here




Re: commas in synonyms.txt are not escaping

2011-08-26 Thread Alexei Martchenko
Gary, isn't your wordDelimiter removing your commas at query time? Have
you tried it in the analyzer?

2011/8/26 Moore, Gary 

> Here you go -- I'm just hacking the text field at the moment.  Thanks,
> Gary
>
> 
>  
>
>  synonyms="index_synonyms.txt"
> tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true"
> expand="true"/>
> 
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>
>  
>  
>
>   
> words="stopwords.txt"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>
>  
>
>
> -Original Message-
> From: Alexei Martchenko [mailto:ale...@superdownloads.com.br]
> Sent: Friday, August 26, 2011 10:30 AM
> To: solr-user@lucene.apache.org
> Subject: Re: commas in synonyms.txt are not escaping
>
> Gary, please post the entire field declaration so I can try to reproduce
> here
>
>
>


-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Solr and client app on same Jetty?

2011-08-26 Thread Gérard Dupont
Hi,

On 26 August 2011 16:23, Arcadius Ahouansou  wrote:

> Hello.
>
> I have Solr running on Jetty and I also have a web client application
> running on another jetty instance on the same box.
>
> The question is: wouldn't it be better to run the client and solr on the
> very same jetty instance?
>

I don't have a clear performance benchmark on this, but I did not notice much
difference during tests with Jetty.


> I came across http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer as
> weel.
>
> The only drawback I can think of is, in case we would like to scale and
> have
> 1 web app against 2 or 3 solr, a code change will be needed.
> - Is there any other drawback in doing so?
>

We used the embedded server for a long time and moved to a standalone server
recently, since it allows more flexibility and independence. Not much code
change was needed.


> - more importantly, any performance or scalability issue?
>

The standalone server seems more efficient, and eventually you can make it
scale independently of your client. But it really depends on your needs. For
the small applications (1M documents and a few dozen users) we built in the
last few years, the embedded server was fine.


>
> Thanks.
>
> Arcadius.
>

cheers,

-- 
Gérard Dupont
Information Processing Control and Cognition (IPCC)
CASSIDIAN - an EADS company

Document & Learning team - LITIS Laboratory


Re: Solr Implementations

2011-08-26 Thread zarni aung
Thank you so much for your response Erik.

On Fri, Aug 26, 2011 at 8:30 AM, Erick Erickson wrote:

> See below
>
> On Thu, Aug 25, 2011 at 4:22 PM, zarni aung  wrote:
> > First, I would like to apologize if this is a repeat question but can't
> seem
> > to get the right answer anywhere.
> >
> >   - What happens to pending documents when the server dies abruptly?  I
> >   understand that when the server shuts down gracefully, it will commit
> the
> >   pending documents and close the IndexWriter.  For the case where the
> server
> >   just crashes,  I am assuming that the pending documents are lost but
> would
> >   it also corrupt the index files?  If so, when the server comes back
> online
> >   what is the state?  I would think that a full re-indexing is in order.
> >
> >
>
> This is generally not a problem, your pending updates are simply lost. A
> lot
> of work has gone into making sure that the indexes aren't corrupted in this
> situation. You can use the checkindex utility if you're worried.
>
> A brief outline here. Solr only writes new segments, it does NOT modify
> existing
> segments. There is a file that lets Solr know what the current valid
> segments are.
> During indexing (including merging, optimization, etc), only NEW segments
> are
> written and the file that tells Solr what's current is left alone
> during the new segment
> writes.
>
> The very last thing that's done is the segments file (i.e. the file
> that tells Solr what's
> current) is updated, and it's very small. I suppose there's a
> vanishingly small chance
> that that file could be corrupted when begin written, and it may even
> be that a temp
> file is written first then files renamed (but I don't know that for
> sure)...
>
> So, the point of this long digression is that if your server gets
> killed, upon restart it
> should see a consistent picture of the index as of the last completed
> commit, any
> interim docs will be lost.
>
> >   - What are the dangers of having n-number of ReadOnly Solr instances
> >   pointing to the same data directory?  (Shared by a SAN)?  Will there be
> >   issues with locking?  This is a scenario with replication.  The
> Read-Only
> >   instances are pointing to the same data directory on a SAN.
> >
>
> This is not a problem. You should have only one *writer*
> pointing to the index, but readers are OK. Applying the discussion above to
> readers, note that the segments available to any reader are never changed.
> So
> having N Solr instances reading from these unchanging files is no problem.
>
> That said, this will be slower than using Solr's replication (which is
> preferred) for
> two reasons.
> 1> any networked filesystem will have some inherent speed issues.
> 2> all these read requests will have to be queued somehow.
>
> But if your performance is acceptable with this setup it'll work.
>
>
> Best
> Erick
>
> > Thank you very much.
> >
> > Z
> >
>


what is scheduling ? why should we do this?how to achieve this ?

2011-08-26 Thread nagarjuna
I don't know what exactly scheduling means in Solr, why I should do it, or how
I can achieve it. Please help me with this. I have already seen this link:
http://wiki.apache.org/solr/DataImportHandler?highlight=%28scheduling%29#Scheduling
Apart from that one, please send me any sample code or links.


Thanks in advance

--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-is-scheduling-why-should-we-do-this-how-to-achieve-this-tp3287115p3287115.html
Sent from the Solr - User mailing list archive at Nabble.com.


core creation and instanceDir parameter

2011-08-26 Thread Gérard Dupont
Hi all,

Playing with multicore and dynamic creation of new cores, I found out that
there is one mandatory parameter, "instanceDir", which is needed to find
the location of solrconfig.xml and schema.xml. Since all my cores share the
same configuration (found relative to the $SOLR_HOME defined on the server
side) and all data is saved in the same folder (one sub-folder per core), I
was wondering why we still need to send this parameter. In my configuration,
I would like to avoid the client that asks for core creation needing to be
aware of the instance location on the server.
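
[For reference, the call in question looks like this in 3.3 — core name and
paths are placeholders; instanceDir is required even when every core shares
the same config:]

    http://localhost:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=core1&dataDir=data/core1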

BTW I'm on solr 3.3.0

Thanks for any advice.

-- 
Gérard Dupont
Information Processing Control and Cognition (IPCC)
CASSIDIAN - an EADS company

Document & Learning team - LITIS Laboratory


New IndexSearcher and autowarming

2011-08-26 Thread Mike Austin
I would like to keep requests from being slowed by new document adds and
commits by having a separate index that gets updated.
Basically a read-only and an updatable index. After the update index has
finished updating with new adds and commits, I'd like to swap it in as the
live read-only index. At the same time, it would be nice to have the old
read-only index brought up to date with the now-live index before I
start this update process again.

1. Index1 is live and read-only and doesn't get slowed by updates
2. Index2 is updated with Index1 and gets new adds and commits
3. Index2 gets cache warming
4. Index2 becomes the live index read-only index
5. Index1 gets synced with Index2 so that when these steps start again, the
updating is happening on an updated index.

I know that this is possible but can't find a simple tutorial on how to do
this.  By the way, I'm using SolrNet in a windows environment.

Thanks,
Mike


Re: commas in synonyms.txt are not escaping

2011-08-26 Thread Yonik Seeley
On Fri, Aug 26, 2011 at 10:17 AM, Moore, Gary  wrote:
>
> I have a number of chemical names containing commas which I'm mapping in 
> index_synonyms.txt thusly:
>
> 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 
> 3,CCRIS 8562
>
> According to the sample synonyms.txt, the comma above should be. i.e. 
> a\,a=>b\,b.    The problem is that according to analysis.jsp the commas are 
> not being escaped.  If I paste in 2,4-D-butotyl, then no mappings.  If I 
> paste in 2\,4-D-butotyl, the mappings are done.


I can confirm that this works in 1.4, but no longer works in 3x or
trunk.  Can you open an issue?

-Yonik
http://www.lucidimagination.com


Re: commas in synonyms.txt are not escaping

2011-08-26 Thread Yonik Seeley
On Fri, Aug 26, 2011 at 11:16 AM, Yonik Seeley
 wrote:
> On Fri, Aug 26, 2011 at 10:17 AM, Moore, Gary  wrote:
>>
>> I have a number of chemical names containing commas which I'm mapping in 
>> index_synonyms.txt thusly:
>>
>> 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 
>> 3,CCRIS 8562
>>
>> According to the sample synonyms.txt, the comma above should be. i.e. 
>> a\,a=>b\,b.    The problem is that according to analysis.jsp the commas are 
>> not being escaped.  If I paste in 2,4-D-butotyl, then no mappings.  If I 
>> paste in 2\,4-D-butotyl, the mappings are done.
>
>
> I can confirm that this works in 1.4, but no longer works in 3x or
> trunk.  Can you open an issue?

Actually, I think I've tracked it to LUCENE-3233, where the parsing
rules were moved from Solr to Lucene (and the functionality changed in
the process).
I'll reopen that, since I don't think it's been in a released version yet.

-Yonik
http://www.lucidimagination.com


RE: commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary
Thanks, Yonik.
Gary

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Friday, August 26, 2011 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

On Fri, Aug 26, 2011 at 11:16 AM, Yonik Seeley
 wrote:
> On Fri, Aug 26, 2011 at 10:17 AM, Moore, Gary  wrote:
>>
>> I have a number of chemical names containing commas which I'm mapping in 
>> index_synonyms.txt thusly:
>>
>> 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 
>> 3,CCRIS 8562
>>
>> According to the sample synonyms.txt, the comma above should be. i.e. 
>> a\,a=>b\,b.    The problem is that according to analysis.jsp the commas are 
>> not being escaped.  If I paste in 2,4-D-butotyl, then no mappings.  If I 
>> paste in 2\,4-D-butotyl, the mappings are done.
>
>
> I can confirm that this works in 1.4, but no longer works in 3x or
> trunk.  Can you open an issue?

Actually, I think I've tracked it to LUCENE-3233, where the parsing
rules were moved from Solr to Lucene (and the functionality changed in
the process).
I'll reopen that, since I don't think it's been in a released version yet.

-Yonik
http://www.lucidimagination.com


RE: commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary
Alexi,
Yes but no difference.  This is apparently an issue introduced in 3.*.  Thanks 
for your help.
-Gary

-Original Message-
From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] 
Sent: Friday, August 26, 2011 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

Gary, isn't your wordDelimiter removing your commas in the query time? have
u tried it in the analyzer?

2011/8/26 Moore, Gary 

> Here you go -- I'm just hacking the text field at the moment.  Thanks,
> Gary
>
> 
>  
>
>  synonyms="index_synonyms.txt"
> tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true"
> expand="true"/>
> 
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>
>  
>  
>
>   
> words="stopwords.txt"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords.txt"/>
>
>  
>
>
> -Original Message-
> From: Alexei Martchenko [mailto:ale...@superdownloads.com.br]
> Sent: Friday, August 26, 2011 10:30 AM
> To: solr-user@lucene.apache.org
> Subject: Re: commas in synonyms.txt are not escaping
>
> Gary, please post the entire field declaration so I can try to reproduce
> here
>
>
>


-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


DIH importing

2011-08-26 Thread Mark
We are currently delta-importing using DIH, after which all of our
servers have to download the full index (16G). This obviously puts quite
a strain on our slaves while they are syncing over the index. Is there
any way not to sync over the whole index, but rather just the parts that
have changed?


We would like to get to the point where we are no longer using DIH but
rather are constantly sending documents over HTTP to our master in
realtime. We would then like our slaves to download these changes as
soon as possible. Is something like this even possible?


Thanks for your help


Re: Paging over mutlivalued field results?

2011-08-26 Thread darren

Yeah, I've resigned myself to this being the most practical workaround.
But it also means a 100-to-1 explosion in my index size: for every book
document, there will now be, say, 100 sentence documents from it.

What's the best way to submit a feature request for Solr?

Many thanks.

On Fri, 26 Aug 2011 09:36:51 -0400, Erik Hatcher 
wrote:
> The way folks have addressed this situation to date is to model the
> "multivalued fields" as additional documents too.
> 
> 
> On Aug 26, 2011, at 09:32 ,  
> wrote:
> 
>> 
>> Many thanks Erick.
>> 
>> I think a good feature to add to Solr to address this is
>> to allow a query to return either the "document" as a result
>> or the matching (multivalued) fields of a document as individual
results
>> (subject to paging too).
>> Because sometimes the field value (only) is the desired result list.
>> 
>> Since Solr already touches the index for the field match and paging,
>> all the logic is already there to return only the field results (and
some
>> document metadata perhaps).
>> 
>> I will ponder a way to do this with a query handler maybe.
>> 
>> On Fri, 26 Aug 2011 09:09:06 -0400, Erick Erickson
>>  wrote:
>>> OK, I think I have it.
>>> 
>>> It's a problem, indeed. And no, there's no way I know of to
>>> make a doc fetch only bring back some range of values in
>>> a multivalued field.
>>> 
>>> So you're stuck with either getting the whole book back
>>> and peeling out the pages (how do you know which
>>> sentences are on which page anyway?) or breaking
>>> your book up somehow, say on a chapter or page basis
>>> to reduce what's returned. You could possibly group
>>> the results by book if you broke it up..
>>> 
>>> BTW, you can match within sentences by appropriate
>>> proximity searches combined with an incrementgap.
>>> 
>>> Best
>>> Erick
>>> 
>>> On Thu, Aug 25, 2011 at 10:01 PM, Darren Govoni 
>>> wrote:
 Hi Erick,
   Sure thing.
 
 I have a document schema where I put the sentences of that document
in
>> a
 multivalued field "sentences".
 I search that field in a query but get back the document results,
 naturally.
 
 I then need to further find which exact sentences matched the query
>> (for
 each document result)
 and then do my own paging since I am only returning pages of
sentences
 and
 not the whole document.
 (i.e. I don't want to page the document results).
 
 Does this make sense? Or is there a better way Solr can accomodate
>> this?
 
 Much appreciated.
 
 Darren
 
 On 08/25/2011 07:24 PM, Erick Erickson wrote:
> 
> Hmm, I don't quite understand what you want. An example
> or two would help.
> 
> Best
> Erick
> 
> On Thu, Aug 25, 2011 at 12:11 PM, Darren Govoni
>  wrote:
>> 
>> Hi,
>>  Is it possible to construct a query in Solr where the paged
results
>> are
>> matching multivalued fields and not documents?
>> 
>> thanks,
>> Darren
>> 
 



Re: DIH importing

2011-08-26 Thread simon
It sounds as though you are optimizing the index after the delta import. If
you don't do that, then only new segments will be replicated and syncing
will be much faster.
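
[If you are not already using it, Solr 1.4+ HTTP replication copies only
changed segment files. A minimal sketch of the solrconfig.xml setup, with
the master host as a placeholder:]

    <!-- on the master -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
      </lst>
    </requestHandler>

    <!-- on each slave -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>

[With replicateAfter=commit and a short pollInterval, slaves pick up new
segments shortly after each commit, which also addresses the near-realtime
part of the question.]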


On Fri, Aug 26, 2011 at 12:08 PM, Mark  wrote:

> We are currently delta-importing using DIH after which all of our servers
> have to download the full index (16G). This obviously puts quite a strain on
> our slaves while they are syncing over the index. Is there anyway not to
> sync over the whole index, but rather just the parts that have changed?
>
> We would like to get to the point where are no longer using DIH but rather
> we are constantly sending documents over HTTP to our master in realtime. We
> would then like our slaves to download these changes as soon as possible. Is
> something like this even possible?
>
> Thanks for you help
>


Re: New IndexSearcher and autowarming

2011-08-26 Thread simon
The multicore API (see http://wiki.apache.org/solr/CoreAdmin ) allows you to
swap, unload, reload cores. That should allow you to do what you want,

-Simon
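
[For example, once the rebuilt core is warmed, a single SWAP call makes it
live — the core names are placeholders:]

    http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=ondeck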

On Fri, Aug 26, 2011 at 11:13 AM, Mike Austin wrote:

> I would like to have the ability to keep requests from being slowed from
> new
> document adds and commits by having a separate index that gets updated.
> Basically a read-only and an updatable index. After the update index has
> finished updating with new adds and commits, I'd like to switch the update
> to the "live" read-only.  At the same time, it would be nice to have the
> old
> read-only index become "updated" with the now live read-only index before I
> start this update process again.
>
> 1. Index1 is live and read-only and doesn't get slowed by updates
> 2. Index2 is updated with Index1 and gets new adds and commits
> 3. Index2 gets cache warming
> 4. Index2 becomes the live index read-only index
> 5. Index1 gets synced with Index2 so that when these steps start again, the
> updating is happening on an updated index.
>
> I know that this is possible but can't find a simple tutorial on how to do
> this.  By the way, I'm using SolrNet in a windows environment.
>
> Thanks,
> Mike
>


Re: New IndexSearcher and autowarming

2011-08-26 Thread Erick Erickson
Why doesn't standard replication with auto-warming work for you?
You can control how often replication gets triggered by controlling
your commit points and/or your replication interval. This seems easier
than maintaining cores like your problem statement indicates.

Best
Erick

On Fri, Aug 26, 2011 at 12:56 PM, simon  wrote:
> The multicore API (see http://wiki.apache.org/solr/CoreAdmin ) allows you to
> swap, unload, reload cores. That should allow you to do what you want,
>
> -Simon
>
> On Fri, Aug 26, 2011 at 11:13 AM, Mike Austin wrote:
>
>> I would like to have the ability to keep requests from being slowed from
>> new
>> document adds and commits by having a separate index that gets updated.
>> Basically a read-only and an updatable index. After the update index has
>> finished updating with new adds and commits, I'd like to switch the update
>> to the "live" read-only.  At the same time, it would be nice to have the
>> old
>> read-only index become "updated" with the now live read-only index before I
>> start this update process again.
>>
>> 1. Index1 is live and read-only and doesn't get slowed by updates
>> 2. Index2 is updated with Index1 and gets new adds and commits
>> 3. Index2 gets cache warming
>> 4. Index2 becomes the live index read-only index
>> 5. Index1 gets synced with Index2 so that when these steps start again, the
>> updating is happening on an updated index.
>>
>> I know that this is possible but can't find a simple tutorial on how to do
>> this.  By the way, I'm using SolrNet in a windows environment.
>>
>> Thanks,
>> Mike
>>
>


Re: New IndexSearcher and autowarming

2011-08-26 Thread Mike Austin
Hi Erick,

It might work.  I've only worked with solr having one index on one server
over a year ago so I might need to just research more about the replication.
I am using windows and I remember that replication on windows had some
issues with scripts and hard links, however it looks like we have some new
good replication features with solr1.4.

For now, I wanted to do this on just one windows server since this is my
requirement.  After your suggestion, I took a little more time to review:
http://wiki.apache.org/solr/SolrReplication.  So based on what I want to do,
would the "Replication with MultiCore " section be what I need to do?  But
this wouldn't be a master/slave setup would it since basically I want to
swap between two.  I guess I could set up 3 indexes on the same server if
that's possible to use master/slave in that way, but that might take some
more space than I anticipated.

Thanks,
Mike
On Fri, Aug 26, 2011 at 12:08 PM, Erick Erickson wrote:

> Why doesn't standard replication with auto-warming work for you?
> You can control how often replication gets triggered by controlling
> your commit points and/or your replication interval. This seems easier
> than maintaining cores like your problem statement indicates.
>
> Best
> Erick
>
> On Fri, Aug 26, 2011 at 12:56 PM, simon  wrote:
> > The multicore API (see http://wiki.apache.org/solr/CoreAdmin ) allows
> you to
> > swap, unload, reload cores. That should allow you to do what you want,
> >
> > -Simon
> >
> > On Fri, Aug 26, 2011 at 11:13 AM, Mike Austin  >wrote:
> >
> >> I would like to have the ability to keep requests from being slowed from
> >> new
> >> document adds and commits by having a separate index that gets updated.
> >> Basically a read-only and an updatable index. After the update index has
> >> finished updating with new adds and commits, I'd like to switch the
> update
> >> to the "live" read-only.  At the same time, it would be nice to have the
> >> old
> >> read-only index become "updated" with the now live read-only index
> before I
> >> start this update process again.
> >>
> >> 1. Index1 is live and read-only and doesn't get slowed by updates
> >> 2. Index2 is updated with Index1 and gets new adds and commits
> >> 3. Index2 gets cache warming
> >> 4. Index2 becomes the live index read-only index
> >> 5. Index1 gets synced with Index2 so that when these steps start again,
> the
> >> updating is happening on an updated index.
> >>
> >> I know that this is possible but can't find a simple tutorial on how to
> do
> >> this.  By the way, I'm using SolrNet in a windows environment.
> >>
> >> Thanks,
> >> Mike
> >>
> >
>


syntax for functions used in the fq parameter

2011-08-26 Thread Jason Toy
I'm trying to limit my data to only docs that have the word 'foo' appear at
least once.
I am trying to use:
fq=termfreq(data,'foo'):[1+TO+*]

but I get the syntax error:
Caused by: org.apache.lucene.queryparser.classic.ParseException: Encountered
" ":" ": "" at line 1, column 33.
Was expecting one of:
    <EOF>
    <AND> ...
    <OR> ...
    <NOT> ...
    "+" ...
    "-" ...
    <BAREOPER> ...
    "(" ...
    "*" ...
    "^" ...
    <QUOTED> ...
    <TERM> ...
    <FUZZY_SLOP> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...

at
org.apache.lucene.queryparser.classic.QueryParser.generateParseException(QueryParser.java:708)
at
org.apache.lucene.queryparser.classic.QueryParser.jj_consume_token(QueryParser.java:590)
at
org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:171)
at
org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:119)
... 25 more


What is the proper way to do this?
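
[For what it's worth, a function can't be used as a field name in the classic
query parser, which is why the range syntax above fails to parse; the usual
route is a function range query — a sketch, assuming the termfreq() function
is available in your build:]

    fq={!frange l=1}termfreq(data,'foo')

[This keeps only documents where termfreq(data,'foo') evaluates to 1 or more.]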


Re: syntax for functions used in the fq parameter

2011-08-26 Thread Erick Erickson
Why doesn't " AND text:foo" fill this requirement?

Best
Erick

On Fri, Aug 26, 2011 at 2:27 PM, Jason Toy  wrote:
> I'm trying to limit my data to only docs that have the word 'foo' appear at
> least once.
> I am trying to use:
> fq=termfreqdata,'foo'):[1+TO+*]
>
> but I get the syntax error:
> Caused by: org.apache.lucene.queryparser.classic.ParseException: Encountered
> " ":" ": "" at line 1, column 33.
> Was expecting one of:
>    
>     ...
>     ...
>     ...
>    "+" ...
>    "-" ...
>     ...
>    "(" ...
>    "*" ...
>    "^" ...
>     ...
>     ...
>     ...
>     ...
>     ...
>    "[" ...
>    "{" ...
>     ...
>
> at
> org.apache.lucene.queryparser.classic.QueryParser.generateParseException(QueryParser.java:708)
> at
> org.apache.lucene.queryparser.classic.QueryParser.jj_consume_token(QueryParser.java:590)
> at
> org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:171)
> at
> org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:119)
> ... 25 more
>
>
> What is the proper way to do this?
>


Re: where should i keep the class files to perform scheduling?

2011-08-26 Thread simon
The built-in DIH scheduling was never implemented as far as I know - the
Wiki section is just a design proposal and explicitly says "Hasn't been
committed to SVN (published only here) "

On Windows, you can use the Task Scheduler to do the kinds of things that
cron does on Unix/Linux.

-Simon
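
[On Windows, the Task Scheduler equivalent of a cron entry can be created from
the command line; a hedged example, assuming curl is installed and with the
URL as a placeholder:]

    schtasks /Create /TN SolrDeltaImport /SC MINUTE /MO 30 ^
        /TR "curl http://localhost:8983/solr/dataimport?command=delta-import"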

On Fri, Aug 26, 2011 at 9:21 AM, nagarjuna wrote:

> Thank u very much for ur reply Erick Erickson
>i am using solr 3.3.0 version
>  and i have no idea about the cron job i thought that it would be for unix
> but i am using windows
> and i would like to integrate my scheduling task with my solr server
>
> please give me the suggestion
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/where-should-i-keep-the-class-files-to-perform-scheduling-tp3286562p3286827.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Solr Geodist

2011-08-26 Thread solrnovice
Hi, I am trying to return the distance in a Solr query by passing in
"fl=geodist()", and I don't see the distance being returned.
We have a field called coordinates which is configured as latlong, and when I
perform the following search, I do see results:
q=*:*&fq={!geofilt}&sfield=coordinates&pt=31.2225,-85.3931&fl=id,name   =>
this works fine

q=*:*&fq={!geofilt}&sfield=coordinates&pt=31.2225,-85.3931&fl=id,name,geodist()
=> this doesn't return distance


But when I add geodist() to the "fl" parameter, I don't see the pseudo column
being returned. It returns the "id" and "name" only, but not the distance.
We are using Solr 4.0. Can anybody please let me know what is wrong with the
query?  I also tried passing in the lat, long as arguments to geodist(), but
it doesn't help.

I am totally lost in the Solr system; can anybody please help?

thanks
SolrNovice



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3287005.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Geodist

2011-08-26 Thread Erick Erickson
When I try this from the stock Solr example (using "store" rather than
"coordinates" for the field), your first example gives me an error of
"d must be > 0" or some such. When I add a "d" value to the query, both
your first and second queries work just fine, and the second returns a
"geodist" value in the response.
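In query form, that's something like this sketch (d=50 is an assumed
radius in km; the other values are from the thread):

  q=*:*&fq={!geofilt}&sfield=coordinates&pt=31.2225,-85.3931&d=50&fl=id,name,geodist()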

You say you're using Solr 4.0. When did you get it? Have you tried
this with a very recent trunk?

Best
Erick

On Fri, Aug 26, 2011 at 10:29 AM, solrnovice  wrote:
> Hi, I am trying to return the distance in a Solr query by passing in
> "fl=geodist()", but I don't see the distance being returned.
> We have a field called coordinates, configured as LatLong, and when I
> perform the following search I do see results:
>
> q=*:*&fq={!geofilt}&sfield=coordinates&pt=31.2225,-85.3931&fl=id,name   =>
> this works fine
>
> q=*:*&fq={!geofilt}&sfield=coordinates&pt=31.2225,-85.3931&fl=id,name,geodist()
> => this doesn't return the distance
>
> When I add geodist() to the "fl" parameter, I don't see the pseudo-column
> being returned: only "id" and "name" come back, not the distance.
> We are using Solr 4.0. Can anybody please let me know what is wrong with
> the query? I also tried passing the lat,long as arguments to geodist(),
> but that doesn't help.
>
> I am totally lost in the Solr system; can anybody please help?
>
> thanks
> SolrNovice
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3287005.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr Geodist

2011-08-26 Thread solrnovice
Erick, thanks for the quick response. I left out the "d" value; yes, when
you perform a spatial query there should be a distance d > 0. Sorry about
that.

What is the definition of your "store" field in the schema? Was it marked
as LatLong? For some reason I don't see geodist() being returned in the
result set. My coordinates field is set up as type="location"; below is
the snapshot from my schema.xml.

[schema snippet stripped by the mail archive: a "coordinates" field of
type "location"]

We are using LucidImagination, so I guess it comes with Solr 4.0; please
let me know if I am wrong. That may be the reason geodist() is not being
returned. I checked the Solr version in the Solr admin, and it shows 4.0.

For now I found a workaround that works for me: the distance is returned
in the form of the "score".

http://127.0.0.1:/solr/apex_dev/select/?q=*:*+_val_:%22geodist%28%29%22&rows=100&fq={!geofilt}&sfield=coordinates&pt=31.2225,-85.3931&d=50&sort=geodist%28%29%20asc&fl=*,score

I read in a different post that, with earlier versions of Solr (prior to
4.0), we have to use the score option.

Thanks for taking the time to try the query.


SN


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3287806.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlight on alternateField

2011-08-26 Thread Val Minyaylo

Thanks a lot Koji.

On 8/25/2011 5:04 PM, Koji Sekiguchi wrote:

(11/08/26 2:32), Val Minyaylo wrote:

Hi there,
I am trying to utilize the highlighting alternateField and can't get
highlights on the results from the targeted fields. Is this expected
behavior, or am I understanding alternateField wrong?


Yes, it is expected behavior.


solrconfig.xml:
name="f.description.hl.alternateField">description_highlighting

100


With &hl=on&hl.fl=description parameters, you can get the first 100 
chars of
the (raw) stored description_highlighting field value if highlighter 
cannot generate

snippets on description field for some reason.
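On the request side that is just, as a sketch (the q value is made up):

  ...&q=foo&hl=on&hl.fl=description

with the two per-field overrides above sitting in the request handler's
defaults.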

koji


Re: where should i keep the class files to perform scheduling?

2011-08-26 Thread Gora Mohanty
On Fri, Aug 26, 2011 at 6:51 PM, nagarjuna  wrote:
> Thank you very much for your reply, Erick Erickson.
> I am using Solr 3.3.0.
> I have no idea about cron jobs; I thought they were for Unix, but I am
> using Windows, and I would like to integrate my scheduling task with my
> Solr server.
>
> Please give me a suggestion.
[...]

No offence, but are you actually reading the replies to your queries on
the list?

To quote part of Erick's reply that you are referencing:
"...or use task scheduler on windows..."

Regards,
Gora


synonyms vs replacements

2011-08-26 Thread Robert Petersen
Hello all,

 

Which is better?  Say you add an index-time synonym between nunchuck and
nunchuk; then both words will be in the document and both will be
searchable.  I can get exactly the same behavior by putting in an
index-time replacement of nunchuck => nunchuk and a search-time
replacement of the same.

I figure the replacement strategy keeps the index size slightly smaller
by having only the one term in the index, but the synonym strategy only
requires you to update the master, not the slave farm, and means slightly
less work for the searchers during a user query.  Are there any other
considerations I should be aware of?

Thanks

BTW, nunchuk is the correct spelling. :)
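For reference, both strategies are usually implemented with the same stock
filter; a minimal sketch, assuming a synonyms.txt containing the line
"nunchuck, nunchuk":

  <!-- synonym strategy: index-time analyzer only; each variant expands so
       both terms land in the index -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>

  <!-- replacement strategy: the same filter in BOTH the index- and
       query-time analyzers with expand="false", which maps every term in
       the group to the first one listed, so only one term is indexed -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="false"/>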

 

 



Viewing the complete document from within the index

2011-08-26 Thread karthik
Hi Everyone,

I am trying to figure out the best way to view the entire document as it
is indexed within Solr/Lucene. I have tried Luke, but it still shows me
only the fields that I have configured to be returned (i.e., stored=true),
unless I am missing some option in the tool.

Is there a way to see what is actually in the index itself? I am trying
to peek into the index to see whether my index-time synonym expansions
are working properly. The field for which I have enabled index-time
synonym expansion is only used for searching, so I have set stored=false.
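For what it's worth, one way to see the indexed terms (as opposed to
stored values) is the TermsComponent; a sketch, assuming the example
/terms handler is enabled in solrconfig.xml and the field is named
text_syn:

  http://localhost:8983/solr/terms?terms.fl=text_syn&terms.limit=50

This lists the raw terms in the index for that field, so index-time
synonym expansions should show up even though the field is stored=false.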

Thanks


Shingle and Query Performance

2011-08-26 Thread Lord Khan Han
Hi,

We are indexing news documents from various sites. Currently we have
200K docs indexed; total index size is 36 GB. There are also attachments
to the news (PDFs, docs, etc.), so documents can be large (e.g. 10 MB).

We are using some complex queries with around 30-40 terms per query;
70% of the terms are two-word phrases. We use + and - conjunctions to
pinpoint exact results. There is also grouping, dismax, boosting, and
term-vector highlighting.

Our problem is query time. Currently it's around 6-7 seconds. I know our
queries are a little heavy, but we want to improve query performance. I
believe we can make it sub-second, but no success at the moment.

We tried shingling into two-word tokens and it decreased query
performance!! We assumed it would speed up phrase searches. What would
you suggest? What are we missing?

(using latest Solr trunk; the HW is pretty good, 32 cores with 32 GB RAM)

Here is the field def (where the mail archive stripped tag or class
names, "..." marks the missing pieces; attribute values are as posted):

<fieldType name="..." class="solr.TextField" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="..."/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="..." protected="protwords.txt"/>
    <filter class="solr.ShingleFilterFactory" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <filter class="solr.SynonymFilterFactory" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="..." protected="protwords.txt"/>
    <filter class="solr.ShingleFilterFactory" outputUnigrams="true"/>
  </analyzer>
</fieldType>

and

<field name="..." termVectors="true" termPositions="true" termOffsets="true"/>


auto suggestion with text_en field

2011-08-26 Thread Paul
Sorry if this has been asked before, but I couldn't seem to find it...

I've got a fairly simple index, and I'm searching on a field of type
text_en, and the results are good: I search for "computer" and I get
back hits for "computer", "computation", "computational", "computing".

I also want to create an auto suggestion drop down, so I did a query
using the field as a facet, and I get back a good, but literal, set of
suggestions. For instance, one of the suggestions is "comput", which
does actually match what I want it to, but it is ugly, since it isn't
actually a word.

As I'm thinking about it, I'm not sure what word I would like it to
return in this situation, so I'm asking how others have handled this
situation. Is it illogical to have auto complete on a text_en field?
Do I have to pick one or the other?

Thanks,
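One common way out (a sketch, not from this thread; the field names are
made up) is to keep a second, lightly analyzed copy of the field just for
suggestions, so the facet values are real words rather than stems:

  <field name="title_suggest" type="string" indexed="true" stored="false"/>
  <copyField source="title" dest="title_suggest"/>

You can then facet (e.g. with facet.prefix) on title_suggest for the
drop-down while still searching the stemmed text_en field.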


Re: Shingle and Query Performance

2011-08-26 Thread Erik Hatcher

On Aug 26, 2011, at 17:49 , Lord Khan Han wrote:
> We are indexing news documents from various sites. Currently we have
> 200K docs indexed; total index size is 36 GB. There are also attachments
> to the news (PDFs, docs, etc.), so documents can be large (e.g. 10 MB).
>
> We are using some complex queries with around 30-40 terms per query;
> 70% of the terms are two-word phrases. We use + and - conjunctions to
> pinpoint exact results. There is also grouping, dismax, boosting, and
> term-vector highlighting.

You're using a lot of componentry there, and have complex queries.  We need 
more details.

Turn on debugQuery=true... what do the timings say for each component?  
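For instance, as a sketch (host, port, and handler assumed to be the
stock example ones):

  http://localhost:8983/solr/select?q=<your query>&debugQuery=true

The debug section of the response includes the parsed query plus
per-component prepare/process timings, which is what we need to see.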

> Our problem is query time. Currently it's around 6-7 seconds. I know
> our queries are a little heavy, but we want to improve query
> performance. I believe we can make it sub-second, but no success at the
> moment.

Please provide an example query or two (perhaps a full line logged from Solr 
itself), and then let's see what debugQuery says about your query being parsed.

> We tried shingling into two-word tokens and it decreased query
> performance!! We assumed it would speed up phrase searches.

Again, we'd need to see a parsed query to understand this deeper.  

Lots of synonym expansion?  A parsed query will tell us.



> (using latest Solr trunk; the HW is pretty good, 32 cores with 32 GB RAM)
> 
> Here is the field def (where the mail archive stripped tag or class
> names, "..." marks the missing pieces; attribute values are as posted):
>
> <fieldType name="..." class="solr.TextField" autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="..."/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt"
>             enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="..." protected="protwords.txt"/>
>     <filter class="solr.ShingleFilterFactory" outputUnigrams="true"/>
>   </analyzer>
>   <analyzer type="query">
>     <filter class="solr.SynonymFilterFactory" ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt"
>             enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="..." protected="protwords.txt"/>
>     <filter class="solr.ShingleFilterFactory" outputUnigrams="true"/>
>   </analyzer>
> </fieldType>
>
> and
>
> <field name="..." termVectors="true" termPositions="true" termOffsets="true"/>



getting data from only one database

2011-08-26 Thread mss.mss
hi,

We have set up Solr connected to two databases, and we built a jQuery
autocomplete. Both databases hold keywords, and both are searched by
default. Beside the search button we are creating one more drop-down
list naming the two databases; when the user picks one database and
enters a search keyword, we need the search results for that keyword to
come from the selected database only, not from both databases...

How can we achieve this?
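One common pattern (a sketch, not from this thread; field name and values
are made up) is to tag each document with its source database at index
time and filter on that tag at query time:

  <!-- schema.xml: which database a document came from -->
  <field name="source_db" type="string" indexed="true" stored="true"/>

Each DIH entity (one per database) can fill source_db with a constant in
its SELECT. The search request for one database then adds a filter query:

  ...&q=<keyword>&fq=source_db:db1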

reply fast...

thanks in advance :) :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/getting-data-from-only-one-database-tp3286551p3286551.html
Sent from the Solr - User mailing list archive at Nabble.com.