Re: Exception importing multi-valued UUID field

2012-02-20 Thread Erick Erickson
I don't think escaping is your problem; you probably want to take
that bit out.

Try adding
f.<fieldname>.split=true

when importing. You might also have to specify something like
f.<fieldname>.separator=,
but probably not; I suspect it's the default.

See the "split" heading at: http://wiki.apache.org/solr/UpdateCSV

Although I have to ask about your use case for curiosity, is this some
kind of 1-n mapping to other docs?

Best
Erick

On Mon, Feb 20, 2012 at 7:43 PM, Greg Pelly  wrote:
> I also tried it with the comma escaped, so:
>
> '845b9db2-2a25-44e3-8eb4-3bf17cd16738\,c5477d5d-e77c-45e9-ab61-f7ca05499b37'
>
> So that's in the same format as it was exported; Excel must have removed
> the backslash. But I still get the error with the backslash.
>
> On Tue, Feb 21, 2012 at 11:26 AM, Greg Pelly  wrote:
>
>> Hi,
>>
>> I exported a CSV file from Solr and made some changes; I then tried to
>> reimport the file and got the exception below. It seems the UUID field type
>> can't import multi-values; when I removed all of the multi-values, it
>> imported without an issue.
>>
>> Cheers
>>
>>
>> org.apache.solr.common.SolrException: Error while creating field
>> 'jobuid{type=uuid,properties=indexed,stored,omitTermFreqAndPositions,multiValued}'
>> from value
>> '845b9db2-2a25-44e3-8eb4-3bf17cd16738,c5477d5d-e77c-45e9-ab61-f7ca05499b37'
>>     at org.apache.solr.schema.FieldType.createField(FieldType.java:239)
>>     at org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
>>     at
>> org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
>>     at
>> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:276)
>>     at
>> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
>>     at
>> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
>>     at org.apache.solr.handler.CSVLoader.doAdd(CSVRequestHandler.java:416)
>>     at
>> org.apache.solr.handler.SingleThreadedCSVLoader.addDoc(CSVRequestHandler.java:431)
>>     at org.apache.solr.handler.CSVLoader.load(CSVRequestHandler.java:393)
>>     at
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
>>     at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>     at
>> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
>>     at
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>>     at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>>     at
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>     at
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>     at
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>     at
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>     at
>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>     at
>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>     at
>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>     at
>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:300)
>>     at
>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
>>     at
>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
>>     at
>> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>>     at java.lang.Thread.run(Thread.java:679)
>> Caused by: org.apache.solr.common.SolrException: Invalid UUID String:
>> '845b9db2-2a25-44e3-8eb4-3bf17cd16738,c5477d5d-e77c-45e9-ab61-f7ca05499b37'
>>     at org.apache.solr.schema.UUIDField.toInternal(UUIDField.java:85)
>>     at org.apache.solr.schema.FieldType.createField(FieldType.java:237)
>>
>>
>>


Re: Exception importing multi-valued UUID field

2012-02-20 Thread Yonik Seeley
On Mon, Feb 20, 2012 at 7:26 PM, Greg Pelly  wrote:
> I exported a CSV file from Solr and made some changes; I then tried to
> reimport the file and got the exception below. It seems the UUID field type
> can't import multi-values; when I removed all of the multi-values, it
> imported without an issue.

Did you try split=true?

http://wiki.apache.org/solr/UpdateCSV#split

-Yonik
lucidimagination.com


Re: Is Sphinx better suited to me, or should I look at Solr?

2012-02-20 Thread Damien Camilleri
I gave up on Sphinx and went to Solr. I feel it is more mature. For example,
Sphinx didn't have an auto-start init script, and they tried to hit me up for
consultancy fees because I asked a simple question.

I use PHP with the Solarium PHP client. Nice OOP interface.

Solr has a great community. My initial struggles were with getting it running,
mostly because I don't know much about Tomcat and it didn't just work for me as
documented, but once I stumbled through that it was OK.

My searches across 200k documents return instantly on a small 512MB
Rackspace Cloud instance, so you will have no problems at all using Solr for
your needs.

Sent from my iPhone

On 21/02/2012, at 3:32 AM, Em  wrote:

> Hi James,
> 
> I cannot speak for Sphinx, since I never used it.
> However, reading your requirements, there is nothing in them that Solr can't handle.
> 
> Although Sphinx is written in C++, running Solr on top of a HotSpot JVM
> gives you high performance. Furthermore the HotSpot JVM is optimizing
> your code at runtime which sometimes allows long-running applications to
> run as fast as software written in C++ (and sometimes even faster).
> 
> Given that Solr is pretty fast and scalable (90k docs are a really small
> index), you should have a closer look at the features each search-server
> provides to you and how they suit your needs.
> 
> You should always keep in mind that users will gladly wait a few
> milliseconds longer for their highly-relevant search-results, but do not
> care about a blazing fast 5ms response-time for a collection of
> trash-results.
> So try to find out what your concrete needs in terms of relevancy are
> and which search server provides the tools to get you there.
> I am pretty sure that both projects provide PHP client libraries
> etc. for indexing and searching (Solr does).
> 
> Kind regards,
> Em
> 
> On 20.02.2012 16:20, Spadez wrote:
>> I am creating what is effectively a search engine. Content is collected via
>> spiders and then inserted into my database, where it becomes searchable and
>> filterable.
>>
>> I envision there being around 90K records to be searched at any one time.
>> The content is blog posts and forum posts, so we are basically looking at
>> full text with some additional filters based on location, category and
>> date posted.
>> 
>> What is really important to me is speed and relevancy. The index size or
>> index time
>> really isn’t too big of an issue. From the benchmarks I have seen it looks
>> like Sphinx
>> is much faster at querying data and showing results, but that Solr has
>> improved relevancy.
>> 
>> My website is coded entirely in PHP and I am planning on using a MySQL
>> database. Can
>> anyone please give me a bit of input and help me decide which product might
>> be better
>> suited to me.
>> 
>> Regards,
>> 
>> James
>> 
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Is-Sphinx-better-suited-to-me-or-should-I-look-at-Solr-tp3760988p3760988.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 


Re: Exception importing multi-valued UUID field

2012-02-20 Thread Greg Pelly
I also tried it with the comma escaped, so:

'845b9db2-2a25-44e3-8eb4-3bf17cd16738\,c5477d5d-e77c-45e9-ab61-f7ca05499b37'

So that's in the same format as it was exported; Excel must have removed
the backslash. But I still get the error with the backslash.

On Tue, Feb 21, 2012 at 11:26 AM, Greg Pelly  wrote:

> Hi,
>
> I exported a CSV file from Solr and made some changes; I then tried to
> reimport the file and got the exception below. It seems the UUID field type
> can't import multi-values; when I removed all of the multi-values, it
> imported without an issue.
>
> Cheers
>
>
> org.apache.solr.common.SolrException: Error while creating field
> 'jobuid{type=uuid,properties=indexed,stored,omitTermFreqAndPositions,multiValued}'
> from value
> '845b9db2-2a25-44e3-8eb4-3bf17cd16738,c5477d5d-e77c-45e9-ab61-f7ca05499b37'
> at org.apache.solr.schema.FieldType.createField(FieldType.java:239)
> at org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
> at
> org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
> at
> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:276)
> at
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
> at
> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
> at org.apache.solr.handler.CSVLoader.doAdd(CSVRequestHandler.java:416)
> at
> org.apache.solr.handler.SingleThreadedCSVLoader.addDoc(CSVRequestHandler.java:431)
> at org.apache.solr.handler.CSVLoader.load(CSVRequestHandler.java:393)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:300)
> at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
> at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
> at
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
> at java.lang.Thread.run(Thread.java:679)
> Caused by: org.apache.solr.common.SolrException: Invalid UUID String:
> '845b9db2-2a25-44e3-8eb4-3bf17cd16738,c5477d5d-e77c-45e9-ab61-f7ca05499b37'
> at org.apache.solr.schema.UUIDField.toInternal(UUIDField.java:85)
> at org.apache.solr.schema.FieldType.createField(FieldType.java:237)
>
>
>


Exception importing multi-valued UUID field

2012-02-20 Thread Greg Pelly
Hi,

I exported a CSV file from Solr and made some changes; I then tried to
reimport the file and got the exception below. It seems the UUID field type
can't import multi-values; when I removed all of the multi-values, it
imported without an issue.

Cheers


org.apache.solr.common.SolrException: Error while creating field
'jobuid{type=uuid,properties=indexed,stored,omitTermFreqAndPositions,multiValued}'
from value
'845b9db2-2a25-44e3-8eb4-3bf17cd16738,c5477d5d-e77c-45e9-ab61-f7ca05499b37'
at org.apache.solr.schema.FieldType.createField(FieldType.java:239)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:276)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
at org.apache.solr.handler.CSVLoader.doAdd(CSVRequestHandler.java:416)
at
org.apache.solr.handler.SingleThreadedCSVLoader.addDoc(CSVRequestHandler.java:431)
at org.apache.solr.handler.CSVLoader.load(CSVRequestHandler.java:393)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:300)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:679)
Caused by: org.apache.solr.common.SolrException: Invalid UUID String:
'845b9db2-2a25-44e3-8eb4-3bf17cd16738,c5477d5d-e77c-45e9-ab61-f7ca05499b37'
at org.apache.solr.schema.UUIDField.toInternal(UUIDField.java:85)
at org.apache.solr.schema.FieldType.createField(FieldType.java:237)


Re: lucene operators interfering in edismax

2012-02-20 Thread Yonik Seeley
This should be fixed in trunk by LUCENE-2566

QueryParser: Unary operators +,-,! will not be treated as operators if
they are followed by whitespace.

-Yonik
lucidimagination.com
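
In the meantime, a common client-side workaround is to escape Lucene
operators in the raw user input before it reaches the query parser. A
minimal sketch using SolrJ's ClientUtils (assuming SolrJ is available on
the client):

import org.apache.solr.client.solrj.util.ClientUtils;

// escapeQueryChars backslash-escapes every Lucene query-syntax character
// (e.g. + - ! ( ) : ^ [ ] " { } ~ * ? | & ; / and whitespace), so the "-"
// below is searched as a literal hyphen instead of acting as NOT.
String raw  = "Sage Creek Organics - Enchanted";
String safe = ClientUtils.escapeQueryChars(raw);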



On Mon, Feb 20, 2012 at 2:09 PM, jmlucjav  wrote:
> Hi,
>
> I am using edismax with end user entered strings. One search was not finding
> what appeared to be the best match. The search was:
>
> Sage Creek Organics - Enchanted
>
> If I remove the -, the doc I want is found with the best score. It turns out
> (I think) the - is the culprit, as the best match has 'enchanted' and this
> makes it 'NOT enchanted'.
>
> Is my analysis correct? I tried looking at the debug output but saw no NOT
> entries there...
>
> If so, is there a standard way (any filter) to remove Lucene operators from
> user-entered queries? I thought this must be a common need.
>
> thanks
> javi
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/lucene-operators-interfearing-in-edismax-tp3761577p3761577.html
> Sent from the Solr - User mailing list archive at Nabble.com.


lucene operators interfering in edismax

2012-02-20 Thread jmlucjav
Hi,

I am using edismax with end user entered strings. One search was not finding
what appeared to be the best match. The search was:

Sage Creek Organics - Enchanted

If I remove the -, the doc I want is found with the best score. It turns out
(I think) the - is the culprit, as the best match has 'enchanted' and this
makes it 'NOT enchanted'.

Is my analysis correct? I tried looking at the debug output but saw no NOT
entries there...

If so, is there a standard way (any filter) to remove Lucene operators from
user-entered queries? I thought this must be a common need.

thanks
javi

--
View this message in context: 
http://lucene.472066.n3.nabble.com/lucene-operators-interfearing-in-edismax-tp3761577p3761577.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to index a facetfield by searching words matching from another Textfield

2012-02-20 Thread Em
Hi Xavier,

sounds like a job for KeepWordFilter!

From the javadocs:
"A TokenFilter that only keeps tokens with text contained in the
required words. This filter behaves like the inverse of StopFilter."

However, you have to provide the word list as a .txt file.

By using copyFields and the KeepWordFilter you are able to achieve what
you want.
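
A minimal sketch of what this can look like in schema.xml (the type, field
and file names here are hypothetical; taglist.txt holds the word list):

<fieldType name="text_tags" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- keep only tokens that appear in the word list -->
    <filter class="solr.KeepWordFilterFactory" words="taglist.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>

<field name="body" type="text"      indexed="true" stored="true"/>
<field name="tags" type="text_tags" indexed="true" stored="false"
       multiValued="true"/>

<!-- feed the free text into the tagging field at index time -->
<copyField source="body" dest="tags"/>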

Kind regards,
Em

On 20.02.2012 17:28, Xavier wrote:
> Hi everyone,
> 
> I'm a new Solr user, but I used to work on Endeca.
>
> Endeca has a module called "TextTagger" that automatically indexes values
> into a facet field (multivalued) when it finds words (from a given word
> list) in another TextField of the same document.
>
> I didn't see any threads or any way to do this with Solr.
>
> Thanks in advance ;)
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3761201.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: Is Sphinx better suited to me, or should I look at Solr?

2012-02-20 Thread Em
Hi James,

I cannot speak for Sphinx, since I never used it.
However, reading your requirements, there is nothing in them that Solr can't handle.

Although Sphinx is written in C++, running Solr on top of a HotSpot JVM
gives you high performance. Furthermore the HotSpot JVM is optimizing
your code at runtime which sometimes allows long-running applications to
run as fast as software written in C++ (and sometimes even faster).

Given that Solr is pretty fast and scalable (90k docs are a really small
index), you should have a closer look at the features each search-server
provides to you and how they suit your needs.

You should always keep in mind that users will gladly wait a few
milliseconds longer for their highly-relevant search-results, but do not
care about a blazing fast 5ms response-time for a collection of
trash-results.
So try to find out what your concrete needs in terms of relevancy are
and which search server provides the tools to get you there.
I am pretty sure that both projects provide PHP client libraries
etc. for indexing and searching (Solr does).

Kind regards,
Em

On 20.02.2012 16:20, Spadez wrote:
> I am creating what is effectively a search engine. Content is collected via
> spiders and then inserted into my database, where it becomes searchable and
> filterable.
>
> I envision there being around 90K records to be searched at any one time.
> The content is blog posts and forum posts, so we are basically looking at
> full text with some additional filters based on location, category and
> date posted.
> 
> What is really important to me is speed and relevancy. The index size or
> index time
> really isn’t too big of an issue. From the benchmarks I have seen it looks
> like Sphinx
> is much faster at querying data and showing results, but that Solr has
> improved relevancy.
> 
> My website is coded entirely in PHP and I am planning on using a MySQL
> database. Can
> anyone please give me a bit of input and help me decide which product might
> be better
> suited to me.
> 
> Regards,
> 
> James
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Is-Sphinx-better-suited-to-me-or-should-I-look-at-Solr-tp3760988p3760988.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


How to index a facetfield by searching words matching from another Textfield

2012-02-20 Thread Xavier
Hi everyone,

I'm a new Solr user, but I used to work on Endeca.

Endeca has a module called "TextTagger" that automatically indexes values
into a facet field (multivalued) when it finds words (from a given word
list) in another TextField of the same document.

I didn't see any threads or any way to do this with Solr.

Thanks in advance ;)


--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3761201.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: custom scoring

2012-02-20 Thread Em
Hi Carlos,

> "query_score" is a field that is indexed and stored
> with every document.
Thanks for clarifying that; now the whole query string makes more sense
to me.

Did you check whether query() - without product() and pow() - is also
much slower than a normal query?

I guess that if the performance decrease without product() and pow() is not
that large, you are hitting the small overhead that comes with every
function query.
It would be nice if you could check that.

However, let's take a step back and look at what you really want to achieve
instead of how you are trying to achieve it right now.

You want to influence the score of your actual query by a value that
represents a combination of some static values and the likelihood that
a query matches a document well.

From your query, I can see that you are using the same fields in your
FunctionQuery and within your MainQuery (let's call the q-param
"MainQuery").
This means that the scores of your query()-method and your MainQuery
should be identical.
Let's call this value just "score" and rename your field "query_score"
to "popularity".

I don't know how you are implementing the FunctionQuery (boost by
multiplication, boost by addition), but it seems clear to me that your
formula looks like this:

score x (score^0.5 * popularity), where x is some combining operator (+, *, ...)

Why don't you reduce it to

score * boost(log(popularity)).

This is a trade-off between precision and performance.

You could even improve the above by setting the doc's boost equal to
log(popularity) at indexing time.

What do you think about that?
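
As a sketch, the same idea can also be expressed per request with the
edismax parser's multiplicative boost parameter (available since Solr 3.1;
the host, core and qf values below are illustrative, not from this thread):

http://localhost:8080/solr/core0/select?defType=edismax&q=hoteles&qf=stopword_phrase&boost=log(query_score)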

Regards,
Em



On 20.02.2012 15:37, Carlos Gonzalez-Cadenas wrote:
> Hi Em:
> 
> The HTTP request is not gonna help you a lot because we use a custom
> QParser (that builds the query that I've pasted before). In any case, here
> it is:
> 
> http://localhost:8080/solr/core0/select?shards=…(shards
> here)…&indent=on&wt=exon&timeAllowed=50&fl=resulting_phrase%2Cquery_id%2Ctype%2Chighlighting&start=0&rows=16&limit=20&q=%7B!exonautocomplete%7Dhoteles
> 
> We're implementing a query autocomplete system, therefore our Lucene
> documents are queries. "query_score" is a field that is indexed and stored
> with every document. It expresses how popular a given query is (i.e. common
> queries like "hotels in barcelona" have a bigger query_score than less
> common queries like "hotels in barcelona near the beach").
> 
> Let me know if you need something else.
> 
> Thanks,
> Carlos
> 
> 
> 
> 
> 
> Carlos Gonzalez-Cadenas
> CEO, ExperienceOn - New generation search
> http://www.experienceon.com
> 
> Mobile: +34 652 911 201
> Skype: carlosgonzalezcadenas
> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> 
> 
> On Mon, Feb 20, 2012 at 3:12 PM, Em  wrote:
> 
>> Could you please provide me the original request (the HTTP-request)?
>> I am a little bit confused as to what "query_score" refers.
>> As far as I can see it isn't a magic-value.
>>
>> Kind regards,
>> Em
>>
>> Am 20.02.2012 14:05, schrieb Carlos Gonzalez-Cadenas:
>>> Yeah Em, it helped a lot :)
>>>
>>> Here it is (for the user query "hoteles"):
>>>
>>> *+(stopword_shortened_phrase:hoteles | stopword_phrase:hoteles |
>>> wildcard_stopword_shortened_phrase:hoteles |
>>> wildcard_stopword_phrase:hoteles) *
>>>
>>> *product(pow(query((stopword_shortened_phrase:hoteles |
>>> stopword_phrase:hoteles | wildcard_stopword_shortened_phrase:hoteles |
>>>
>> wildcard_stopword_phrase:hoteles),def=0.0),const(0.5)),float(query_score))*
>>>
>>> Thanks a lot for your help.
>>>
>>> Carlos
>>> Carlos Gonzalez-Cadenas
>>> CEO, ExperienceOn - New generation search
>>> http://www.experienceon.com
>>>
>>> Mobile: +34 652 911 201
>>> Skype: carlosgonzalezcadenas
>>> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
>>>
>>>
>>> On Mon, Feb 20, 2012 at 1:50 PM, Em 
>> wrote:
>>>
 Carlos,

 nice to hear that the approach helped you!

Could you show us what your query request looks like after reworking?

 Regards,
 Em

On 20.02.2012 13:30, Carlos Gonzalez-Cadenas wrote:
> Hello all:
>
> We've done some tests with Em's approach of putting a BooleanQuery in
 front
> of our user query, that means:
>
> BooleanQuery
> must (DismaxQuery)
> should (FunctionQuery)
>
> The FunctionQuery obtains the SOLR IR score by means of a QueryValueSource,
> then does the SQRT of this value, and then multiplies it by our custom
> "query_score" float, pulling it by means of a FieldCacheSource.

Is Sphinx better suited to me, or should I look at Solr?

2012-02-20 Thread Spadez
I am creating what is effectively a search engine. Content is collected via
spiders and then inserted into my database, where it becomes searchable and
filterable.

I envision there being around 90K records to be searched at any one time.
The content is blog posts and forum posts, so we are basically looking at
full text with some additional filters based on location, category and
date posted.

What is really important to me is speed and relevancy. The index size or
index time
really isn’t too big of an issue. From the benchmarks I have seen it looks
like Sphinx
is much faster at querying data and showing results, but that Solr has
improved relevancy.

My website is coded entirely in PHP and I am planning on using a MySQL
database. Can
anyone please give me a bit of input and help me decide which product might
be better
suited to me.

Regards,

James

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-Sphinx-better-suited-to-me-or-should-I-look-at-Solr-tp3760988p3760988.html
Sent from the Solr - User mailing list archive at Nabble.com.


postCommit confusion?

2012-02-20 Thread Esad Mumdzic
In a Solr master/slave replication setup, if I register a postCommit listener on a
slave, which index reader should I get if I do:

@Override
public final void postCommit() {
    // getNewestSearcher(true) returns a reference-counted holder that must
    // be released with decref() once we are done with it.
    final RefCounted<SolrIndexSearcher> refC = core.getNewestSearcher(true);
    try {
        final Map<String, String> userData =
                refC.get().getIndexReader().getIndexCommit().getUserData();
        // do something with userData
    } catch (IOException e) {
        log.error("PostCommit: ", e);
    } finally {
        refC.decref();
    }
}


What I observe is that I get "stale" userData; is this correct? Shouldn't the
commit have replaced the IndexReader with one at the actual commit point? (I
see the userData that was there before replication finished, but at this stage
I expected to see the userData version from the master.)

If I force core.openNewSearcher(false, false); I get the correct, replicated
userData I just received from the master…

What am I doing wrong? What is the contract of core.getNewestSearcher(true)
when called from postCommit(), or better, when does Solr update the commit
point?

Not so important for this particular problem, but interesting to know these
life cycles.


Thanks, eks



Re: custom scoring

2012-02-20 Thread Carlos Gonzalez-Cadenas
Hi Em:

The HTTP request is not gonna help you a lot because we use a custom
QParser (that builds the query that I've pasted before). In any case, here
it is:

http://localhost:8080/solr/core0/select?shards=…(shards
here)…&indent=on&wt=exon&timeAllowed=50&fl=resulting_phrase%2Cquery_id%2Ctype%2Chighlighting&start=0&rows=16&limit=20&q=%7B!exonautocomplete%7Dhoteles

We're implementing a query autocomplete system, therefore our Lucene
documents are queries. "query_score" is a field that is indexed and stored
with every document. It expresses how popular a given query is (i.e. common
queries like "hotels in barcelona" have a bigger query_score than less
common queries like "hotels in barcelona near the beach").

Let me know if you need something else.

Thanks,
Carlos





Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com

Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas


On Mon, Feb 20, 2012 at 3:12 PM, Em  wrote:

> Could you please provide me the original request (the HTTP-request)?
> I am a little bit confused as to what "query_score" refers.
> As far as I can see it isn't a magic-value.
>
> Kind regards,
> Em
>
> On 20.02.2012 14:05, Carlos Gonzalez-Cadenas wrote:
> > Yeah Em, it helped a lot :)
> >
> > Here it is (for the user query "hoteles"):
> >
> > *+(stopword_shortened_phrase:hoteles | stopword_phrase:hoteles |
> > wildcard_stopword_shortened_phrase:hoteles |
> > wildcard_stopword_phrase:hoteles) *
> >
> > *product(pow(query((stopword_shortened_phrase:hoteles |
> > stopword_phrase:hoteles | wildcard_stopword_shortened_phrase:hoteles |
> >
> wildcard_stopword_phrase:hoteles),def=0.0),const(0.5)),float(query_score))*
> >
> > Thanks a lot for your help.
> >
> > Carlos
> > Carlos Gonzalez-Cadenas
> > CEO, ExperienceOn - New generation search
> > http://www.experienceon.com
> >
> > Mobile: +34 652 911 201
> > Skype: carlosgonzalezcadenas
> > LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> >
> >
> > On Mon, Feb 20, 2012 at 1:50 PM, Em 
> wrote:
> >
> >> Carlos,
> >>
> >> nice to hear that the approach helped you!
> >>
> >> Could you show us what your query request looks like after reworking?
> >>
> >> Regards,
> >> Em
> >>
> >> On 20.02.2012 13:30, Carlos Gonzalez-Cadenas wrote:
> >>> Hello all:
> >>>
> >>> We've done some tests with Em's approach of putting a BooleanQuery in
> >> front
> >>> of our user query, that means:
> >>>
> >>> BooleanQuery
> >>> must (DismaxQuery)
> >>> should (FunctionQuery)
> >>>
> >>> The FunctionQuery obtains the SOLR IR score by means of a
> >> QueryValueSource,
> >>> then does the SQRT of this value, and then multiplies it by our custom
> >>> "query_score" float, pulling it by means of a FieldCacheSource.
> >>>
> >>> In particular, we've proceeded in the following way:
> >>>
> >>>- we've loaded the whole index in the page cache of the OS to make
> >> sure
> >>>we don't have disk IO problems that might affect the benchmarks (our
> >>>machine has enough memory to load all the index in RAM)
> >>>- we've executed an out-of-benchmark query 10-20 times to make sure
> >> that
> >>>everything is jitted and that Lucene's FieldCache is properly
> >> populated.
> >>>- we've disabled all the caches (filter query cache, document cache,
> >>>query cache)
> >>>- we've executed 8 different user queries with and without
> >>>FunctionQueries, with early termination in both cases (our collector
> >> stops
> >>>after collecting 50 documents per shard)
> >>>
> >>> Em was correct, the query is much faster with the BooleanQuery in
> front,
> >>> but it's still 30-40% slower than the query without FunctionQueries.
> >>>
> >>> Although one may think that it's reasonable that the query response
> time
> >>> increases because of the extra computations, we believe that the
> increase
> >>> is too big, given that we're collecting just 500-600 documents due to
> the
> >>> early query termination techniques we currently use.
> >>>
> >>> Any ideas on how to make it faster?
> >>>
> >>> Thanks a lot,
> >>> Carlos
> >>>
> >>> Carlos Gonzalez-Cadenas
> >>> CEO, ExperienceOn - New generation search
> >>> http://www.experienceon.com
> >>>
> >>> Mobile: +34 652 911 201
> >>> Skype: carlosgonzalezcadenas
> >>> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas

Re: custom scoring

2012-02-20 Thread Em
Could you please provide me the original request (the HTTP-request)?
I am a little bit confused as to what "query_score" refers.
As far as I can see it isn't a magic-value.

Kind regards,
Em

On 20.02.2012 14:05, Carlos Gonzalez-Cadenas wrote:
> Yeah Em, it helped a lot :)
> 
> Here it is (for the user query "hoteles"):
> 
> *+(stopword_shortened_phrase:hoteles | stopword_phrase:hoteles |
> wildcard_stopword_shortened_phrase:hoteles |
> wildcard_stopword_phrase:hoteles) *
> 
> *product(pow(query((stopword_shortened_phrase:hoteles |
> stopword_phrase:hoteles | wildcard_stopword_shortened_phrase:hoteles |
> wildcard_stopword_phrase:hoteles),def=0.0),const(0.5)),float(query_score))*
> 
> Thanks a lot for your help.
> 
> Carlos
> Carlos Gonzalez-Cadenas
> CEO, ExperienceOn - New generation search
> http://www.experienceon.com
> 
> Mobile: +34 652 911 201
> Skype: carlosgonzalezcadenas
> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> 
> 
> On Mon, Feb 20, 2012 at 1:50 PM, Em  wrote:
> 
>> Carlos,
>>
>> nice to hear that the approach helped you!
>>
>> Could you show us what your query request looks like after reworking?
>>
>> Regards,
>> Em
>>
>> On 20.02.2012 13:30, Carlos Gonzalez-Cadenas wrote:
>>> Hello all:
>>>
>>> We've done some tests with Em's approach of putting a BooleanQuery in
>> front
>>> of our user query, that means:
>>>
>>> BooleanQuery
>>> must (DismaxQuery)
>>> should (FunctionQuery)
>>>
>>> The FunctionQuery obtains the SOLR IR score by means of a
>> QueryValueSource,
>>> then does the SQRT of this value, and then multiplies it by our custom
>>> "query_score" float, pulling it by means of a FieldCacheSource.
>>>
>>> In particular, we've proceeded in the following way:
>>>
>>>- we've loaded the whole index in the page cache of the OS to make
>> sure
>>>we don't have disk IO problems that might affect the benchmarks (our
>>>machine has enough memory to load all the index in RAM)
>>>- we've executed an out-of-benchmark query 10-20 times to make sure
>> that
>>>everything is jitted and that Lucene's FieldCache is properly
>> populated.
>>>- we've disabled all the caches (filter query cache, document cache,
>>>query cache)
>>>- we've executed 8 different user queries with and without
>>>FunctionQueries, with early termination in both cases (our collector
>> stops
>>>after collecting 50 documents per shard)
>>>
>>> Em was correct, the query is much faster with the BooleanQuery in front,
>>> but it's still 30-40% slower than the query without FunctionQueries.
>>>
>>> Although one may think that it's reasonable that the query response time
>>> increases because of the extra computations, we believe that the increase
>>> is too big, given that we're collecting just 500-600 documents due to the
>>> early query termination techniques we currently use.
>>>
>>> Any ideas on how to make it faster?
>>>
>>> Thanks a lot,
>>> Carlos
>>>
>>> Carlos Gonzalez-Cadenas
>>> CEO, ExperienceOn - New generation search
>>> http://www.experienceon.com
>>>
>>> Mobile: +34 652 911 201
>>> Skype: carlosgonzalezcadenas
>>> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
>>>
>>>
>>> On Fri, Feb 17, 2012 at 11:07 AM, Carlos Gonzalez-Cadenas <
>>> c...@experienceon.com> wrote:
>>>
 Thanks Em, Robert, Chris for your time and valuable advice. We'll make
 some tests and will let you know soon.



 On Thu, Feb 16, 2012 at 11:43 PM, Em 
>> wrote:

> Hello Carlos,
>
> I think we misunderstood each other.
>
> As an example:
> BooleanQuery (
>  clauses: (
> MustMatch(
>   DisjunctionMaxQuery(
>   TermQuery("stopword_field", "barcelona"),
>   TermQuery("stopword_field", "hoteles")
>   )
> ),
> ShouldMatch(
>  FunctionQuery(
>*please insert your function here*
> )
> )
>  )
> )
>
> Explanation:
> You construct an artificial BooleanQuery which wraps your user's query
> as well as your function query.
> Your user's query - in that case - is just a DisjunctionMaxQuery
> consisting of two TermQueries.
> In the real world you might construct another BooleanQuery around your
> DisjunctionMaxQuery in order to have more flexibility.
> However the interesting part of the given example is, that we specify
> the user's query as a MustMatch-condition of the BooleanQuery and the
> FunctionQuery just as a ShouldMatch.
> Constructed that way, I am expecting the FunctionQuery only scores
>> those
> documents which fit the MustMatch-Condition.
>
> I conclude that from the fact that the FunctionQuery-class also has a
> skipTo-method and I would expect that the scorer will use it to score
> only matching documents (however I did not search where and how it
>> might
> get called).
>

Re: DataImportHandler running out of memory

2012-02-20 Thread v_shan
DIH still running out of memory for me, with Full Import on a database of
size 1.5 GB.

Solr version: 3_5_0

Note that I have already added batchSize="-1" but I am getting the same error.
Sharing my DIH config below.
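
(The XML itself did not survive the archive. As a minimal sketch, a JDBC
data-config with streaming enabled, reusing the entity name and JDBC URL
from the log below, could look like this; table, column and field names
are hypothetical.)

<dataConfig>
  <!-- batchSize="-1" makes DIH request fetchSize=Integer.MIN_VALUE, so the
       MySQL driver streams rows instead of buffering the whole result set -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/issueburner"
              user="user" password="password"
              batchSize="-1"/>
  <document>
    <entity name="issue" query="SELECT id, title, description FROM issue">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>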

Please find the error trace below
===
2012-02-20 19:04:40.531:INFO::Started SocketConnector@0.0.0.0:8983
Feb 20, 2012 7:04:57 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={command=status&qt=/dih_ib_jdbc}
status=0 QTime=0
Feb 20, 2012 7:04:58 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select
params={command=show-config&qt=/dih_ib_jdbc} status=0 QTime=0
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
Feb 20, 2012 7:05:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dih_ib_jdbc params={command=full-import}
status=0 QTime=0
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read dih_ib_jdbc.properties
Feb 20, 2012 7:05:30 PM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Feb 20, 2012 7:05:30 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
   
commit{dir=E:\workspace\solr_3_5_0\example\solr\data\index,segFN=segments_1,version=1329744880204,generation=1,filenames=[segments_1]
Feb 20, 2012 7:05:30 PM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1329744880204
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Creating a connection for entity issue with URL:
jdbc:mysql://localhost:3306/issueburner
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Time taken for getConnection(): 172
Feb 20, 2012 7:07:45 PM org.apache.solr.common.SolrException log
SEVERE: Full Import
failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.OutOfMemoryError: Java heap space
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.OutOfMemoryError: Java heap space
at
org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:377)
at
org.apache.lucene.store.DataOutput.writeString(DataOutput.java:103)
at
org.apache.lucene.index.FieldsWriter.writeField(FieldsWriter.java:200)
at
org.apache.lucene.index.StoredFieldsWriterPerThread.addField(StoredFieldsWriterPerThread.java:58)
at
org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:265)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2327)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2299)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:240)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
at
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:636)
... 5 more

Feb 20, 2012 7:07:45 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Feb 20, 2012 7:07:45 PM org.apache.solr.update.DirectUpdateHandler2 rollback

--
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-running-out-of-memory-tp490797p3760755.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: customizing standard tokenizer

2012-02-20 Thread Torsten Krah
Thanks, I will use the custom tokenizer. It's less error-prone than the
"workarounds" mentioned.




How to check for inactive cores in a solr multicore setup?

2012-02-20 Thread Nasima Banu
Hello,

I am trying to figure out a way to detect inactive cores in a multicore setup.
How is that possible?
I queried the STATUS of a core through the CoreAdminHandler. Could anyone
please tell me what the 'current' field means?

Eg : http://localhost:8080/solr/admin/cores?action=STATUS&core=2

Response (the XML tags were stripped by the archive; recoverable values):

name=2, instanceDir=multicore/solr/2/, dataDir=multicore/solr/2/data/,
startTime=2012-02-17T06:19:20.805Z, … current=true, …
directory=org.apache.lucene.store.MMapDirectory@multicore/solr/2/data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@4cd0b9d7,
lastModified=2012-02-20T12:02:12Z



Please help.

Thanks,
Nasima


Re: custom scoring

2012-02-20 Thread Carlos Gonzalez-Cadenas
Yeah Em, it helped a lot :)

Here it is (for the user query "hoteles"):

*+(stopword_shortened_phrase:hoteles | stopword_phrase:hoteles |
wildcard_stopword_shortened_phrase:hoteles |
wildcard_stopword_phrase:hoteles) *

*product(pow(query((stopword_shortened_phrase:hoteles |
stopword_phrase:hoteles | wildcard_stopword_shortened_phrase:hoteles |
wildcard_stopword_phrase:hoteles),def=0.0),const(0.5)),float(query_score))*

Thanks a lot for your help.

Carlos
Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com

Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas


On Mon, Feb 20, 2012 at 1:50 PM, Em  wrote:

> Carlos,
>
> nice to hear that the approach helped you!
>
> Could you show us what your query request looks like after reworking?
>
> Regards,
> Em
>
> > On 20.02.2012 13:30, Carlos Gonzalez-Cadenas wrote:
> > Hello all:
> >
> > We've done some tests with Em's approach of putting a BooleanQuery in
> front
> > of our user query, that means:
> >
> > BooleanQuery
> > must (DismaxQuery)
> > should (FunctionQuery)
> >
> > The FunctionQuery obtains the SOLR IR score by means of a
> QueryValueSource,
> > then does the SQRT of this value, and then multiplies it by our custom
> > "query_score" float, pulling it by means of a FieldCacheSource.
> >
> > In particular, we've proceeded in the following way:
> >
> >- we've loaded the whole index in the page cache of the OS to make
> sure
> >we don't have disk IO problems that might affect the benchmarks (our
> >machine has enough memory to load all the index in RAM)
> >- we've executed an out-of-benchmark query 10-20 times to make sure
> that
> >everything is jitted and that Lucene's FieldCache is properly
> populated.
> >- we've disabled all the caches (filter query cache, document cache,
> >query cache)
> >- we've executed 8 different user queries with and without
> >FunctionQueries, with early termination in both cases (our collector
> stops
> >after collecting 50 documents per shard)
> >
> > Em was correct, the query is much faster with the BooleanQuery in front,
> > but it's still 30-40% slower than the query without FunctionQueries.
> >
> > Although one may think that it's reasonable that the query response time
> > increases because of the extra computations, we believe that the increase
> > is too big, given that we're collecting just 500-600 documents due to the
> > early query termination techniques we currently use.
> >
> > Any ideas on how to make it faster?
> >
> > Thanks a lot,
> > Carlos
> >
> > Carlos Gonzalez-Cadenas
> > CEO, ExperienceOn - New generation search
> > http://www.experienceon.com
> >
> > Mobile: +34 652 911 201
> > Skype: carlosgonzalezcadenas
> > LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> >
> >
> > On Fri, Feb 17, 2012 at 11:07 AM, Carlos Gonzalez-Cadenas <
> > c...@experienceon.com> wrote:
> >
> >> Thanks Em, Robert, Chris for your time and valuable advice. We'll make
> >> some tests and will let you know soon.
> >>
> >>
> >>
> >> On Thu, Feb 16, 2012 at 11:43 PM, Em 
> wrote:
> >>
> >>> Hello Carlos,
> >>>
> >>> I think we misunderstood each other.
> >>>
> >>> As an example:
> >>> BooleanQuery (
> >>>  clauses: (
> >>> MustMatch(
> >>>   DisjunctionMaxQuery(
> >>>   TermQuery("stopword_field", "barcelona"),
> >>>   TermQuery("stopword_field", "hoteles")
> >>>   )
> >>> ),
> >>> ShouldMatch(
> >>>  FunctionQuery(
> >>>*please insert your function here*
> >>> )
> >>> )
> >>>  )
> >>> )
> >>>
> >>> Explanation:
> >>> You construct an artificial BooleanQuery which wraps your user's query
> >>> as well as your function query.
> >>> Your user's query - in that case - is just a DisjunctionMaxQuery
> >>> consisting of two TermQueries.
> >>> In the real world you might construct another BooleanQuery around your
> >>> DisjunctionMaxQuery in order to have more flexibility.
> >>> However the interesting part of the given example is, that we specify
> >>> the user's query as a MustMatch-condition of the BooleanQuery and the
> >>> FunctionQuery just as a ShouldMatch.
> >>> Constructed that way, I am expecting the FunctionQuery only scores
> those
> >>> documents which fit the MustMatch-Condition.
> >>>
> >>> I conclude that from the fact that the FunctionQuery-class also has a
> >>> skipTo-method and I would expect that the scorer will use it to score
> >>> only matching documents (however I did not search where and how it
> might
> >>> get called).
> >>>
> >>> If my conclusion is wrong, then hopefully Robert Muir (as far as I can
> >>> see the author of that class) can tell us what was the intention by
> >>> constructing an every-time-match-all-function-query.
> >>>
> >>> Can you validate whether your QueryParser constructs a query in the
> form
> >>> I drew above

Re: Development inside or outside of Solr?

2012-02-20 Thread Erick Erickson
Either is possible. For the first, you would write a custom update processor
that handled the dual Tika call...

For the second, consider writing a SolrJ program that just does it all on
the client. Just download Tika from the Apache project (or tease out all
the jars from the Solr distro) and then make it all work on the client.

Here's a sample app:
http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/
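
A minimal sketch of that client-side variant with Solr 3.5-era APIs (the
URL, per-language core naming scheme and field names are hypothetical):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.language.LanguageIdentifier;

public class RouteByLanguage {
    public static void main(String[] args) throws Exception {
        String text = "El hotel esta cerca de la playa";

        // Detect the language on the client, before indexing.
        String lang = new LanguageIdentifier(text).getLanguage(); // e.g. "es"

        // Pick the per-language core, then index the document into it.
        SolrServer server =
            new CommonsHttpSolrServer("http://localhost:8080/solr/core-" + lang);
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("text", text);
        server.add(doc);
        server.commit();
    }
}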

Best
Erick

On Sun, Feb 19, 2012 at 9:44 PM, bing  wrote:
> Hi, all,
>
> I am deploying a multicore Solr server running on Tomcat, where I want to
> achieve language detection during index/query.
>
> Solr 3.5.0 has a wrapped Tika API that can do language detection. Currently,
> the default behavior of Solr 3.5.0 is that every time I index a document,
> Solr calls the Tika API to produce the language detection result at the same
> time, i.e. indexing and detection happen together. However, I would like to
> have the language detection result first and then decide which core to put
> the document in, i.e. detection happens before indexing.
>
> It seems that I need to do development in one of the following ways:
>
> 1. I might need to modify Solr itself, changing its default behavior;
> 2. Or I might write a Java client outside Solr and call it through the
> server (JSP, maybe) during index/query.
>
> Can anyone who has faced similar requirements give some suggestions about
> the advantages and disadvantages of the two approaches? Any other
> alternatives? Thank you.
>
>
> Best
> Bing
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Development-inside-or-outside-of-Solr-tp3759680p3759680.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: custom scoring

2012-02-20 Thread Em
Carlos,

nice to hear that the approach helped you!

Could you show us what your query request looks like after reworking?

Regards,
Em

On 20.02.2012 13:30, Carlos Gonzalez-Cadenas wrote:
> Hello all:
> 
> We've done some tests with Em's approach of putting a BooleanQuery in front
> of our user query, that means:
> 
> BooleanQuery
> must (DismaxQuery)
> should (FunctionQuery)
> 
> The FunctionQuery obtains the SOLR IR score by means of a QueryValueSource,
> then does the SQRT of this value, and then multiplies it by our custom
> "query_score" float, pulling it by means of a FieldCacheSource.
> 
> In particular, we've proceeded in the following way:
> 
>- we've loaded the whole index in the page cache of the OS to make sure
>we don't have disk IO problems that might affect the benchmarks (our
>machine has enough memory to load all the index in RAM)
>- we've executed an out-of-benchmark query 10-20 times to make sure that
>everything is jitted and that Lucene's FieldCache is properly populated.
>- we've disabled all the caches (filter query cache, document cache,
>query cache)
>- we've executed 8 different user queries with and without
>FunctionQueries, with early termination in both cases (our collector stops
>after collecting 50 documents per shard)
> 
> Em was correct, the query is much faster with the BooleanQuery in front,
> but it's still 30-40% slower than the query without FunctionQueries.
> 
> Although one may think that it's reasonable that the query response time
> increases because of the extra computations, we believe that the increase
> is too big, given that we're collecting just 500-600 documents due to the
> early query termination techniques we currently use.
> 
> Any ideas on how to make it faster?
> 
> Thanks a lot,
> Carlos
> 
> Carlos Gonzalez-Cadenas
> CEO, ExperienceOn - New generation search
> http://www.experienceon.com
> 
> Mobile: +34 652 911 201
> Skype: carlosgonzalezcadenas
> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> 
> 
> On Fri, Feb 17, 2012 at 11:07 AM, Carlos Gonzalez-Cadenas <
> c...@experienceon.com> wrote:
> 
>> Thanks Em, Robert, Chris for your time and valuable advice. We'll make
>> some tests and will let you know soon.
>>
>>
>>
>> On Thu, Feb 16, 2012 at 11:43 PM, Em  wrote:
>>
>>> Hello Carlos,
>>>
>>> I think we misunderstood each other.
>>>
>>> As an example:
>>> BooleanQuery (
>>>  clauses: (
>>> MustMatch(
>>>   DisjunctionMaxQuery(
>>>   TermQuery("stopword_field", "barcelona"),
>>>   TermQuery("stopword_field", "hoteles")
>>>   )
>>> ),
>>> ShouldMatch(
>>>  FunctionQuery(
>>>*please insert your function here*
>>> )
>>> )
>>>  )
>>> )
>>>
>>> Explanation:
>>> You construct an artificial BooleanQuery which wraps your user's query
>>> as well as your function query.
>>> Your user's query - in that case - is just a DisjunctionMaxQuery
>>> consisting of two TermQueries.
>>> In the real world you might construct another BooleanQuery around your
>>> DisjunctionMaxQuery in order to have more flexibility.
>>> However the interesting part of the given example is, that we specify
>>> the user's query as a MustMatch-condition of the BooleanQuery and the
>>> FunctionQuery just as a ShouldMatch.
>>> Constructed that way, I am expecting the FunctionQuery only scores those
>>> documents which fit the MustMatch-Condition.
>>>
>>> I conclude that from the fact that the FunctionQuery-class also has a
>>> skipTo-method and I would expect that the scorer will use it to score
>>> only matching documents (however I did not search where and how it might
>>> get called).
>>>
>>> If my conclusion is wrong, then hopefully Robert Muir (as far as I can
>>> see the author of that class) can tell us what was the intention by
>>> constructing an every-time-match-all-function-query.
>>>
>>> Can you validate whether your QueryParser constructs a query in the form
>>> I drew above?
>>>
>>> Regards,
>>> Em
>>>
>>> On 16.02.2012 20:29, Carlos Gonzalez-Cadenas wrote:
>>>> Hello Em:
>>>>
>>>> 1) Here's a printout of an example DisMax query (as you can see, mostly
>>>> MUST terms except for some SHOULD terms used for boosting scores for
>>>> stopwords):
>>>>
>>>> *((+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona
>>>> stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
>>>> +stopword_phrase:barcelona stopword_phrase:en) |
>>>> (+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona
>>>> stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
>>>> +stopword_phrase:barcelona stopword_phrase:en) |
>>>> (+stopword_shortened_phrase:hoteles
>>>> +wildcard_stopword_shortened_phrase:barcelona
>>>> stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
>>>> +wildcard_stopword_phrase:barcelona stopword_phrase:en) |
>>>> (+stopword_shortened_phrase:hoteles
>>>> +wildcard_stopword_shortened_phrase:barcelona
>>>> stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
>>>> +wildcard_stopword_phrase:barcelona stopword_phrase:en))*

Re: custom scoring

2012-02-20 Thread Carlos Gonzalez-Cadenas
Hello all:

We've done some tests with Em's approach of putting a BooleanQuery in front
of our user query, that means:

BooleanQuery
must (DismaxQuery)
should (FunctionQuery)

The FunctionQuery obtains the SOLR IR score by means of a QueryValueSource,
then does the SQRT of this value, and then multiplies it by our custom
"query_score" float, pulling it by means of a FieldCacheSource.

In particular, we've proceeded in the following way:

   - we've loaded the whole index in the page cache of the OS to make sure
   we don't have disk IO problems that might affect the benchmarks (our
   machine has enough memory to load all the index in RAM)
   - we've executed an out-of-benchmark query 10-20 times to make sure that
   everything is jitted and that Lucene's FieldCache is properly populated.
   - we've disabled all the caches (filter query cache, document cache,
   query cache)
   - we've executed 8 different user queries with and without
   FunctionQueries, with early termination in both cases (our collector stops
   after collecting 50 documents per shard)

Em was correct, the query is much faster with the BooleanQuery in front,
but it's still 30-40% slower than the query without FunctionQueries.

Although one may think that it's reasonable that the query response time
increases because of the extra computations, we believe that the increase
is too big, given that we're collecting just 500-600 documents due to the
early query termination techniques we currently use.

Any ideas on how to make it faster?

Thanks a lot,
Carlos

Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com

Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas


On Fri, Feb 17, 2012 at 11:07 AM, Carlos Gonzalez-Cadenas <
c...@experienceon.com> wrote:

> Thanks Em, Robert, Chris for your time and valuable advice. We'll make
> some tests and will let you know soon.
>
>
>
> On Thu, Feb 16, 2012 at 11:43 PM, Em  wrote:
>
>> Hello Carlos,
>>
>> I think we misunderstood each other.
>>
>> As an example:
>> BooleanQuery (
>>  clauses: (
>> MustMatch(
>>   DisjunctionMaxQuery(
>>   TermQuery("stopword_field", "barcelona"),
>>   TermQuery("stopword_field", "hoteles")
>>   )
>> ),
>> ShouldMatch(
>>  FunctionQuery(
>>*please insert your function here*
>> )
>> )
>>  )
>> )
>>
>> Explanation:
>> You construct an artificial BooleanQuery which wraps your user's query
>> as well as your function query.
>> Your user's query - in that case - is just a DisjunctionMaxQuery
>> consisting of two TermQueries.
>> In the real world you might construct another BooleanQuery around your
>> DisjunctionMaxQuery in order to have more flexibility.
>> However the interesting part of the given example is, that we specify
>> the user's query as a MustMatch-condition of the BooleanQuery and the
>> FunctionQuery just as a ShouldMatch.
>> Constructed that way, I am expecting the FunctionQuery only scores those
>> documents which fit the MustMatch-Condition.
>>
>> I conclude that from the fact that the FunctionQuery-class also has a
>> skipTo-method and I would expect that the scorer will use it to score
>> only matching documents (however I did not search where and how it might
>> get called).
>>
>> If my conclusion is wrong, then hopefully Robert Muir (as far as I can
>> see the author of that class) can tell us what was the intention by
>> constructing an every-time-match-all-function-query.
>>
>> Can you validate whether your QueryParser constructs a query in the form
>> I drew above?
>>
>> Regards,
>> Em
>>
>> On 16.02.2012 20:29, Carlos Gonzalez-Cadenas wrote:
>> > Hello Em:
>> >
>> > 1) Here's a printout of an example DisMax query (as you can see, mostly
>> > MUST terms except for some SHOULD terms used for boosting scores for
>> > stopwords):
>> >
>> > *((+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona
>> > stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
>> > +stopword_phrase:barcelona stopword_phrase:en) |
>> > (+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona
>> > stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
>> > +stopword_phrase:barcelona stopword_phrase:en) |
>> > (+stopword_shortened_phrase:hoteles
>> > +wildcard_stopword_shortened_phrase:barcelona
>> > stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
>> > +wildcard_stopword_phrase:barcelona stopword_phrase:en) |
>> > (+stopword_shortened_phrase:hoteles
>> > +wildcard_stopword_shortened_phrase:barcelona
>> > stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
>> > +wildcard_stopword_phrase:barcelona stopword_phrase:en))*
>> > *
>> > *
>> > 2)* *The collector is inserted in the SolrIndexSearcher (replacing the
>> > TimeLimitingCollector). We trigger it through the SOLR interface by

Problem with SolrCloud + Zookeeper + DataImportHandler

2012-02-20 Thread Agnieszka Kukałowicz
Hi All,

I've recently downloaded the latest Solr trunk to configure SolrCloud with
ZooKeeper, using the standard configuration from the wiki:
http://wiki.apache.org/solr/SolrCloud.

The problem occurred when I tried to configure DataImportHandler in
solrconfig.xml:

  <requestHandler name="/dataimport"
      class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </requestHandler>


After starting Solr with ZooKeeper I got these errors:

Feb 20, 2012 11:30:12 AM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:606)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:490)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:705)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:442)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:313)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:262)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:98)
        at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
        at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
        at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
        at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
        at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
        at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
        at org.mortbay.jetty.Server.doStart(Server.java:224)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.mortbay.start.Main.invokeMain(Main.java:194)
        at org.mortbay.start.Main.start(Main.java:534)
        at org.mortbay.start.Main.start(Main.java:441)
        at org.mortbay.start.Main.main(Main.java:119)
Caused by: org.apache.solr.common.SolrException: FATAL: Could not create
importer. DataImporter config invalid
        at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:120)
        at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:542)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:601)
        ... 31 more
Caused by: org.apache.solr.common.cloud.ZooKeeperException:
ZkSolrResourceLoader does not support getConfigDir() - likely, w
        at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:99)
        at org.apache.solr.handler.dataimport.SimplePropertiesWriter.init(SimplePropertiesWriter.java:47)
        at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:112)
        at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114)
        ... 33 more

I've checked that the file db-data-config.xml is available in ZooKeeper:

[zk: localhost:2181(CONNECTED) 0] ls /configs/conf1
[admin-extra.menu-top.html, dict, solrconfig.xml, dataimport.properties,
admin-extra.html, solrconfig.xml.old, solrconfig.xml.new, solrconfig.xml~,
xslt, db-data-config.xml, velocity, elevate.xml,
admin-extra.menu-bottom.html, solrconfig.xml.dataimport, schema.xml]
[zk: localhost:2181(CONNECTED) 1]

Is it possible to configure DIH with ZooKeeper? And how do I do it?
I'm a little confused about that.

Regards
Agnieszka Kukalowicz


Re: Solr logging

2012-02-20 Thread François Schiettecatte
Ola

Here is what I have for this:


##
#
# Log4J configuration for SOLR
#
#   http://wiki.apache.org/solr/SolrLogging
#
#
# 1) Download LOG4J:
#   http://logging.apache.org/log4j/1.2/
#   http://logging.apache.org/log4j/1.2/download.html
#   
http://www.apache.org/dyn/closer.cgi/logging/log4j/1.2.16/apache-log4j-1.2.16.tar.gz
#   
http://newverhost.com/pub//logging/log4j/1.2.16/apache-log4j-1.2.16.tar.gz
#
# 2) Download SLF4J:
#   http://www.slf4j.org/
#   http://www.slf4j.org/download.html
#   http://www.slf4j.org/dist/slf4j-1.6.4.tar.gz
#
# 3) Unpack Solr:
#   jar xvf apache-solr-3.5.0.war
#
# 4) Delete:
#   WEB-INF/lib/log4j-over-slf4j-1.6.4.jar
#   WEB-INF/lib/slf4j-jdk14-1.6.4.jar
#
# 5) Copy:
#   apache-log4j-1.2.16/log4j-1.2.16.jar  ->  WEB-INF/lib
#   slf4j-1.6.4/slf4j-log4j12-1.6.4.jar   ->  WEB-INF/lib
#   log4j.properties (this file)          ->  WEB-INF/classes/ (needs to be created)
#
# 6) Pack Solr:
#   jar cvf apache-solr-3.5.0-omim.war admin favicon.ico index.jsp META-INF WEB-INF
#
#
#   Author: Francois Schiettecatte
#   Version:1.0
#
##



##
#
# Logging levels (helpful reminder)
#
# DEBUG < INFO < WARN < ERROR < FATAL
#



##
#
# Logging setup
#

log4j.rootLogger=WARN, SOLR


# Daily Rolling File Appender (SOLR)
log4j.appender.SOLR=org.apache.log4j.DailyRollingFileAppender
log4j.appender.SOLR.File=${catalina.base}/logs/solr.log
log4j.appender.SOLR.Append=true
log4j.appender.SOLR.Encoding=UTF-8
log4j.appender.SOLR.DatePattern='-'yyyy-MM-dd
log4j.appender.SOLR.layout=org.apache.log4j.PatternLayout
log4j.appender.SOLR.layout.ConversionPattern=%d [%t] %-5p %c - %m%n



##
#
# Logging levels for SOLR
#

# Default logging level
log4j.logger.org.apache.solr=WARN



##



On Feb 20, 2012, at 5:15 AM, ola nowak wrote:

> Yep, I suppose it is. But I have several applications installed on
> Glassfish and I want each of them to write into a separate file, and your
> solution with this JVM option was redirecting all messages from all apps to
> one file. Does anyone know how to accomplish that?
> 
> 
> On Mon, Feb 20, 2012 at 11:09 AM, darul  wrote:
> 
>> Hmm, I have not tried to achieve this, but I'd be interested if you find
>> a way...
>>
>> Also, I believe that having the log4j config file outside the war archive
>> is a better solution, in case you need to update its content, for example.
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Solr-logging-tp3760171p3760322.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 



Re: Development inside or outside of Solr?

2012-02-20 Thread François Schiettecatte
You could take a look at this:

http://www.let.rug.nl/vannoord/TextCat/

It will probably require some work to integrate/implement, though.

François

On Feb 20, 2012, at 3:37 AM, bing wrote:

> I have looked into TikaCLI with the -language option, and learned that Tika
> can only output the language metadata. It cannot help me solve my problem,
> though, as my main concern is whether to change Solr or not. Thank you all
> the same.
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Development-inside-or-outside-of-Solr-tp3759680p3760131.html
> Sent from the Solr - User mailing list archive at Nabble.com.



solr and tika

2012-02-20 Thread alessio crisantemi
Hi all,
In a new installation of Solr (1.4) I configured Tika for indexing rich
documents.
So, I commit my files, and after indexing I can find them with an HTTP query
"http://localhost:8983/solr/select?q=attr_content:parola" (searching for the
word 'parola'): the committed text is found.
But if I search from the Solr front panel, the result is '0 documents'.

suggestions?
thanks
alessio


Re: Payload and exact search - 2

2012-02-20 Thread leonardo2
Ok, it works!!
Thanks you very much.

Leonardo


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Payload-and-exact-search-2-tp3750355p3760477.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr logging

2012-02-20 Thread darul
This case explained here:

http://stackoverflow.com/questions/762918/how-to-configure-multiple-log4j-for-different-wars-in-a-single-ear

http://techcrawler.wordpress.com/



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-logging-tp3760171p3760352.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr logging

2012-02-20 Thread ola nowak
Yep, I suppose it is. But I have several applications installed on
Glassfish and I want each of them to write into a separate file, and your
solution with this JVM option was redirecting all messages from all apps to
one file. Does anyone know how to accomplish that?


On Mon, Feb 20, 2012 at 11:09 AM, darul  wrote:

> Hmm, I have not tried to achieve this, but I'd be interested if you find
> a way...
>
> Also, I believe that having the log4j config file outside the war archive
> is a better solution, in case you need to update its content, for example.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-logging-tp3760171p3760322.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr logging

2012-02-20 Thread darul
Hmm, I have not tried to achieve this, but I'd be interested if you find a
way...

Also, I believe that having the log4j config file outside the war archive is
a better solution, in case you need to update its content, for example.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-logging-tp3760171p3760322.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr logging

2012-02-20 Thread ola nowak
I've already done that. What I'm more interested in is whether I can add
log4j.xml to the war, and where to put it to make it work.

On Mon, Feb 20, 2012 at 10:49 AM, darul  wrote:

> Yes, you can update your .war archive by adding/removing expected jars.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-logging-tp3760171p3760285.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr logging

2012-02-20 Thread darul
Yes, you can update your .war archive by adding/removing expected jars.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-logging-tp3760171p3760285.html
Sent from the Solr - User mailing list archive at Nabble.com.


processing of merged tokens

2012-02-20 Thread Carlos Gonzalez-Cadenas
Hello all,

For our search system we'd like to be able to process merged tokens, i.e.
when a user enters a query like "hotelsin barcelona", we'd like to know
that the user means "hotels in barcelona".

At some point in the past we implemented this kind of functionality with
shingles (using ShingleFilter), that is, if we were indexing the sentence
"hotels in barcelona" as a document, we'd be able to match at query time
merged tokens like "hotelsin" and "inbarcelona".
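
Roughly, that index-time analysis looked like the following minimal sketch
(it assumes the Lucene 3.x analysis API; the class name is just for
illustration):

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.util.Version;

public class MergedTokenShingleAnalyzer extends Analyzer {
  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream stream = new WhitespaceTokenizer(Version.LUCENE_35, reader);
    stream = new LowerCaseFilter(Version.LUCENE_35, stream);
    // 2-word shingles, concatenated without a separator, so that
    // "hotels in barcelona" also produces "hotelsin" and "inbarcelona"
    ShingleFilter shingles = new ShingleFilter(stream, 2);
    shingles.setTokenSeparator("");
    return shingles;
  }
}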

This solution has two problems:
1) The index size increases a lot.
2) We only catch a small % of the possibilities. Merged tokens like
"hotelsbarcelona" or "barcelonahotels" cannot be processed.

Our intuition is that there should be a better solution. Maybe it's solved
in SOLR or Lucene and we haven't found it yet. If it's not solved, I can
imagine a naive solution that would use TermsEnum to identify whether a
token exists in the index or not, and then if it doesn't exist, use the
TermsEnum again to check whether it's a composition of two known tokens.
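
For illustration, here is a minimal sketch of that naive splitting step. The
known-term lookup is modeled as an in-memory set; in practice it would
consult the index's term dictionary (e.g. via TermsEnum). All names are just
for illustration:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MergedTokenSplitter {

  private final Set<String> knownTerms;

  public MergedTokenSplitter(Set<String> knownTerms) {
    this.knownTerms = knownTerms;
  }

  /** Returns {left, right} if token splits into two known terms, else null. */
  public String[] split(String token) {
    if (knownTerms.contains(token)) {
      return null; // already a known term, nothing to repair
    }
    for (int i = 1; i < token.length(); i++) {
      String left = token.substring(0, i);
      String right = token.substring(i);
      if (knownTerms.contains(left) && knownTerms.contains(right)) {
        return new String[] { left, right };
      }
    }
    return null; // no split into two known terms found
  }

  public static void main(String[] args) {
    Set<String> terms = new HashSet<String>(
        Arrays.asList("hotels", "in", "barcelona"));
    MergedTokenSplitter splitter = new MergedTokenSplitter(terms);
    System.out.println(Arrays.toString(splitter.split("hotelsin")));
    // prints: [hotels, in]
  }
}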

It's highly likely that there are much better solutions and algorithms for
this. It would be great if you can help us identify the best way to solve
this problem.

Thanks a lot for your help.

Carlos

Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com

Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas


Re: Solr logging

2012-02-20 Thread ola nowak
Thanks a lot.
I've added (and deleted) those libraries and now I don't get these messages
on stdout :) I see that log4j is running, but it can't find its config
file. I wish I could add this to the solr.war. Is this possible? I want
to avoid setting parameters in Glassfish.
Regards,
Alex

On Mon, Feb 20, 2012 at 9:58 AM, darul  wrote:

> I got similar questions in the past :)
>
> http://lucene.472066.n3.nabble.com/Jetty-logging-td3476715.html#a3483146
>
> I hope it will help you.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-logging-tp3760171p3760173.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr logging

2012-02-20 Thread darul
I got similar questions in the past :)

http://lucene.472066.n3.nabble.com/Jetty-logging-td3476715.html#a3483146

I hope it will help you.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-logging-tp3760171p3760173.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr logging

2012-02-20 Thread ola nowak
Hi,
I want to set up my Solr to use log4j and write log messages into a separate
file instead of writing everything to standard output. How can I do it?
Which jars should I add? Where should I put the log4j.xml file?
Regards,
Alex


Re: Development inside or outside of Solr?

2012-02-20 Thread bing
I have looked into TikaCLI with the -language option, and learned that Tika
can only output the language metadata. It cannot help me solve my problem,
though, as my main concern is whether to change Solr or not. Thank you all
the same.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Development-inside-or-outside-of-Solr-tp3759680p3760131.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Git repo

2012-02-20 Thread Igor MILOVANOVIC
http://git.apache.org/

On Sun, Feb 19, 2012 at 7:50 PM, Mark Diggory  wrote:

> Is there a git repo location that mirrors apache svn repos for solr?
>
> Cheers,
> Mark
>
>
> --
> [image: @mire Inc.]
> *Mark Diggory *(Schedule a
> Meeting<
> https://www.google.com/calendar/selfsched?sstoken=UUdDSzJzTTlOUE1mfGRlZmF1bHR8MzgwMmEwYjk1NDc1NDQ1MGI0NWViYjYzZjExZDI3Mzg
> >
> )
> *2888 Loker Avenue East, Suite 305, Carlsbad, CA. 92010*
> *Esperantolaan 4, Heverlee 3001, Belgium*
> http://www.atmire.com
>



-- 
Igor Milovanović
https://twitter.com/#!/f13o | http://about.me/igor.milovanovic |
http://umotvorine.com/