Re: Error while upgrading from 1.4 to 3.1

2011-11-09 Thread deniz
Well, it's fixed for now... just ignore this thread.

-
Smart, but it doesn't work... If it worked, it would do the job...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-while-upgrading-from-1-4-to-3-1-tp3492373p3492887.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: representing latlontype in pojo

2011-11-09 Thread Michael Kuhlmann

On 08.11.2011 23:38, Cam Bazz wrote:

How can I store a 2d point and index it to a field type that is
latlontype, if I am using solrj?


Simply use a String field. The format is "$latitude,$longitude".
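
For instance, with a SolrJ annotated bean (a minimal sketch; the bean class and
the "store" field name are just examples, use whatever your schema calls its
LatLonType field):

import org.apache.solr.client.solrj.beans.Field;

public class Place {
    @Field("id")
    String id;

    // A LatLonType field takes its value as "lat,lon"
    @Field("store")
    String store;

    public void setPoint(double lat, double lon) {
        this.store = lat + "," + lon;
    }
}

Then solrServer.addBean(place) indexes it as usual.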

-Kuli



Re: Solr dismax scoring and weight

2011-11-09 Thread darul
Thanks for the details, but what do you mean by normalization? Could you
briefly describe the concepts behind it?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-dismax-scoring-and-weight-tp3490096p3492986.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search Correlated Data between Multivalued Fields

2011-11-09 Thread Andre Bois-Crettez

I do not think this is possible directly out of the box in Solr.

A quick workaround would be to fully denormalize the data, i.e. instead of 
multivalued notes for a customer, have a completely flat index of 
customer_note documents.
Or maybe a custom request handler plugin could actually check that 
matches line up across note_id[x], note_date[x], and note_Text[x]? Not sure if 
this is doable.
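
With one flat customer_note document per note, the question below becomes a
single query, for example (dates illustrative):

q=note_text:sales AND note_date:[2011-01-01T00:00:00Z TO 2011-06-30T23:59:59Z]

and the matching notes can then be collapsed back to customers on the client
side (or with field collapsing, if your Solr version supports it).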


Andre

David T. Webb wrote:

I have a normalized database schema that I have flattened out to create
a Solr schema.  My question is about searching the multivalued
fields that are correlated from the sub-entity in the DataImportHandler.

Example

I have 2 tables, CUSTOMER and NOTE.

Customer can have one to many notes.

My data-config would look similar to this (not exact, just setting up
the question):

[the entity definitions were stripped of their XML markup by the list archive]

My schema would be something like this:

[the field definitions were stripped as well; the surviving fragments show the
note fields declared with required="false" multiValued="true"]

All is well, indexed and searchable.
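
For concreteness, here is a sketch of what the stripped snippets may have
looked like, reconstructed from the description above (table, column and field
names are assumptions):

<document>
  <entity name="customer" query="SELECT customer_id, customer_name FROM customer">
    <entity name="note"
            query="SELECT note_id, note_date, note_text FROM note
                   WHERE customer_id = '${customer.customer_id}'"/>
  </entity>
</document>

<field name="customer_id" type="string" indexed="true" stored="true" required="true"/>
<field name="note_id"   type="string" indexed="true" stored="true" multiValued="true"/>
<field name="note_date" type="date"   indexed="true" stored="true" multiValued="true"/>
<field name="note_text" type="text"   indexed="true" stored="true" multiValued="true"/>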

 


So, if there are 100 notes per customer at varying dates, how would I
query to essentially ask:

 


Give me all the Customers where note_text has "sales" AND the note_date
is between Date1 and Date2?

 


The multi-valued data is stored as arrays and the array positions line
up properly (i.e. note_id[x], note_date[x], and note_Text[x] represent
an actual row that was loaded from the database).

 


Any suggestions on how to solve my problem?

 


Thank you!

 


--

Sincerely,

David Webb

 



  


--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/



Re: Out of memory during the indexing

2011-11-09 Thread Andre Bois-Crettez

How much memory do you actually allocate to the JVM?
http://wiki.apache.org/solr/SolrPerformanceFactors#Memory_allocated_to_the_Java_VM
You need to increase the -Xmx value, otherwise your large RAM buffers 
won't fit in the Java heap.
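
For example, when starting Solr with the bundled Jetty (the 2g figure is only
illustrative, size it to your data and hardware):

java -Xms512m -Xmx2g -jar start.jar

Under Tomcat or another container, put the same flags in JAVA_OPTS /
CATALINA_OPTS.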



sivaprasad wrote:

Hi,

I am getting the following error during indexing. I am trying to index 14
million records, but the document size is very minimal.

*Error:*
2011-11-08 14:53:24,634 ERROR [STDERR] (Thread-12)
java.lang.OutOfMemoryError: GC overhead limit exceeded

  

[...]

Do I need to increase the heap size for the JVM?

My solrconfig settings are given below.

[the indexDefaults and mainIndex settings were stripped of their XML markup by
the list archive; the surviving values include useCompoundFile false,
ramBufferSizeMB 1024 (indexDefaults) and 512 (mainIndex), mergeFactor 25 and
10, and maxMergeDocs 2147483647]
Do I need to set the ramBufferSizeMB a little higher?

Please provide your inputs.

Regards,
Siva
 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Out-of-memory-during-the-indexing-tp3492701p3492701.html
Sent from the Solr - User mailing list archive at Nabble.com.

  


--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/



Re: Search Correlated Data between Multivalued Fields

2011-11-09 Thread David T. Webb
Can you point me to the docs on how to create the additional flat index of 
note?  Thx for the quick reply. Dave. 

Sent from my iPhone

On Nov 9, 2011, at 6:03 AM, "Andre Bois-Crettez"  wrote:

> I do not think this is possible directly out of the box in Solr.
> 
> A quick workaround would be to fully denormalize the data, i.e. instead of 
> multivalued notes for a customer, have a completely flat index of 
> customer_note documents.
> Or maybe a custom request handler plugin could actually check that matches 
> line up across note_id[x], note_date[x], and note_Text[x]? Not sure if this 
> is doable.
> 
> Andre
> 
> David T. Webb wrote:
>> [original question trimmed; quoted in full earlier in this thread]
> 
> -- 
> André Bois-Crettez
> 
> Search technology, Kelkoo
> http://www.kelkoo.com/
> 


Solr 4.0 indexing NoSuchMethodError

2011-11-09 Thread elisabeth benoit
Hello,

I've just installed Solr 4.0, and I am getting an error when indexing.

*GRAVE: java.lang.NoSuchMethodError:
org.apache.lucene.util.CodecUtil.writeHeader(Lorg/apache/lucene/store/DataOutput;Ljava/lang/String;I)Lorg/apache/lucene/store/DataOutput;
at org.apache.lucene.util.fst.FST.save(FST.java:311)*.

Does anybody know what I've done wrong?

Thanks,
Elisabeth


ExtractingRequestHandler HTTP GET Problem

2011-11-09 Thread Felix Remmel
Hi,
I have a problem with the ExtractingRequestHandler in Solr. I want to
send a really big base64-encoded string to Solr with the
CommonsHttpSolrServer. The base64-encoded string is the content of the
indexed file. The CommonsHttpSolrServer sends the parameters as an HTTP
GET request, and because of that I get a "socket write error". If I
change the CommonsHttpSolrServer to send the parameters as HTTP POST,
sending works, but the ExtractingRequestHandler does not recognize
the parameters. If I use the EmbeddedSolrServer there is no
problem. This is an option for me now, but I don't know yet whether the
Solr server will be at another location than the website when my
project is ready for "real world" use ;-). Is there anything
I can do in the configuration of Solr or the application server?
Otherwise I'll write a patch for that, if it'll be accepted.

Felix


Re: Search Correlated Data between Multivalued Fields

2011-11-09 Thread Andre Bois-Crettez

Something like:
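
(the original snippet was stripped by the list archive; below is a
reconstructed sketch, with table and column names assumed)

<entity name="customer_note"
        query="SELECT n.note_id, n.note_date, n.note_text,
                      c.customer_id, c.customer_name
               FROM note n JOIN customer c ON n.customer_id = c.customer_id"/>

i.e. one Solr document per note row, each carrying its customer's fields, with
note_id as the uniqueKey.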
David T. Webb wrote:
> Can you point me to the docs on how to create the additional flat index of note?  Thx for the quick reply. Dave.
>
> [earlier quoted messages trimmed]

--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/







RE: SpellChecker : not getting suggestions for misspelled words

2011-11-09 Thread Dyer, James
Dali,

You might want to try increasing spellcheck.count to something higher, maybe 
10 or 20.  The default spell checker pre-filters suggestions in such a way that 
you often need to ask for more results than you actually want in order to get 
the right ones.  The other thing you might want to do is go into the 
"spellchecker" directory after you run your first query and see if it built 
the dictionary.
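
For example, appended to the query from the original post (the count value is
illustrative):

http://localhost:8080/solr/select/?q=pr_name:sonadr&spellcheck=true&spellcheck.count=20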

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-Original Message-
From: Dali [mailto:medalibenmans...@gmail.com] 
Sent: Tuesday, November 08, 2011 5:45 AM
To: solr-user@lucene.apache.org
Subject: SpellChecker : not getting suggestions for misspelled words

Hello everyone !

I'm new to Solr and I've been facing a huge problem since yesterday with the
SpellChecker component in Solr.
I followed the instructions on the wiki page and browsed the forums, and unlike
other people I don't get any results when typing a misspelled word.

Here is what I have:

*schema.xml:*

[the schema snippet was stripped of its XML markup by the list archive]

*solrconfig.xml*:

[the request handler and spellcheck component definitions were stripped of
their XML markup by the list archive. The surviving values show a stock
/browse (Solritas) style handler (echoParams explicit, wt velocity, defType
edismax, q.alt *:*, rows 10, fl *,score, qf and mlt.qf boosts of pr_name^0.5
pr_infGenDesc^1.0 pr_OS^1.2 pr_plus^1.5 pr_techno^10.0 pr_moins^1.1, faceting
and highlighting defaults, and spellcheck in last-components), followed by a
spellcheck component with queryAnalyzerFieldType textSpell, a "default"
spellchecker on field "name" building into the "spellchecker" directory, and a
"jarowinkler" spellchecker on field "spell" using
org.apache.lucene.search.spell.JaroWinklerDistance with index dir
"spellcheckerJaro" and two boolean options set to true]

And here is what I'm typing:
http://localhost:8080/solr/select/?q=pr_name:sonadr&spellcheck=true&spellcheck.build=true

The correct pr_name value (which is indexed) is "Sonar".

Any suggestions?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/SpellChecker-not-getting-suggestions-for-misspelled-words-tp3490004p3490004.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SpellChecker : not getting suggestions for misspelled words

2011-11-09 Thread Dali
You're right, James! That was the solution: I can get suggestions now after
increasing spellcheck.count to 20.
I also changed the URL to:
http://localhost:8080/solr/spell/?q=pr_name:sonadr&spellcheck=true&spellcheck.build=true
instead of:
http://localhost:8080/solr/select/?q=pr_name:sonadr&spellcheck=true&spellcheck.build=true

Dali

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SpellChecker-not-getting-suggestions-for-misspelled-words-tp3490004p3493762.html
Sent from the Solr - User mailing list archive at Nabble.com.


DIH -> how to collect added/error unique keys?

2011-11-09 Thread Kai Gülzau
Hi *,

I am using the DataImportHandler to run imports from an INDEX_QUEUE table (UKEY 
| ACTION),
using a custom Transformer which adds fields from various sources depending on 
the UKEY.

Indexing works fine this way.

But now I want to delete the rows from INDEX_QUEUE that were successfully 
indexed.

-> Is there a good "API way" to do this?

Right now I'm using a custom UpdateRequestProcessor which collects the UKEYs 
and calls a method
on a singleton with access to the DB. It works, but I hate these global 
singletons... :-(

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;

// inside the custom UpdateRequestProcessor subclass:
public void processAdd(AddUpdateCommand cmd) throws IOException {
  SolrInputDocument doc = cmd.getSolrInputDocument();
  try {
    super.processAdd(cmd);
    addOK(doc);      // record the document's key as successfully indexed
  } catch (IOException e) {
    addError(doc);   // record the failure, then rethrow
    throw e;
  } catch (RuntimeException e) {
    addError(doc);
    throw e;
  }
}

Any other suggestions?

Regards,

Kai Gülzau



Re: Weird: Solr Search result and Analysis Result not match?

2011-11-09 Thread Erick Erickson
Regarding <1>. Take a look at admin/analysis and see the tokenization just
to check.

Oh, and one more thing...
putting LowerCaseFilterFactory in front of WordDelimiterFilterFactory
kind of defeats the purpose of WordDelimiterFilterFactory. One of the
things WDDF does is split on case change, and you're removing the case
changes before WDDF gets hold of it.
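
(For example, with splitOnCaseChange WDDF would split "SolrUser" into "Solr"
and "User", but after lowercasing it sees only "solruser" and has nothing to
split on.)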

Best
Erick

On Tue, Nov 8, 2011 at 9:40 PM, Ellery Leung  wrote:
> Thanks Erick, here are my responses:
>
> 1. Yes.  What I want to achieve is that when index is filtered with 
> EdgeNgram, and a query that is not filtered in that way, I can do search on 
> partial string.
> 2. Good suggestion, will test it.
> 3. ok
> 4. Thank you
> 5/6. Will remove the synonyms and word delimiterfilterfactory in query
> 7. will look at that using Luke.  By the way, it is the first time I saw that 
> there is a tool for that.  Thank you.
> 8. Yes.
>
> Will check that again, thank you.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: November 8, 2011, 9:52 PM
> To: solr-user@lucene.apache.org; elleryle...@be-o.com
> Subject: Re: Weird: Solr Search result and Analysis Result not match?
>
> Several things:
>
> 1> You don't have EdgeNGramFilterFactory in your query analysis chain,
> is this intentional?
> 2> You have a LOT of stuff going on here, you might try making your
> analysis chain simpler and
>     adding stuff back in until you see the error. Don't forget to re-index!
> 3> Analysis doesn't take into account query *parsing*, so it's
> possible to get a false sense of
>     assurance when the analysis page matches your expectations.
> 4> Even though nothing jumps out at me except the Edge factory,
> nice job of including
>     information.
> 5> It's unusual to expand synonyms both at query and index time,
> usually one or the
>     other with index time preferred.
> 6> Same with WordDelimiterFilterFactory. If you put all the variants
> in the index, you don't
>     need to put all the variants in the query and vice-versa.
> 7> Take a look at your actual contents, perhaps using Luke to insure
> that what you expect
>      to be in your index actually is.
> 8> You did re-index after your latest changes to your schema, right ?
>
> All of this is a way of saying that I don't quite see what the problem
> is, but at least there are
> some avenues to explore.
>
> Best
> Erick
>
> On Mon, Nov 7, 2011 at 9:29 PM, Ellery Leung  wrote:
>> Hi all.
>>
>>
>>
>> I am using Solr 3.4 under Win 7.
>>
>>
>>
>> In schema there is a multivalue field indexed in this way:
>>
>> ==
>>
>> Schema:
>>
>> ==
>>
>> [the field and analyzer definitions were stripped of their XML markup by the
>> list archive; the surviving fragments show an index-time chain with a
>> MappingCharFilterFactory, a SynonymFilterFactory (expand="true"), a
>> WordDelimiterFilterFactory (splitOnCaseChange="1", catenate options,
>> preserveOriginal="1"), a DoubleMetaphone PhoneticFilterFactory, and an
>> EdgeNGramFilterFactory (maxGramSize="50", side="front"), and a query-time
>> chain with the same filters but no EdgeNGramFilterFactory]
>>
>> ==
>>
>> Actual index:
>>
>> ==
>>
>> 
>>
>> 2284e2
>>
>> 2284e4
>>
>> 2284e5
>>
>> 1911e2
>>
>> 
>>
>>
>>
>> ==
>>
>> Question:
>>
>> ==
>>
>> Now when I do a search like this:
>>
>>
>>
>> myEvent:1911e2
>>
>>
>>
>> This should match the 4th item.  Now on "Full Interface", it does not return
>> any result.  But on "analysis", matches are highlighted.
>>
>>
>>
>> By using Debug: the parsedquery is:
>>
>>
>>
>> MultiPhraseQuery(myEvent:"(1911e2 1911) (A e) 2")
>>
>>
>>
>> Parsedquery_toString:
>>
>>
>>
>> myEvent:"(1911e2 1911) (A e) 2"
>>
>>
>>
>> Can anyone please help me on this?
>>
>>
>
>


Distributed indexing

2011-11-09 Thread Rafał Kuć
Hello!

I was looking for a way to implement distributed indexing in Solr.
Looking at https://issues.apache.org/jira/browse/SOLR-2358,
some work has been done to enable Solr to distribute documents to
shards without the need for third-party software in front of Solr. What I
would like to know is whether this is the road Solr will take to make things
work. Has there been any additional work done on distributed
indexing?

-- 
Regards,
 Rafał Kuć



Re: Solr dismax scoring and weight

2011-11-09 Thread Erick Erickson
Length normalization is an attempt to factor in how long the field is. The idea
is that a token in a field with 10,000 tokens should count less than the same
word in a field of 10 tokens. But since the length of the field is encoded
in a byte, the distinction between, say, 4 and 20 terms is pretty much lost.
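
For reference, the classic DefaultSimilarity length norm is roughly

    lengthNorm(field) = 1 / sqrt(number of terms in the field)

and that float is then compressed into a single byte when stored, which is
where the precision between similar field lengths disappears.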

HTH
Erick

On Wed, Nov 9, 2011 at 3:59 AM, darul  wrote:
> Thanks for the details, but what do you mean by normalization? Could you
> briefly describe the concepts behind it?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-dismax-scoring-and-weight-tp3490096p3492986.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Weird: Solr Search result and Analysis Result not match?

2011-11-09 Thread Erick Erickson
Oh, one more thing. I wasn't suggesting that you *remove*
WordDelimiterFilterFactory from the query chain, just
that you should be more selective about the options. Look
at the differences in the options in the example schema for
a place to start.

Best
Erick

On Wed, Nov 9, 2011 at 12:33 PM, Erick Erickson  wrote:
> [previous messages in this thread quoted in full above; trimmed]

Importing Big Data From Berkeley DB to Solr

2011-11-09 Thread Carey Sublette
Hi:

I have a massive data repository (hundreds of millions of records) stored in 
Berkeley DB, with Java code to access it, and I need an efficient method to 
import it into Solr for indexing. I cannot find a straightforward Java data 
import API that I can load the data with.

There is no JDBC driver for the DataImportHandler to call, it is not a simple 
file, and the inefficiencies (and extra code) of submitting it as HTTP calls, 
XML feeds, etc. make those measures of last resort only.

Can I call a Lucene API in a Solr installation to do this somehow?

Thanks


Out of memory, not during import or updates of the index

2011-11-09 Thread Steve Fatula
We get out-of-memory errors at rare times during the day. I know one reason for 
this is data imports; none are going on. I see in the wiki that document adds 
have some quirks; we're not doing that either. I don't know what to expect for 
memory use, though.

We had Solr running under Tomcat set to 2G RAM. I presume cache size has an 
effect on memory; that's set to 30,000 for filter, document and queryResult. 
We have experimented with different sizes for a while; these limits are all 
lower than we used to have them set to. So we're hoping there is no sort of 
memory leak involved.

In any case, some of the messages are:

Exception in thread "http-8080-21" java.lang.OutOfMemoryError: Java heap space


Some look like this:

Exception in thread "http-8080-22" java.lang.NullPointerException
        at 
java.util.concurrent.ConcurrentLinkedQueue.offer(ConcurrentLinkedQueue.java:273)
...

I presume the null pointer is a result of being out of memory. 

Should Solr possibly need more than 2GB? What else can we tune that might 
reduce memory usage?

Re: Importing Big Data From Berkeley DB to Solr

2011-11-09 Thread Otis Gospodnetic
Carey,

Some options:
* Just read your BDB and use SolrJ to index to Solr in batches and in parallel
* Dump your BDB into csv format and use Solr's ability to import csv files fast
* Use Hadoop MapReduce to index to Lucene or Solr in parallel

Yes, you can index using Lucene APIs directly, but you will have to make sure 
all the analysis you specify there is identical to what you have in your Solr 
schema.
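
A minimal sketch of the first option (the URL and batch size are illustrative,
and hasNextRecord()/nextRecordKey() stand in for your Berkeley DB cursor code):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BdbIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    while (hasNextRecord()) {              // iterate your BDB cursor here
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", nextRecordKey()); // map the record key and fields
      batch.add(doc);
      if (batch.size() >= 1000) {          // batching avoids per-doc HTTP overhead
        server.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) server.add(batch);
    server.commit();
  }
  // placeholders for the Berkeley DB access code
  private static boolean hasNextRecord() { return false; }
  private static Object nextRecordKey() { return null; }
}

Running several such indexers in parallel over disjoint key ranges gives the
parallelism mentioned above.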

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


>
>From: Carey Sublette 
>To: "solr-user@lucene.apache.org" 
>Sent: Wednesday, November 9, 2011 3:06 PM
>Subject: Importing Big Data From Berkeley DB to Solr
>
>[original message trimmed]

Re: Out of memory, not during import or updates of the index

2011-11-09 Thread Otis Gospodnetic
Hi,

Some options:
* Yes, on the slave/search side you can reduce your cache sizes and lower the 
memory footprint.
* You can also turn off norms in various fields if you don't need that and save 
memory there.
* You can increase your Xmx

I don't know what version of Solr you have, but look through Lucene/Solr's 
CHANGES.txt to see if there were any changes that affect memory requirements 
since your version of Solr.

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


>
>From: Steve Fatula 
>To: "solr-user@lucene.apache.org" 
>Sent: Wednesday, November 9, 2011 3:33 PM
>Subject: Out of memory, not during import or updates of the index
>
>We get at rare times out of memory errors during the day. I know one reason 
>for this is data imports, none are going on. I see in the wiki, document adds 
>have some quirks, not doing that. I don't know to to expect for memory use 
>though.
>
>We had Solr running under Tomcat set to 2G ram. I presume cache size has an 
>effect on memory, that's set to 30,000 for filter, document and queryResult. 
>Have experimented with different sizes for a while, these limits are all lower 
>than we used to have them set to. So, hoping there no sort of memory leak 
>involved.
>
>In any case, some of the messages are:
>
>Exception in thread "http-8080-21" java.lang.OutOfMemoryError: Java heap space
>
>
>Some look like this:
>
>Exception in thread "http-8080-22" java.lang.NullPointerException
>        at 
>java.util.concurrent.ConcurrentLinkedQueue.offer(ConcurrentLinkedQueue.java:273)
>...
>
>I presume the null pointer is a result of being out of memory. 
>
>Should Solr possibly need more than 2GB? What else can we tune that might 
>reduce memory usage?
>
>

Re: Solr 4.0 indexing NoSuchMethodError

2011-11-09 Thread Frédéric Cons
The CodecUtil.writeHeader signature has changed from

public static DataOutput writeHeader(DataOutput out, String codec, int
version)

in lucene 3.4 (which is the method not found) to

public static void writeHeader(DataOutput out, String codec, int version)

in lucene 4.0

It means that while you're using Solr 4.0, some 3.4 jars are stuck
somewhere in the Java classpath, and some code is looking for this
3.4 method.

If you're using the start.jar executable, you should have a look at your
system-wide classpath.
If you're using Tomcat (and that sounds plausible in this situation), you
should trash the "work" directory sub-folders of your Tomcat installation
and restart it. Tomcat unpacks war archives in this directory, and it may
have kept a 3.4 Solr war deployed there.

2011/11/9 elisabeth benoit 

> Hello,
>
> I've just installed Solr 4.0, and I am getting an error when indexing.
>
> *GRAVE: java.lang.NoSuchMethodError:
>
> org.apache.lucene.util.CodecUtil.writeHeader(Lorg/apache/lucene/store/DataOutput;Ljava/lang/String;I)Lorg/apache/lucene/store/DataOutput;
>at org.apache.lucene.util.fst.FST.save(FST.java:311)*.
>
> Does anybody know what I've done wrong?
>
> Thanks,
> Elisabeth
>


Re: Out of memory, not during import or updates of the index

2011-11-09 Thread Steve Fatula
From: Otis Gospodnetic 
>To: "solr-user@lucene.apache.org" 
>Sent: Wednesday, November 9, 2011 2:51 PM
>Subject: Re: Out of memory, not during import or updates of the index
>
>Hi,
>
>Some options:
>* Yes, on the slave/search side you can reduce your cache sizes and lower the 
>memory footprint.
>* You can also turn off norms in various fields if you don't need that and 
>save memory there.
>* You can increase your Xmx
>
>I don't know what version of Solr you have, but look through Lucene/Solr's 
>CHANGES.txt to see if there were any changes that affect memory requirements 
>since your version of Solr.
>

Using Solr 3.4.0. That changelog actually says it should reduce memory usage 
for that version. We were on a much older version previously, 1.something.

Norms are off on all fields where they can be turned off.

I'm just hoping this new version doesn't have any leaks. Does FastLRUCache vs 
LRUCache make any memory difference?

Anyway to stop an optimize?

2011-11-09 Thread Brendan Grainger
Hi,

Does anyone know if an optimize can be stopped once started?

Thanks


RE: Importing Big Data From Berkeley DB to Solr

2011-11-09 Thread Carey Sublette
Thanks Otis:

It looks like SolrJ is what I was looking for exactly, it is also nice to know 
that the csv implementation is fast as a fall back.

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Wednesday, November 09, 2011 12:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Importing Big Data From Berkeley DB to Solr

[quoted reply and original message trimmed]


Unable to determine why query won't return results

2011-11-09 Thread Nordstrom, Kurt
Hello all.

I'm having an issue with matching a quoted phrase in Solr, and I'm not 
certain what the problem is.

I have tried this on both Solr 1.3 (Our production system) and 3.3 (Our 
development system).

The field is a text field, and has the following fieldType definition: 
http://pastebin.com/SkmmucUE

In the case where the search is failing, the field is indexed with the 
following value: A. J. Johnson & Co.

We are searching the field with the following string (in quotes): "A. J. 
Johnson & Co."

Unfortunately, we get a response of no results when searching the field in 
question with the above specified string. If we search merely for "A. J. 
Johnson" (with quotes), we get the desired result.  Using the full string, 
however, seems to cause the results not to match.

I have attempted to use Solr's analyzer (without success) to trace the problem. 
The results of this are here: http://pastehtml.com/view/bdgpdrt0w.html

Any suggestions?

Re: Anyway to stop an optimize?

2011-11-09 Thread Otis Gospodnetic
Don't think so, at least not gracefully.  You can always do partial optimize 
and do a few of them if you want to optimize in smaller steps.

Otis


>
>From: Brendan Grainger 
>To: solr-user@lucene.apache.org
>Sent: Wednesday, November 9, 2011 4:35 PM
>Subject: Anyway to stop an optimize?
>
>Hi,
>
>Does anyone know if an optimize can be stopped once started?
>
>Thanks
>
>
>

Re: Anyway to stop an optimize?

2011-11-09 Thread Walter Underwood
If you restart the server, the optimize should stop and not restart, right?

wunder

On Nov 9, 2011, at 7:43 PM, Otis Gospodnetic wrote:

> Don't think so, at least not gracefully.  You can always do partial optimize 
> and do a few of them if you want to optimize in smaller steps.
> 
> Otis
> 
> 
>> 
>> From: Brendan Grainger 
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, November 9, 2011 4:35 PM
>> Subject: Anyway to stop an optimize?
>> 
>> Hi,
>> 
>> Does anyone know if an optimize can be stopped once started?
>> 
>> Thanks
>> 
>> 






Re: Anyway to stop an optimize?

2011-11-09 Thread Brendan Grainger
I think in the past I've tried that and it has restarted, although I will have 
to try it out (this time we were loath to stop it, as we didn't want any index 
corruption issues). 

A related question: why did the optimize start? I thought it had to be 
explicitly started, but somehow it started optimizing on its own.

Thanks again
Brendan

On Nov 9, 2011, at 10:44 PM, Walter Underwood wrote:

> If you restart the server, the optimize should stop and not restart, right?
> 
> wunder
> 
> [earlier quotes trimmed]



Re: Anyway to stop an optimize?

2011-11-09 Thread Walter Underwood
A restart during an optimize should not cause index corruption. The optimize 
only reads existing indexes, and the only writes are to indexes not yet in use. 
If it does not finish, those half-written indexes are junk to be cleaned up 
later.

wunder

On Nov 9, 2011, at 8:16 PM, Brendan Grainger wrote:

> I think in the past I've tried that, and it has restarted, although I will 
> have to try it out (this time we were loath to stop it as we didn't want any 
> index corruption issues). 
> 
> A related question is, why did the optimize start? I thought it had to be 
> explicitly started, but somehow it started optimzing on it's own.
> 
> Thanks again
> Brendan
> 
> [earlier quotes trimmed]






Re: Anyway to stop an optimize?

2011-11-09 Thread Otis Gospodnetic
This is correct.
And there is no way I can think of that an optimize could just start on its 
own; somebody or something called it.

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


>
>From: Walter Underwood 
>To: solr-user@lucene.apache.org
>Sent: Wednesday, November 9, 2011 11:38 PM
>Subject: Re: Anyway to stop an optimize?
>
>A restart during an optimize should not cause index corruption. The optimize 
>only reads existing indexes, and the only writes are to indexes not yet in 
>use. If it does not finish, those half-written indexes are junk to be cleaned 
>up later.
>
>wunder
>
>[earlier quotes trimmed]

Dynamic adding of shards

2011-11-09 Thread Ankita Patil
Hi,
One way to add new shards is to add them to the shards parameter in the
solrconfig.xml file, but this requires restarting the Solr server
every time you add a new shard.
I wanted to know if it is possible to add shards dynamically, without having
to restart the Solr server. If yes, how?

Thanks in advance.

Ankita


abort processing query

2011-11-09 Thread Jason, Kim
Hi all
We have very complexed queries including wildcard.
That causes memory overhead.
Sometimes, memory is full and server doesn't response.
What I wonder, when query process time on server exceeds the time limit, can
I abort processing query?
If possible, how should I do?

Thanks in advance
Jason

--
View this message in context: 
http://lucene.472066.n3.nabble.com/abort-processing-query-tp3495876p3495876.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Aggregated indexing of updating RSS feeds

2011-11-09 Thread sbarriba
All,
Can anyone advise how to stop the "deleteAll" event during a full import?

I'm still unable to determine why repeated full imports delete the old
index. After investigation, the logs confirm this; see "REMOVING ALL
DOCUMENTS FROM INDEX" below.

..but the request I'm making is..
/solr/myfeed?command=full-import&rows=5000&clean=false

..note the clean=false.

All help appreciated.
Shaun


INFO: [] webapp=/solr path=/myfeed params={command=full-import} status=0
QTime=8
10-Nov-2011 05:40:01 org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
10-Nov-2011 05:40:01 org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read myfeed.properties
10-Nov-2011 05:40:01 org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
10-Nov-2011 05:40:05 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select/
params={indent=on&start=0&q=description:one+direction&rows=10&version=2.2}
hits=0 status=0 QTime=1
10-Nov-2011 05:40:07 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select/
params={indent=on&start=0&q=id:*23327977*&rows=10&version=2.2} hits=0
status=0 QTime=1
10-Nov-2011 05:40:08 org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=2000
   
commit{dir=/mnt/ebs1/data/index,segFN=segments_1x3,version=1319402557686,generation=2487,filenames=[_3u3.tii,
segments_1x3, _3u3.frq, _3u3.prx, _3u3.nrm, _3u3.fnm, _3u3.fdx, _3u3.tis,
_3u3.fdt]
   
commit{dir=/mnt/ebs1/data/index,segFN=segments_1x4,version=1319402557691,generation=2488,filenames=[_3u5.nrm,
_3u5.fnm, _3u5.fdx, segments_1x4, _3u5.tis, _3u5.prx, _3u5.frq, _3u5.tii,
_3u5.fdt]

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Aggregated-indexing-of-updating-RSS-feeds-tp3485335p3495882.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: abort processing query

2011-11-09 Thread Ahmet Arslan
> Hi all
> We have very complexed queries including wildcard.
> That causes memory overhead.
> Sometimes, memory is full and server doesn't response.
> What I wonder, when query process time on server exceeds
> the time limit, can
> I abort processing query?
> If possible, how should I do?

QueryComponent respects the timeAllowed parameter.

http://wiki.apache.org/solr/CommonQueryParameters#timeAllowed
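
For example (the 2000 ms limit is illustrative):

http://localhost:8983/solr/select?q=some*complex*query&timeAllowed=2000

When the limit is reached, Solr returns the results gathered so far and flags
them with partialResults=true in the response header.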


Re: About the total size of all files

2011-11-09 Thread Gora Mohanty
On Thu, Nov 10, 2011 at 12:39 PM, 刘浪  wrote:
>
> Hi,
>    Can the total size of all files reach TB or PB scale?
>    If I use only one Solr core to index PB-scale files, what would the 
> search time be? Can it be under 1 second?
>    If I use multiple Solr cores to index PB-scale files, what would the 
> search time be? Can it be under 1 second?

It is difficult to understand your question. What is PB, and TB?
Peta-byte/Tera-byte?

It is difficult to make predictions about indexing, and search times
without understanding the volume and complexity of your data.
The best way to get an estimate of this is to set up Solr, and
try this out, maybe on a subset of the data at first.

Regards,
Gora