Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Daniel Brügge
Yes, I did this, and the words with umlauts passed through the StopFilter;
the ones without umlauts were correctly removed.

On Thu, Nov 8, 2012 at 2:22 AM, Lance Norskog goks...@gmail.com wrote:

 You can debug this with the 'Analysis' page in the Solr UI. You pick
 'text_general' and then give words with umlauts in the text box for
 indexing and queries.

 Lance

 - Original Message -
 | From: Daniel Brügge daniel.brue...@googlemail.com
 | To: solr-user@lucene.apache.org
 | Sent: Wednesday, November 7, 2012 8:45:45 AM
 | Subject: SolrCloud, Zookeeper and Stopwords with Umlaute or other
 special characters
 |
 | Hi,
 |
 | I am running a SolrCloud cluster with version 4.0.0. I have a
 | stopwords file which is in the correct encoding. It contains German
 | umlauts like e.g. 'ü'. I am also running a standalone Zookeeper which
 | contains this stopwords file. In my schema I am using the stopwords
 | file in the standard way:
 |
 | 
 | <fieldType name="text_general" class="solr.TextField"
 |     positionIncrementGap="100">
 |   <analyzer type="index">
 |     <tokenizer class="solr.StandardTokenizerFactory"/>
 |     <filter class="solr.StopFilterFactory"
 |             ignoreCase="true"
 |             words="my_stopwords.txt"
 |             enablePositionIncrements="true" />
 |
 |
 | When indexing I noticed that all stopwords without umlauts
 | are correctly removed, but the ones with
 | umlauts remain.
 |
 | Is this a problem with ZK or Solr?
 |
 | Thanks & regards
 |
 | Daniel
 |



Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Daniel Brügge
When I look at the text_de fieldType provided in the example schema I can
see:


 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="lang/stopwords_de.txt" format="snowball"
         enablePositionIncrements="true"/>
 <filter class="solr.GermanNormalizationFilterFactory"/>
 <filter class="solr.GermanLightStemFilterFactory"/>


I have tried this and it removed the words with umlauts. It seems that is
because of format="snowball". I hadn't used this because I thought I had
one word per line. But maybe some invisible characters got into my
stopword file and broke it.
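For reference: in snowball format each line may pair a stopword with a
comment after a vertical bar, so a fragment of such a file looks roughly
like

würde  |  would
würden |  would

and a plain one-word-per-line file usually parses there too, since the
comment part is optional.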

Thanks.

Daniel

On Thu, Nov 8, 2012 at 10:36 AM, Daniel Brügge 
daniel.brue...@googlemail.com wrote:

 Yes, I did this, and the words with umlauts passed through the
 StopFilter; the ones without umlauts were correctly removed.

 On Thu, Nov 8, 2012 at 2:22 AM, Lance Norskog goks...@gmail.com wrote:

 You can debug this with the 'Analysis' page in the Solr UI. You pick
 'text_general' and then give words with umlauts in the text box for
 indexing and queries.

 Lance

 - Original Message -
 | From: Daniel Brügge daniel.brue...@googlemail.com
 | To: solr-user@lucene.apache.org
 | Sent: Wednesday, November 7, 2012 8:45:45 AM
 | Subject: SolrCloud, Zookeeper and Stopwords with Umlaute or other
 special characters
 |
 | Hi,
 |
 | I am running a SolrCloud cluster with version 4.0.0. I have a
 | stopwords file which is in the correct encoding. It contains German
 | umlauts like e.g. 'ü'. I am also running a standalone Zookeeper which
 | contains this stopwords file. In my schema I am using the stopwords
 | file in the standard way:
 |
 | 
 | <fieldType name="text_general" class="solr.TextField"
 |     positionIncrementGap="100">
 |   <analyzer type="index">
 |     <tokenizer class="solr.StandardTokenizerFactory"/>
 |     <filter class="solr.StopFilterFactory"
 |             ignoreCase="true"
 |             words="my_stopwords.txt"
 |             enablePositionIncrements="true" />
 |
 |
 | When indexing I noticed that all stopwords without umlauts
 | are correctly removed, but the ones with
 | umlauts remain.
 |
 | Is this a problem with ZK or Solr?
 |
 | Thanks & regards
 |
 | Daniel
 |





Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Daniel Brügge
I trust the 'file' command output, and if I can read "UTF-8 Unicode" there
I believe that the encoding is correct. Don't know if this is the 'correct
answer' for you ;)

BTW: It works locally, but not with ZK. So it's maybe more a ZK issue
which somehow corrupts my file. Will check.

On Thu, Nov 8, 2012 at 12:12 PM, Robert Muir rcm...@gmail.com wrote:

 On Wed, Nov 7, 2012 at 11:45 AM, Daniel Brügge
 daniel.brue...@googlemail.com wrote:
  Hi,
 
  I am running a SolrCloud cluster with version 4.0.0. I have a
  stopwords file
  which is in the correct encoding.

 What makes you think that?

 Note: "Because I can read it" is not the correct answer.

 Ensure any of your stopwords files etc are in UTF-8. This is often
 different from the encoding your computer uses by default if you open
 a file, start typing in it, and press save.
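
 For example, from a shell you could check and, if needed, convert the
 file (the ISO-8859-1 source encoding below is just a guess):

 file -bi my_stopwords.txt   # hope for: text/plain; charset=utf-8
 iconv -f ISO-8859-1 -t UTF-8 my_stopwords.txt > my_stopwords.utf8.txt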



Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Daniel Brügge
Weird, if I fetch the file contents in ZK with 'get' it returns, for example,

w??rde  |  would
w??rden |  would

So the umlauts are not shown. Does anyone have an idea whether this is
because of Zookeeper's CLI or the file contents itself?
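
One way to rule out the CLI would be to force its JVM to UTF-8 as well,
for instance (assuming the stock zkCli.sh, whose zkEnv.sh honors
CLIENT_JVMFLAGS in recent ZooKeeper releases, and a config bootstrapped
under 'config1'):

CLIENT_JVMFLAGS=-Dfile.encoding=UTF-8 bin/zkCli.sh -server zkhost:2181
get /configs/config1/my_stopwords.txt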

Thanks & regards.

On Thu, Nov 8, 2012 at 12:24 PM, Daniel Brügge 
daniel.brue...@googlemail.com wrote:

 I trust the 'file' command output, and if I can read "UTF-8 Unicode" there
 I believe that the encoding is correct. Don't know if this is the 'correct
 answer' for you ;)

 BTW: It works locally, but not with ZK. So it's maybe more a ZK issue
 which somehow corrupts my file. Will check.


 On Thu, Nov 8, 2012 at 12:12 PM, Robert Muir rcm...@gmail.com wrote:

 On Wed, Nov 7, 2012 at 11:45 AM, Daniel Brügge
 daniel.brue...@googlemail.com wrote:
  Hi,
 
  I am running a SolrCloud cluster with version 4.0.0. I have a
  stopwords file
  which is in the correct encoding.

 What makes you think that?

 Note: "Because I can read it" is not the correct answer.

 Ensure any of your stopwords files etc are in UTF-8. This is often
 different from the encoding your computer uses by default if you open
 a file, start typing in it, and press save.





Re: SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-08 Thread Daniel Brügge
Ah, I have fixed it. It was necessary to import the files into Zookeeper
with the file.encoding system property set to UTF-8. Then it worked.
Hooray. :)

e.g.

java -Dfile.encoding=UTF-8 -Dbootstrap_confdir=/home/me/myconfdir
-Dcollection.configName=config1 -DzkHost=zkhost:2181 -DnumShards=2
-Dsolr.solr.home=/home/me/solr -jar start.jar



On Thu, Nov 8, 2012 at 2:09 PM, Daniel Brügge daniel.brue...@googlemail.com
 wrote:

 Weird, if I fetch the file contents in ZK with 'get' it returns, for example,

 w??rde  |  would
 w??rden |  would

 So the umlauts are not shown. Does anyone have an idea whether this is
 because of Zookeeper's CLI or the file contents itself?

 Thanks & regards.

 On Thu, Nov 8, 2012 at 12:24 PM, Daniel Brügge 
 daniel.brue...@googlemail.com wrote:

 I trust the 'file' command output, and if I can read "UTF-8 Unicode" there
 I believe that the encoding is correct. Don't know if this is the 'correct
 answer' for you ;)

 BTW: It works locally, but not with ZK. So it's maybe more a ZK issue
 which somehow corrupts my file. Will check.


 On Thu, Nov 8, 2012 at 12:12 PM, Robert Muir rcm...@gmail.com wrote:

 On Wed, Nov 7, 2012 at 11:45 AM, Daniel Brügge
 daniel.brue...@googlemail.com wrote:
  Hi,
 
  I am running a SolrCloud cluster with version 4.0.0. I have a
  stopwords file
  which is in the correct encoding.

 What makes you think that?

 Note: "Because I can read it" is not the correct answer.

 Ensure any of your stopwords files etc are in UTF-8. This is often
 different from the encoding your computer uses by default if you open
 a file, start typing in it, and press save.






SolrCloud, Zookeeper and Stopwords with Umlaute or other special characters

2012-11-07 Thread Daniel Brügge
Hi,

I am running a SolrCloud cluster with version 4.0.0. I have a stopwords
file which is in the correct encoding. It contains German umlauts like
e.g. 'ü'. I am also running a standalone Zookeeper which contains this
stopwords file. In my schema I am using the stopwords file in the
standard way:


<fieldType name="text_general" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="my_stopwords.txt"
            enablePositionIncrements="true" />


When indexing I noticed that all stopwords without umlauts are correctly
removed, but the ones with umlauts remain.

Is this a problem with ZK or Solr?

Thanks & regards

Daniel


Solrcloud not reachable and after restart just a no servers hosting shard

2012-09-24 Thread Daniel Brügge
Hi,

I am running Solrcloud 4.0-BETA and during the weekend it 'crashed' somehow,
so that it wasn't reachable. CPU load was 100%.

After a restart I couldn't access the data; it just told me:

no servers hosting shard

Is there a way to get the data back?

Thanks & regards

Daniel


Re: querying using filter query and lots of possible values

2012-07-28 Thread Daniel Brügge
Hi,

thanks for this hint. Will check this out. Sounds promising.

Daniel

On Sat, Jul 28, 2012 at 3:18 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : the list of IDs is constant for a longer time. I will take a look at
 : this join topic. Maybe another solution would be to really create a
 : whole new collection or set of documents containing the aggregated
 : documents (from the ids) from scratch and to execute queries on this
 : collection. This would take some time, but maybe it's worth it because
 : the querying will thank you.

 Another avenue to consider...


 http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/schema/ExternalFileField.html

 ...would allow you to map values in your source_id to some numeric
 values (many to many) and these numeric values would then be accessible in
 functions -- so you could use something like fq={!frange ...} to select
 all docs with value 67 where your external file field says that value 67
 is mapped to the following thousand source_id values.

 The external file fields can then be modified at any time just by doing a
 commit on your index.



 -Hoss
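
Concretely, such a setup might look like this (all names are made up): in
schema.xml declare

<fieldType name="extfile" class="solr.ExternalFileField" keyField="id"
           defVal="0" stored="false" indexed="false"/>
<field name="source_group" type="extfile"/>

keep a file external_source_group.txt in the index's data dir with a
'<doc-id>=67' line for every member of group 67, and then select a group
at query time with

fq={!frange l=67 u=67}source_group

Swapping the grouping is then a matter of rewriting the text file and
issuing a commit.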



Deduplication in SolrCloud

2012-07-27 Thread Daniel Brügge
Hi,

in my old Solr setup I used the deduplication feature in the update chain
with a couple of fields.

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">uuid,type,url,content_hash</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

This worked fine. But when I now use this in my 2-shard SolrCloud setup
and insert 150,000 documents, I always get an error:

INFO: end_commit_flush
Jul 27, 2012 3:29:36 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError:
unable to create new native thread
  at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284)

I am inserting the documents via CSV import with curl, splitting them
into 50k chunks.

Without the dedupe chain, the import finishes after 40secs.

The curl command writes to one of my shards.
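
The import loop looks roughly like this (host, paths and file names made up):

split -l 50000 docs.csv chunk_
for f in chunk_*; do
  curl 'http://shard1:8983/solr/update/csv?commit=true' \
       -H 'Content-type: text/csv; charset=utf-8' --data-binary @$f
done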


Do you have an idea why this happens? Should I reduce the fields to one?
I have read that not using the id as dedupe field could be an issue?


I have searched for deduplication with SolrCloud and I am wondering if it
is already working correctly? See e.g.
http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html

Thanks & regards

Daniel


querying using filter query and lots of possible values

2012-07-26 Thread Daniel Brügge
Hi,

I am facing the following issue:

I have a couple of million documents which have a field called source_id.
My problem is that I want to retrieve all the documents which have a
source_id in a specific range of values. This range can be pretty big,
for example a list of 200 to 2000 source ids.

I was thinking that a filter query could be used, like
fq=source_id:(1 2 3 4 5 6 ...), but this reminds me of SQL's
WHERE IN (...), which was always a bit slow for a huge number of values.

Another solution that came to my mind was to assign all the documents I
want to retrieve a new kind of filter id. So all the documents which I
want to analyse get a new id. But I would need to update millions of
documents for this and assign them a new id, which could take some time.

Can you think of a nicer way to solve this issue?

Regards & greetings

Daniel


Re: querying using filter query and lots of possible values

2012-07-26 Thread Daniel Brügge
Hey Chantal,

thanks for your answer.

The range queries would not work because the values are not contiguous.
They can be randomly ordered, with gaps; the above was just an example.

Excluding is also not a solution, because the list of excluded ids would
be even longer.

To be more specific: the IDs are not even integers but UUIDs, and there
are tens of thousands of them. And the document pool contains hundreds of
millions of documents.

Thanks. Daniel



On Thu, Jul 26, 2012 at 6:22 PM, Chantal Ackermann 
c.ackerm...@it-agenten.com wrote:

 Hi Daniel,

 index the id into a field of type tint or tlong and use a range query (
 http://wiki.apache.org/solr/SolrQuerySyntax?highlight=%28rangequery%29):

 fq=id:[200 TO 2000]

 If you want to exclude certain ids it might be wiser to simply add an
 exclusion query in addition to the range query instead of listing all the
 single values. You will run into problems with request URLs that are too
 long. If you cannot avoid long URLs you might want to increase
 maxBooleanClauses (see http://wiki.apache.org/solr/SolrConfigXml/#The_Query_Section).

 Cheers,
 Chantal

 On 26.07.2012 at 18:01, Daniel Brügge wrote:

  Hi,
 
  I am facing the following issue:
 
  I have a couple of million documents which have a field called
  source_id.
  My problem is that I want to retrieve all the documents which have a
  source_id in a specific range of values. This range can be pretty big,
  for example a list of 200 to 2000 source ids.
 
  I was thinking that a filter query could be used, like
  fq=source_id:(1 2 3 4 5 6 ...), but this reminds me of SQL's
  WHERE IN (...), which was always a bit slow for a huge number of values.
 
  Another solution that came to my mind was to assign all the documents I
  want to retrieve a new kind of filter id. So all the documents which I
  want to analyse get a new id. But I would need to update millions of
  documents for this and assign them a new id, which could take some time.
 
  Can you think of a nicer way to solve this issue?
 
  Regards & greetings
 
  Daniel




Is it possible or wise to query multiple cores in parallel in SolrCloud

2012-07-26 Thread Daniel Brügge
Hi,

I am playing around with a SolrCloud setup (4 shards) and thousands of
cores. I am thinking of executing queries on hundreds of cores like a
distributed query.

Is this possible at all from the SolrCloud side? And is it wise?

Thanks & regards

Daniel
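
For reference, a manual distributed query over explicit cores uses the
shards parameter, e.g. (host and core names made up):

curl 'http://solrinstance1:8983/solr/core1/select?q=*:*&shards=solrinstance1:8983/solr/core1,solrinstance2:8983/solr/core2'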


Re: querying using filter query and lots of possible values

2012-07-26 Thread Daniel Brügge
Thanks Alexandre,

the list of IDs is constant for a longer time. I will take a look at
this join topic. Maybe another solution would be to really create a whole
new collection or set of documents containing the aggregated documents
(from the ids) from scratch and to execute queries on this collection.
This would take some time, but maybe it's worth it because the querying
will thank you.
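
The Solr 4 join would look something like this (the idlists core and its
fields are hypothetical): store each aggregation as small {group_id,
member_id} documents in a separate core and restrict the main index with

fq={!join fromIndex=idlists from=member_id to=source_id}group_id:67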

Daniel

On Thu, Jul 26, 2012 at 7:43 PM, Alexandre Rafalovitch
arafa...@gmail.com wrote:

 You can't update the original documents except by reindexing them, so
 there is no easy group assignment option.

 If you create this 'collection' once but query it multiple times, you
 may be able to use the SOLR4 join, with the IDs being stored separately
 and joined on. Still not great, because performance is an issue when
 joining on IDs:
 http://www.lucidimagination.com/blog/2012/06/20/solr-and-joins/ .

 If the list is some sort of combination of smaller lists, you could
 probably precompute (at index time) those fragments and do a compound
 query over them.

 But if you have to query every time and the list is different every
 time, that could be complicated.

 Regards,
Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Thu, Jul 26, 2012 at 12:01 PM, Daniel Brügge
 daniel.brue...@googlemail.com wrote:
   Hi,
  
   I am facing the following issue:
  
   I have a couple of million documents which have a field called
   source_id.
   My problem is that I want to retrieve all the documents which have a
   source_id in a specific range of values. This range can be pretty big,
   for example a list of 200 to 2000 source ids.
  
   I was thinking that a filter query could be used, like
   fq=source_id:(1 2 3 4 5 6 ...), but this reminds me of SQL's
   WHERE IN (...), which was always a bit slow for a huge number of values.
  
   Another solution that came to my mind was to assign all the documents I
   want to retrieve a new kind of filter id. So all the documents which I
   want to analyse get a new id. But I would need to update millions of
   documents for this and assign them a new id, which could take some time.
  
   Can you think of a nicer way to solve this issue?
  
   Regards & greetings
  
   Daniel



Re: querying using filter query and lots of possible values

2012-07-26 Thread Daniel Brügge
Exactly. Creating a new index from the aggregated documents is the plan
I described above. I don't really know how long this will take for each
new index; hopefully under 1 hour or so. That would be tolerable.

Thanks. Daniel

On Thu, Jul 26, 2012 at 8:47 PM, Chantal Ackermann 
c.ackerm...@it-agenten.com wrote:

 Hi Daniel,

 depending on how you decide on the list of ids in the first place, you
 could also create a new index (core) and populate it with DIH, which would
 select only documents from your main index (core) in this range of ids.
 When updating you could try a delta import.

 Of course, this is only worth the effort if that core would exist for some
 time - but you've written that the subset of ids is constant for a longer
 time.

 Just another idea on top ;-)
 Chantal


Re: separation of indexes to optimize facet queries without fulltext

2012-07-26 Thread Daniel Brügge
Hi Chris,

thanks for the answer.

the plan is that in lots of queries I just need faceted values and don't
even do a fulltext search. On the other hand I need the fulltext search
for exactly one task in my application, which is searching documents and
returning them. There no faceting at all is needed, only filtering with
fields, which I also use for the other queries. So if 95% of the queries
don't use the fulltext, I thought it would make sense to split them.

Your suggestion to have one main master index and several slave indexes
sounds promising. Is it possible to have this replication in SolrCloud,
e.g. with different kinds of schemas etc.?

Thanks. Daniel

On Thu, Jul 26, 2012 at 9:05 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : My thought was that I could separate the indexes. So for the facet
 : queries, where I don't need fulltext search (and hence no indexed
 : fulltext field), I can use a completely new setup of a sharded Solr
 : which doesn't include the indexed fulltext, so the index is kept
 : small, containing just the few fields I have.
 :
 : And for the fulltext queries I have the current Solr configuration,
 : which includes, as mentioned above, all the fields incl. the indexed
 : fulltext field.
 :
 : Is this a normal way of handling these requirements, having different
 : kinds of Solr configurations for the different needs? Because the huge
 : redundancy

 It's definitely doable -- one thing I'm not clear on is why, if your
 faceting queries don't care about the full text, you would need to leave
 those small fields in your full index ... is your plan to do
 faceting and drill-down using the smaller index, but then display docs
 resulting from those queries by using the same fq params when querying
 the full index?

 If so then it should work; if not -- you may not need those fields in that
 index.

 In general there is nothing wrong with having multiple indexes to solve
 multiple usecases -- an index is usually an inverted denormalization of
 some structured source data designed for fast queries/retrieval.  If there
 are multiple distinct ways you want to query/retrieve data that don't lend
 themselves to the same denormalization, there's nothing wrong with
 multiple denormalizations.

 Something else to consider is an approach I've used many times: having a
 single index, but using special-purpose replicas. You can have a master
 index that you update at the rate of change, one set of slaves that are
 used for one type of query pattern (faceting on X, Y, and Z for example)
 and a different set of slaves that are used for a different query pattern
 (faceting on A, B, and C), so each set of slaves gets a higher cache hit
 rate than if the queries were randomized across all machines.

 -Hoss
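
For the special-purpose-replica route, the classic (pre-SolrCloud)
replication config would look roughly like this; the handler follows the
stock examples and the confFiles list is an assumption. On the master,
in solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

And on each slave:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>

Note that replicating schema.xml via confFiles gives every slave the
master's schema, so per-role schema variants would have to be managed
outside replication.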



separation of indexes to optimize facet queries without fulltext

2012-07-25 Thread Daniel Brügge
Hi,

I currently have one big sharded Solr setup storing a couple of million
documents with some 'small' fields and one fulltext field in each doc.
The latter blows up the index.
My thought was that I could separate the indexes. So for the facet
queries, where I don't need fulltext search (and hence no indexed
fulltext field), I can use a completely new setup of a sharded Solr which
doesn't include the indexed fulltext, so the index is kept small,
containing just the few fields I have.

And for the fulltext queries I have the current Solr configuration, which
includes, as mentioned above, all the fields incl. the indexed fulltext
field.

Is this a normal way of handling these requirements, having different
kinds of Solr configurations for the different needs? Because the huge
redundancy scares me a bit. I will have the fields twice.

Thanks in advance & greetings

Daniel


Re: LockObtainFailedException after trying to create cores on second SolrCloud instance

2012-06-14 Thread Daniel Brügge
Will check later whether using different data dirs for the core on each
instance helps. But because each Solr sits in its own OpenVZ instance
(i.e. virtual server) they should be totally separated, at least from my
understanding of virtualization.

Will check and get back here...

Thanks.

On Wed, Jun 13, 2012 at 8:10 PM, Mark Miller markrmil...@gmail.com wrote:

 That's an interesting data dir location:
 NativeFSLock@/home/myuser/data/index/write.lock

 Where are the other data dirs located? Are you sharing one drive or
 something? It looks like something already has a writer lock - are you sure
 another solr instance is not running somehow?

 On Wed, Jun 13, 2012 at 11:11 AM, Daniel Brügge 
 daniel.brue...@googlemail.com wrote:

  BTW: i am running the solr instances using -Xms512M -Xmx1024M
 
  so not so little memory.
 
  Daniel
 
  On Wed, Jun 13, 2012 at 4:28 PM, Daniel Brügge 
  daniel.brue...@googlemail.com wrote:
 
   Hi,
  
    I am struggling with creating multiple collections on a 4-instance
    SolrCloud setup:
   
    I have 4 virtual OpenVZ instances where I have installed SolrCloud on
    each, and on one a standalone Zookeeper is also running.
  
   Loading the Solr configuration into ZK works fine.
  
   Then I startup the 4 instances and everything is also running smoothly.
  
   After that I am adding one core with the name e.g. '123'.
  
   This core is correctly visible on the instance I have used for creating
   it.
  
   it maps like
  
    '123' -> shard1 -> virtual-instance-1
  
  
    After that I am creating a core with the same name '123' on the second
    instance, and it creates it, but after a while an exception is thrown
    and the cluster state of the newly created core goes to 'recovering':
  
  
  "123":{"shard1":{
      "virtual-instance-1:8983_solr_123":{
        "shard":"shard1",
        "roles":null,
        "leader":"true",
        "state":"active",
        "core":"123",
        "collection":"123",
        "node_name":"virtual-instance-1:8983_solr",
        "base_url":"http://virtual-instance-1:8983/solr"},
      "virtual-instance-2:8983_solr_123":{
        "shard":"shard1",
        "roles":null,
        "state":"recovering",
        "core":"123",
        "collection":"123",
        "node_name":"virtual-instance-2:8983_solr",
        "base_url":"http://virtual-instance-2:8983/solr"}}},
  
  
    The exception thrown is on the first virtual instance:
  
    Jun 13, 2012 2:18:40 PM org.apache.solr.common.SolrException log
    SEVERE: null:org.apache.lucene.store.LockObtainFailedException: Lock
    obtain timed out: NativeFSLock@/home/myuser/data/index/write.lock
      at org.apache.lucene.store.Lock.obtain(Lock.java:84)
      at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:607)
      at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:58)
      at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:112)
      at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:52)
      at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:364)
      at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
      at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
      at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
      at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
      at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
      at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
      at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
      at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
      at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
      at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
      at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
      at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413

Re: LockObtainFailedException after trying to create cores on second SolrCloud instance

2012-06-14 Thread Daniel Brügge
OK, I think I have found it. When starting the 4 solr instances via
start.jar I always provided the data directory property via

-Dsolr.data.dir=/home/myuser/data

After removing this it worked fine. What is weird is that all 4 instances
are totally separated, so instance-2 should never conflict with
instance-1; they could also be on totally different physical servers.

Thanks. Daniel
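
If an explicit data dir is wanted anyway, giving each instance its own
path avoids the shared write lock, e.g. (paths made up):

java -Dsolr.data.dir=/home/myuser/data/instance1 -jar start.jar   # instance 1
java -Dsolr.data.dir=/home/myuser/data/instance2 -jar start.jar   # instance 2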

On Wed, Jun 13, 2012 at 8:10 PM, Mark Miller markrmil...@gmail.com wrote:

 That's an interesting data dir location:
 NativeFSLock@/home/myuser/data/index/write.lock

 Where are the other data dirs located? Are you sharing one drive or
 something? It looks like something already has a writer lock - are you sure
 another solr instance is not running somehow?

 On Wed, Jun 13, 2012 at 11:11 AM, Daniel Brügge 
 daniel.brue...@googlemail.com wrote:

  BTW: i am running the solr instances using -Xms512M -Xmx1024M
 
  so not so little memory.
 
  Daniel
 
  On Wed, Jun 13, 2012 at 4:28 PM, Daniel Brügge 
  daniel.brue...@googlemail.com wrote:
 
   Hi,
  
    I am struggling with creating multiple collections on a 4-instance
    SolrCloud setup:
   
    I have 4 virtual OpenVZ instances where I have installed SolrCloud on
    each, and on one a standalone Zookeeper is also running.
  
   Loading the Solr configuration into ZK works fine.
  
   Then I startup the 4 instances and everything is also running smoothly.
  
   After that I am adding one core with the name e.g. '123'.
  
   This core is correctly visible on the instance I have used for creating
   it.
  
   it maps like
  
    '123' -> shard1 -> virtual-instance-1
  
  
    After that I am creating a core with the same name '123' on the second
    instance, and it creates it, but after a while an exception is thrown
    and the cluster state of the newly created core goes to 'recovering':
  
  
  "123":{"shard1":{
      "virtual-instance-1:8983_solr_123":{
        "shard":"shard1",
        "roles":null,
        "leader":"true",
        "state":"active",
        "core":"123",
        "collection":"123",
        "node_name":"virtual-instance-1:8983_solr",
        "base_url":"http://virtual-instance-1:8983/solr"},
      "virtual-instance-2:8983_solr_123":{
        "shard":"shard1",
        "roles":null,
        "state":"recovering",
        "core":"123",
        "collection":"123",
        "node_name":"virtual-instance-2:8983_solr",
        "base_url":"http://virtual-instance-2:8983/solr"}}},
  
  
    The exception thrown is on the first virtual instance:
  
    Jun 13, 2012 2:18:40 PM org.apache.solr.common.SolrException log
    SEVERE: null:org.apache.lucene.store.LockObtainFailedException: Lock
    obtain timed out: NativeFSLock@/home/myuser/data/index/write.lock
      at org.apache.lucene.store.Lock.obtain(Lock.java:84)
      at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:607)
      at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:58)
      at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:112)
      at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:52)
      at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:364)
      at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
      at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
      at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
      at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
      at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
      at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
      at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
      at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
      at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
      at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
      at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065

Re: LockObtainFailedException after trying to create cores on second SolrCloud instance

2012-06-14 Thread Daniel Brügge
Aha, OK. That was new to me. Will check this. Thanks.

On Thu, Jun 14, 2012 at 3:52 PM, Yury Kats yuryk...@yahoo.com wrote:

 On 6/14/2012 2:05 AM, Daniel Brügge wrote:
  Will check later whether using different data dirs for the core on each
  instance helps. But because each Solr sits in its own OpenVZ instance
  (i.e. virtual server) they should be totally separated, at least from
  my understanding of virtualization.

 Depending on how your VMs are configured, their filesystems could
 be mapped to the same place on the host's filesystem. What you describe
 sounds like this is the case.



LockObtainFailedException after trying to create cores on second SolrCloud instance

2012-06-13 Thread Daniel Brügge
Hi,

I am struggling with creating multiple collections on a 4-instance
SolrCloud setup:

I have 4 virtual OpenVZ instances where I have installed SolrCloud on
each, and on one a standalone Zookeeper is also running.

Loading the Solr configuration into ZK works fine.

Then I startup the 4 instances and everything is also running smoothly.

After that I am adding one core with the name e.g. '123'.

This core is correctly visible on the instance I have used for creating it.

it maps like

'123' -> shard1 -> virtual-instance-1


After that I am creating a core with the same name '123' on the second
instance, and it creates it, but after a while an exception is thrown and
the cluster state of the newly created core goes to 'recovering':


  "123":{"shard1":{
      "virtual-instance-1:8983_solr_123":{
        "shard":"shard1",
        "roles":null,
        "leader":"true",
        "state":"active",
        "core":"123",
        "collection":"123",
        "node_name":"virtual-instance-1:8983_solr",
        "base_url":"http://virtual-instance-1:8983/solr"},
      "virtual-instance-2:8983_solr_123":{
        "shard":"shard1",
        "roles":null,
        "state":"recovering",
        "core":"123",
        "collection":"123",
        "node_name":"virtual-instance-2:8983_solr",
        "base_url":"http://virtual-instance-2:8983/solr"}}},


The exception thrown is on the first virtual instance:

Jun 13, 2012 2:18:40 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.lucene.store.LockObtainFailedException: Lock
obtain timed out: NativeFSLock@/home/myuser/data/index/write.lock
  at org.apache.lucene.store.Lock.obtain(Lock.java:84)
  at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:607)
  at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:58)
  at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:112)
  at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:52)
  at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:364)
  at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
  at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
  at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
  at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
  at org.eclipse.jetty.server.Server.handle(Server.java:351)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
  at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
  at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
  at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254

Re: LockObtainFailedException after trying to create cores on second SolrCloud instance

2012-06-13 Thread Daniel Brügge
BTW: i am running the solr instances using -Xms512M -Xmx1024M

so not so little memory.

Daniel

On Wed, Jun 13, 2012 at 4:28 PM, Daniel Brügge 
daniel.brue...@googlemail.com wrote:

 Hi,

 I am struggling with creating multiple collections on a 4-instance
 SolrCloud setup:

 I have 4 virtual OpenVZ instances where I have installed SolrCloud on
 each, and on one a standalone Zookeeper is also running.

 Loading the Solr configuration into ZK works fine.

 Then I startup the 4 instances and everything is also running smoothly.

 After that I am adding one core with the name e.g. '123'.

 This core is correctly visible on the instance I have used for creating
 it.

 it maps like

 '123' -> shard1 -> virtual-instance-1


 After that I am creating a core with the same name '123' on the second
 instance, and it creates it, but after a while an exception is thrown
 and the cluster state of the newly created core goes to 'recovering':


  "123":{"shard1":{
      "virtual-instance-1:8983_solr_123":{
        "shard":"shard1",
        "roles":null,
        "leader":"true",
        "state":"active",
        "core":"123",
        "collection":"123",
        "node_name":"virtual-instance-1:8983_solr",
        "base_url":"http://virtual-instance-1:8983/solr"},
      "virtual-instance-2:8983_solr_123":{
        "shard":"shard1",
        "roles":null,
        "state":"recovering",
        "core":"123",
        "collection":"123",
        "node_name":"virtual-instance-2:8983_solr",
        "base_url":"http://virtual-instance-2:8983/solr"}}},


 The exception thrown is on the first virtual instance:

 Jun 13, 2012 2:18:40 PM org.apache.solr.common.SolrException log
 SEVERE: null:org.apache.lucene.store.LockObtainFailedException: Lock
 obtain timed out: NativeFSLock@/home/myuser/data/index/write.lock
   at org.apache.lucene.store.Lock.obtain(Lock.java:84)
   at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:607)
   at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:58)
   at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:112)
   at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:52)
   at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:364)
   at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
   at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
   at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
   at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
   at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
   at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
   at org.eclipse.jetty.server.Server.handle(Server.java:351)
   at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
   at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
   at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
   at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954

Re: shard distribution of multiple collections in SolrCloud

2012-05-24 Thread Daniel Brügge
Ok, thanks a lot, good to know.

BTW: The speed of creating a collection is not the fastest - at least here
on this server I use (approx. a second each), but this is normal, right?

On Wed, May 23, 2012 at 9:28 PM, Mark Miller markrmil...@gmail.com wrote:

 Yeah, currently you have to create the core on each node...we are working
 on a 'collections' api that will make this a simple one call operation.

 We should have this soon.

 - Mark

 On May 23, 2012, at 2:36 PM, Daniel Brügge wrote:

  Hi,
 
   I am creating several cores using the following script. I use this for
  testing SolrCloud and to learn about the distribution of multiple
  collections.
 
   max=500
   for ((i=2; i<=$max; ++i ));
   do
      curl "http://solrinstance1:8983/solr/admin/cores?action=CREATE&name=collection$i&collection=collection$i&collection.configName=myconfig"
   done
 
 
   I've set up a SolrCloud with 2 shards which are each replicated by 2 other
  instances I start.
 
  When I first start the installation I have the default collection1 in
  place which is sharded over shard1 and shard2 with 2 leader nodes and 2
  nodes which replicate the leaders.
 
  When I run this script above which calls the Coreadmin on one of the
  shards, all the collections are created on only this shard without a
  replica. So e.g.
 
 
   "collection8":{"shard1":{"solrinstance1:8983_solr_collection8":{
       "shard":"shard1",
       "leader":"true",
       "state":"active",
       "core":"collection8",
       "collection":"collection8",
       "node_name":"solrinstance1:8983_solr",
       "base_url":"http://solrinstance1:8983/solr"}}}
 
 
   I always thought that via zookeeper these collections are sharded and
   replicated, or do I need to call the create-core action on each node?
   But then I need to know about these nodes, right?


   Thanks & regards
 
  Daniel

 - Mark Miller
 lucidimagination.com














shard distribution of multiple collections in SolrCloud

2012-05-23 Thread Daniel Brügge
Hi,

I am creating several cores using the following script. I use this for
testing SolrCloud and to learn about the distribution of multiple
collections.

max=500
 for ((i=2; i<=$max; ++i ));
 do
   curl "http://solrinstance1:8983/solr/admin/cores?action=CREATE&name=collection$i&collection=collection$i&collection.configName=myconfig"
 done


I've set up a SolrCloud with 2 shards which are each replicated by 2 other
instances I start.

When I first start the installation I have the default collection1 in
place which is sharded over shard1 and shard2 with 2 leader nodes and 2
nodes which replicate the leaders.

When I run this script above which calls the Coreadmin on one of the
shards, all the collections are created on only this shard without a
replica. So e.g.


"collection8":{"shard1":{"solrinstance1:8983_solr_collection8":{
    "shard":"shard1",
    "leader":"true",
    "state":"active",
    "core":"collection8",
    "collection":"collection8",
    "node_name":"solrinstance1:8983_solr",
    "base_url":"http://solrinstance1:8983/solr"}}}


I always thought that via zookeeper these collections are sharded and
replicated, or do I need to call the create-core action on each node? But
then I need to know about these nodes, right?


Thanks & regards

Daniel


Re: CloudSolrServer not working with standalone Zookeeper

2012-05-21 Thread Daniel Brügge
Ok, it seems that a Maven dependency on ZooKeeper version 3.3 broke this.
Now it connects to the zk instance.

Thanks.
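
For anyone hitting the same thing: the ZooKeeper artifact on the client
side should match what the Solr build ships; the version below is only
illustrative:

<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <version>3.3.6</version> <!-- align with the ZK version bundled with Solr -->
</dependency>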

On Mon, May 21, 2012 at 5:31 PM, Daniel Brügge 
daniel.brue...@googlemail.com wrote:

 Thanks for your feedback. I don't know.

 I've tried just now with the newest trunk version and the embedded ZK on
 port 9983.

 In the logs of the zk-solr it shows:

  INFO: Accepted socket connection from /XXX.XXX.XXX.XXX:1055
  May 21, 2012 3:27:34 PM org.apache.zookeeper.server.NIOServerCnxn doIO
  WARNING: EndOfStreamException: Unable to read additional data from
  client sessionid 0x0, likely client has closed socket
  May 21, 2012 3:27:34 PM org.apache.zookeeper.server.NIOServerCnxn closeSock
  INFO: Closed socket connection for client /XXX.XXX.XXX.XXX:1055 (no
  session established for client)


 So it can definitely connect to the port in my opinion, but it closes the
 connection after the defined timeout (here 1ms):

 Caused by: java.util.concurrent.TimeoutException: Could not connect to
 ZooKeeper MYZKHOST.:9983 within 1 m

 Hmm. I also thought that this trivial setup should work. Will check again.

 Daniel

 On Fri, May 18, 2012 at 4:23 PM, Mark Miller markrmil...@gmail.com wrote:

 Seems something is stopping the connection from occurring? Tests are
 constantly running and doing this using an embedded zk server - and I know
 more than a few people using an external zk setup. I'd have to guess
 something in your env or URL is causing this?


 On May 16, 2012, at 3:11 PM, Daniel Brügge wrote:

   OK, it's also not working with an internally started Zookeeper.
 
  On Wed, May 16, 2012 at 8:29 PM, Daniel Brügge 
  daniel.brue...@googlemail.com wrote:
 
  Hi,
 
   I am just playing around with SolrCloud and have read in articles like
   http://www.lucidimagination.com/blog/2012/03/05/scaling-solr-indexing-with-solrcloud-hadoop-and-behemoth/
   that it is sufficient to create the connection to the Zookeeper
   instance and not to the Solr instance.
   When I try to connect to my standalone Zookeeper instance (not started
   with a Solr instance and -DzkRun) I am getting this error:
 
  Caused by: java.util.concurrent.TimeoutException: Could not connect to
  ZooKeeper
 
 
  I am also getting this error when I try to connect directly to one of
 the
  Solr instances.
 
  My code looks like this:
 
  solr = new CloudSolrServer("myzkhost:2181");
  ((CloudSolrServer) solr).setDefaultCollection("collection1");
 
  I am working with the latest Solr trunk version (
  https://builds.apache.org/view/S-Z/view/Solr/job/Solr-trunk/1855/)
 
  Do I need to start the zookeeper in Solr to keep this working?
 
   Thanks & regards
 
  Daniel
 

 - Mark Miller
 lucidimagination.com















CloudSolrServer not working with standalone Zookeeper

2012-05-16 Thread Daniel Brügge
Hi,

I am just playing around with SolrCloud and have read in articles like
http://www.lucidimagination.com/blog/2012/03/05/scaling-solr-indexing-with-solrcloud-hadoop-and-behemoth/
that it is sufficient to create the connection to the Zookeeper instance
and not to the Solr instance.
When I try to connect to my standalone Zookeeper instance (not started
with a Solr instance and -DzkRun) I am getting this error:

Caused by: java.util.concurrent.TimeoutException: Could not connect to
 ZooKeeper


I am also getting this error when I try to connect directly to one of the
Solr instances.

My code looks like this:

solr = new CloudSolrServer("myzkhost:2181");
((CloudSolrServer) solr).setDefaultCollection("collection1");

I am working with the latest Solr trunk version (
https://builds.apache.org/view/S-Z/view/Solr/job/Solr-trunk/1855/)

Do I need to start the zookeeper in Solr to keep this working?

Thanks & regards

Daniel


Re: CloudSolrServer not working with standalone Zookeeper

2012-05-16 Thread Daniel Brügge
OK, it's also not working with an internally started Zookeeper.

On Wed, May 16, 2012 at 8:29 PM, Daniel Brügge 
daniel.brue...@googlemail.com wrote:

 Hi,

 I am just playing around with SolrCloud and have read in articles like
 http://www.lucidimagination.com/blog/2012/03/05/scaling-solr-indexing-with-solrcloud-hadoop-and-behemoth/
 that it is sufficient to create the connection to the Zookeeper instance
 and not to the Solr instance.
 When I try to connect to my standalone Zookeeper instance (not started
 with a Solr instance and -DzkRun) I am getting this error:

 Caused by: java.util.concurrent.TimeoutException: Could not connect to
 ZooKeeper


 I am also getting this error when I try to connect directly to one of the
 Solr instances.

 My code looks like this:

 solr = new CloudSolrServer("myzkhost:2181");
 ((CloudSolrServer) solr).setDefaultCollection("collection1");

 I am working with the latest Solr trunk version (
 https://builds.apache.org/view/S-Z/view/Solr/job/Solr-trunk/1855/)

 Do I need to start the zookeeper in Solr to keep this working?

 Thanks & regards

 Daniel



Filter facet_fields with Solr similar to stopwords

2012-03-06 Thread Daniel Brügge
Hi,

I am using a solr.StopFilterFactory in a query filter for a text_general
field (here: content). It works fine; when I query the field for the
stopword I get no results.

But I am also doing a facet.field=content call to get the words which are
used in the text. What I am trying to achieve is to also filter the
stopwords from the facet_fields, but it's not working. It would only work
if the stopwords were also applied during the indexing of the text_general
field, right?

The problem here is that it's too much data to re-index every time I add a
new stopword.

My current solution is to 'filter' with code after retrieving the
facet_fields from Solr. But is there a niftier, Solr-based way to do this?

Thanks & regards

Daniel
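
The client-side filtering I mean looks roughly like this with SolrJ (the
stopword set, field name and class are just examples):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetStopwordFilter {
    // print facet counts for "content", skipping values in the stopword list;
    // rsp is the response of a query with facet=true&facet.field=content
    public static void printFiltered(QueryResponse rsp) {
        Set<String> stopwords = new HashSet<String>(Arrays.asList("und", "für", "über"));
        FacetField content = rsp.getFacetField("content");
        for (FacetField.Count c : content.getValues()) {
            if (!stopwords.contains(c.getName())) {
                System.out.println(c.getName() + " => " + c.getCount());
            }
        }
    }
}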


Re: solr out of memory

2012-03-06 Thread Daniel Brügge
Maybe the index is too big and you need to give the JVM more memory via
the -Xmx parameter. See also
http://wiki.apache.org/solr/SolrPerformanceFactors#OutOfMemoryErrors

Daniel
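
For example, when starting Solr with the bundled Jetty (the heap sizes
are placeholders):

java -Xms512M -Xmx2048M -jar start.jar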

On Tue, Mar 6, 2012 at 10:01 AM, C.Yunqin 345804...@qq.com wrote:

 sometimes when I search a simple word, like id:chenm,
 solr reports an error:
 SEVERE: java.lang.OutOfMemoryError: Java heap space


 I do not know why;
 sometimes the query goes well.
 Does anyone have an idea?


 thanks a lot


Re: Filter facet_fields with Solr similar to stopwords

2012-03-06 Thread Daniel Brügge
OK, I've found this posting from 2009:

http://lucene.472066.n3.nabble.com/excluding-certain-terms-from-facet-counts-when-faceting-based-on-indexed-terms-of-a-field-td501104.html

But this

facet.field={!terms=WORDTOEXCLUDE}content

approach also only shows me the count of the word I want to exclude.

On Tue, Mar 6, 2012 at 11:33 AM, Daniel Brügge 
daniel.brue...@googlemail.com wrote:

 Hi,

 I am using a solr.StopFilterFactory in a query filter for a text_general
 field (here: content). It works fine; when I query the field for the
 stopword I get no results.

 But I am also doing a facet.field=content call to get the words which are
 used in the text. What I am trying to achieve is to also filter the
 stopwords from the facet_fields, but it's not working. It would only work
 if the stopwords were also applied during the indexing of the text_general
 field, right?

 The problem here is that it's too much data to re-index every time I add
 a new stopword.

 My current solution is to 'filter' with code after retrieving the
 facet_fields from Solr. But is there a niftier, Solr-based way to do this?

 Thanks & regards

 Daniel



Re: Index-Analyzer on Master with StopFilterFactory and Query-Analyzer on Slave with StopFilterFactory

2012-01-31 Thread Daniel Brügge
OK, thanks Erick. Then I won't touch it. I was just wondering if it would
make sense. But on the other hand the schema.xml is also replicated in my
setup, so maybe it would really be confusing.

Thanks

Daniel

On Tue, Jan 31, 2012 at 3:07 PM, Erick Erickson erickerick...@gmail.com wrote:

 I think it would be easy to get confused about what
 was where, resulting in hard-to-track bugs because
 the config file wasn't what you were expecting. I also
 don't understand why you think this is desirable.
 There might be an infinitesimal savings in memory,
 due to not instantiating one analysis chain, but I'm not
 even sure about that.

 The savings is so tiny that the increased risk of
 messing up seems far too high a price to pay.

 Best
 Erick

 On Mon, Jan 30, 2012 at 11:44 AM, Daniel Brügge
 daniel.brue...@googlemail.com wrote:
  Hi,
 
  I am using a 'text_general' fieldType (class = solr.TextField) in my
  schema. And I have a master/slave setup,
  where I index on the master and read from the slaves. In the text_general
  field I am using 2 analyzers. One for
  indexing and one for querying with stopword-filters.
 
   What I am wondering is whether it would make sense to have a different
   schema on the master than on the slave? So just the index-analyzer in
   the master's schema and the query-analyzer in the slave's schema?
 
 
   <fieldType name="text_general" class="solr.TextField"
       positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt, stopwords_en.txt" enablePositionIncrements="true" />
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt, stopwords_en.txt" enablePositionIncrements="true" />
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>
 
  What do you think?
 
   Thanks & best regards
 
  Daniel



Index-Analyzer on Master with StopFilterFactory and Query-Analyzer on Slave with StopFilterFactory

2012-01-30 Thread Daniel Brügge
Hi,

I am using a 'text_general' fieldType (class = solr.TextField) in my
schema. And I have a master/slave setup,
where I index on the master and read from the slaves. In the text_general
field I am using 2 analyzers. One for
indexing and one for querying with stopword-filters.

What I am wondering is whether it would make sense to have a different
schema on the master than on the slave? So just the index-analyzer in the
master's schema and the query-analyzer in the slave's schema?


<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt, stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt, stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

What do you think?

Thanks & best regards

Daniel


Re: Does it make sense to configure newSearcher and firstSearcher on a Solr Master instance?

2012-01-21 Thread Daniel Brügge
Thanks Erick,

I think I will then try to set up 6 slaves and configure them nicely.

Daniel

On Fri, Jan 20, 2012 at 5:38 PM, Erick Erickson erickerick...@gmail.com wrote:

 There will be some increased pressure on your resources when replication
 happens to the slaves. That said, you can also allocate resources
 differently between the two. For instance, you do not need any memory for
 the RAM buffer on the slaves, since you're not indexing there. On the
 master, you don't need any caches to speak of (e.g. filterCache), because
 you're not doing any searching.
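
 (As a rough illustration of that split; the numbers are only examples,
 and in Solr 3.x ramBufferSizeMB lives under indexDefaults rather than
 indexConfig:)

   <!-- master solrconfig.xml: indexing only, so a generous RAM buffer -->
   <indexConfig>
     <ramBufferSizeMB>128</ramBufferSizeMB>
   </indexConfig>

   <!-- slave solrconfig.xml: searching only, so keep the caches instead -->
   <query>
     <filterCache class="solr.FastLRUCache" size="512" initialSize="512"
                  autowarmCount="128"/>
     <queryResultCache class="solr.LRUCache" size="512" initialSize="512"
                       autowarmCount="32"/>
   </query>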

 And let's assume you're pushing structured documents, e.g. Word or PDF
 docs, to Solr. Those are resource-intensive to parse and index, so on a
 dedicated master that work won't compete with search requests on the
 slaves.

 Having an index big enough that it requires sharding is almost a sure
 sign that trying to index and search on the same box containing the shard
 is going to cause trouble.

 Bottom line: combining the two roles leaves you many fewer options for
 tuning the search process. Usually people only serve both searching and
 indexing from a single box when the index is relatively small.

 Best
 Erick

 On Fri, Jan 20, 2012 at 5:45 AM, Daniel Brügge
 daniel.brue...@googlemail.com wrote:
  Erick,
 
  Yes, currently I have 6 shards, which accept writes and reads. Sometimes
  I delete data from all 6 and try to rebalance them, filling them up so
  that they hold approximately the same amount of data. So all 6 are 'in
  motion' somehow. I would like writes to happen more often than they do
  now, but after a write the querying slows down, so I limit writing to
  every n hours.

  So I've thought maybe it would make sense to add 6 slave shards. But what
  I don't know is whether the slave shards also suffer after a replication,
  with querying slowing down there too. I had a master/slave setup before,
  but without sharding: only one big master and one slave. And after a
  replication it took a couple of minutes to get back to proper
  performance.
 
  Daniel
 
 
 
 
  On Fri, Jan 20, 2012 at 3:05 AM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  It's generally recommended that you do the indexing on the master
  and searches on the slaves. In that case, firstSearcher and
  newSearcher sections are irrelevant on the master and shouldn't
  be there.
 
  I don't understand why you would need 5 more machines; are you sharding?
 
  Best
  Erick
 
  On Thu, Jan 19, 2012 at 7:25 AM, Daniel Brügge
  daniel.brue...@googlemail.com wrote:
   Hi,
  
   I am currently running multiple Solr instances and often write data to
   them. I also query them. Both work fine right now because I don't have
   that many search requests. For querying, I noticed that the static
   warming via firstSearcher and newSearcher with one facet query really
   brings a performance boost. But the downside is that writing is now
   really slow.
  
   Does it make sense at all to configure firstSearcher and newSearcher on
   a Solr server that gets lots of writes? Or is the better strategy to
   introduce slave servers where these event listeners are configured, and
   to keep them away from the master?
  
   The thing is that I would need 6 additional Solr slaves if I picked this
   approach. :)
  
   What do you think?
  
   Thanks.
   Daniel
 



Re: Does it make sense to configure newSearcher and firstSearcher on a Solr Master instance?

2012-01-20 Thread Daniel Brügge
Erick,

Yes, currently I have 6 shards, which accept writes and reads. Sometimes I
delete data from all 6 and try to rebalance them, filling them up so that
they hold approximately the same amount of data. So all 6 are 'in motion'
somehow. I would like writes to happen more often than they do now, but
after a write the querying slows down, so I limit writing to every n hours.

So I've thought maybe it would make sense to add 6 slave shards. But what I
don't know is whether the slave shards also suffer after a replication,
with querying slowing down there too. I had a master/slave setup before,
but without sharding: only one big master and one slave. And after a
replication it took a couple of minutes to get back to proper performance.
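
(For what it's worth, a minimal sketch of the slave side of such a setup;
the master URL and poll interval are placeholders. The slave opens a new
searcher after every pull, so the usual warming cost applies there too:)

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <!-- how often the slave checks the master for a new index version -->
      <str name="pollInterval">00:10:00</str>
    </lst>
  </requestHandler>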

Daniel




On Fri, Jan 20, 2012 at 3:05 AM, Erick Erickson erickerick...@gmail.com wrote:

 It's generally recommended that you do the indexing on the master
 and searches on the slaves. In that case, firstSearcher and
 newSearcher sections are irrelevant on the master and shouldn't
 be there.

 I don't understand why you would need 5 more machines; are you sharding?

 Best
 Erick

 On Thu, Jan 19, 2012 at 7:25 AM, Daniel Brügge
 daniel.brue...@googlemail.com wrote:
  Hi,
 
  I am currently running multiple Solr instances and often write data to
  them. I also query them. Both work fine right now because I don't have
  that many search requests. For querying, I noticed that the static
  warming via firstSearcher and newSearcher with one facet query really
  brings a performance boost. But the downside is that writing is now
  really slow.
 
  Does it make sense at all to configure firstSearcher and newSearcher on a
  Solr server that gets lots of writes? Or is the better strategy to
  introduce slave servers where these event listeners are configured, and
  to keep them away from the master?
 
  The thing is that I would need 6 additional Solr slaves if I picked this
  approach. :)
 
  What do you think?
 
  Thanks.
  Daniel



Does it make sense to configure newSearcher and firstSearcher on a Solr Master instance?

2012-01-19 Thread Daniel Brügge
Hi,

I am currently running multiple Solr instances and often write data to
them. I also query them. Both work fine right now because I don't have
that many search requests. For querying, I noticed that the static warming
via firstSearcher and newSearcher with one facet query really brings a
performance boost. But the downside is that writing is now really slow.
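
(A minimal sketch of such static warming in solrconfig.xml; the facet
field is just an example:)

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <!-- warm the new searcher's caches with one facet query -->
        <str name="q">*:*</str>
        <str name="facet">true</str>
        <str name="facet.field">content</str>
      </lst>
    </arr>
  </listener>
  <!-- a firstSearcher listener is configured the same way -->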

Does it make sense at all to configure firstSearcher and newSearcher on a
Solr server that gets lots of writes? Or is the better strategy to
introduce slave servers where these event listeners are configured, and to
keep them away from the master?

The thing is that I would need 6 additional Solr slaves if I picked this
approach. :)

What do you think?

Thanks.
Daniel


Re: Can Apache Solr Handle TeraByte Large Data

2012-01-13 Thread Daniel Brügge
Hi,

It's definitely a problem to store 5 TB in Solr without sharding. I try to
split the data across Solr instances so that each index fits in memory on
its server.
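
(For example, a distributed query across such instances looks roughly like
this; hosts and ports are made up:)

  http://host1:8983/solr/select?q=foo&shards=host1:8983/solr,host2:8983/solr,host3:8983/solr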

I already ran into trouble with a single Solr instance holding a 50 GB index.

Daniel

On Jan 13, 2012, at 1:08 PM, mustafozbek wrote:

 I have been an Apache Solr user for about a year. I used Solr for simple
 search tools, but now I want to use it with 5 TB of data. I assume the
 5 TB will grow to about 7 TB once Solr indexes it, given the filters I
 use. I will then add nearly 50 MB of data per hour to the same index.
 1- Are there any problems using a single Solr server with 5 TB of data
    (without shards)?
   a- Can the Solr server answer queries in an acceptable time?
   b- What is the expected time for committing 50 MB of data to a 7 TB
      index?
   c- Is there an upper limit on index size?
 2- What do you suggest?
   a- How many shards should I use?
   b- Should I use Solr cores?
   c- What commit frequency do you suggest? (Is 1 hour OK?)
 3- Are there any test results for this kind of large data?
 
 There is no 5 TB data set available yet; I just want to estimate what the
 result will be.
 Note: You can assume that hardware resources are not a problem.
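
 (On the commit-frequency question, auto-commit can be bounded by time
 and/or document count in solrconfig.xml; a sketch with illustrative
 values:)

   <updateHandler class="solr.DirectUpdateHandler2">
     <autoCommit>
       <maxDocs>100000</maxDocs>   <!-- commit after this many docs... -->
       <maxTime>3600000</maxTime>  <!-- ...or after 1 hour, whichever first -->
     </autoCommit>
   </updateHandler>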
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Can-Apache-Solr-Handle-TeraByte-Large-Data-tp3656484p3656484.html
 Sent from the Solr - User mailing list archive at Nabble.com.