Re: Backup strategy for SolrCloud

2012-09-20 Thread Tommaso Teofili
I also think that's a good question, and currently one without a "use this"
answer :-)
I think it shouldn't be hard to write a Solr service that queries ZK and
replicates both the conf and the indexes (via SnapPuller or ZK itself), so that
such a node is responsible for backing up the whole cluster to secure storage
(NAS, EBS, etc.).
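As a rough starting point, the per-core ReplicationHandler already exposes a
backup command, so a first cut of such a service could simply walk the list of
nodes/cores and trigger a snapshot on each one. An untested sketch (host and
core names are made up, and discovering them from ZK is left out):

  import urllib.request

  # placeholder core URLs - in a real service these would come from ZK
  CORES = [
      "http://host1:8983/solr/collection1",
      "http://host2:8983/solr/collection1",
  ]

  for base in CORES:
      # the ReplicationHandler writes a snapshot.<timestamp> dir next to the index
      url = base + "/replication?command=backup"
      print(base, urllib.request.urlopen(url).read())

The resulting snapshot directories could then be copied off to NAS/EBS/etc.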

just my 2 cents,
Tommaso

2012/9/21 Otis Gospodnetic 

> Sounds good.
>
> But I think this was still a good question: Is there a way to back up
> an index that lives in SolrCloud and if so, how?
>
> Otis
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
>
>
> On Thu, Sep 20, 2012 at 7:35 PM, Upayavira  wrote:
> > What sorts of failures are you thinking of? Power loss? Index
> > corruption? Server overload?
> >
> > Could you keep somewhat remote replicas of each shard, but not behind
> > your load balancer?
> >
> > Then, should all your customer facing nodes go down, those replicas
> > would be elected leaders. When you bring the customer facing ones back,
> > they would just pull their indexes from your remote replicas, and you'd
> > be good to go once more.
> >
> > Upayavira
> >
> > On Thu, Sep 20, 2012, at 10:30 PM, jimtronic wrote:
> >> I'm thinking about catastrophic failure and recovery. If, for some
> >> reason,
> >> the cluster should go down or become unusable and I simply want to bring
> >> it
> >> back up as quickly as possible, what's the best way to accomplish that?
> >>
> >> Maybe I'm thinking about this incorrectly? Is this not a concern?
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >>
> http://lucene.472066.n3.nabble.com/Backup-strategy-for-SolrCloud-tp4009291p4009297.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: DIH import from MySQL results in garbage text for special chars

2012-09-20 Thread Pranav Prakash
I am seeing the garbage text in the browser, in the Luke index toolbox, and
everywhere else - it is the same. My servlet container is Jetty, the
out-of-the-box one. Many other special chars are getting indexed and stored
properly; only a few characters cause pain.
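One more thing I want to rule out is whether the bytes are already mangled by
the time they reach the index, or only when they are rendered. A small check I
plan to run (untested; core name, doc id and port are placeholders):

  import json, urllib.request

  # fetch one affected doc straight from Solr and look at the raw code points
  url = ("http://localhost:8983/solr/select"
         "?q=id:SOME_DOC_ID&fl=title&wt=json")
  resp = json.loads(urllib.request.urlopen(url).read().decode("utf-8"))
  print(repr(resp["response"]["docs"][0]["title"]))

If the repr() shows \ufffd (the replacement character) the value was already
broken at index time; if it shows \u201c / \u201d then the curly quotes are
stored fine and it's only a display/charset issue. I'll also double-check the
MySQL column charset with SHOW CREATE TABLE.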

*Pranav Prakash*

"temet nosce"



On Fri, Sep 14, 2012 at 6:36 PM, Erick Erickson wrote:

> Is your _browser_ set to handle the appropriate character set? Or whatever
> you're using to inspect your data? How about your servlet container?
>
>
>
> Best
> Erick
>
> On Mon, Sep 10, 2012 at 7:47 AM, Pranav Prakash  wrote:
> > Hi Folks,
> >
> > I am attempting to import documents to Solr from MySQL using DIH. One of
> > the fields contains the text - “Future of Mobile Value Added Services (VAS)
> > in Australia”. Notice the characters “ and ”.
> >
> > When I am importing, it gets stored as - “Future of Mobile Value Added
> > Services (VAS) in Australia�.
> >
> > The datasource config clearly mentions use of UTF-8 as follows:
> >
> > <dataSource driver="com.mysql.jdbc.Driver"
> > url="jdbc:mysql://localhost/ohapp_devel"
> > user="username"
> > useUnicode="true"
> > characterEncoding="UTF-8"
> > password="password"
> > zeroDateTimeBehavior="convertToNull"
> > name="app" />
> >
> >
> > A plain SQL SELECT statement on the MySQL console gives the appropriate
> > text. I even tried using the following scriptTransformer to get rid of this
> > char, but it was of no particular use in my case.
> >
> > function gsub(source, pattern, replacement) {
> >   var match, result;
> >   if (!((pattern != null) && (replacement != null))) {
> > return source;
> >   }
> >   result = '';
> >   while (source.length > 0) {
> > if ((match = source.match(pattern))) {
> >   result += source.slice(0, match.index);
> >   result += replacement;
> >   source = source.slice(match.index + match[0].length);
> > } else {
> >   result += source;
> >   source = '';
> > }
> >   }
> >   return result;
> > }
> >
> > function fixQuotes(c){
> >   c = gsub(c, /\342\200(?:\234|\235)/,'"');
> >   c = gsub(c, /\342\200(?:\230|\231)/,"'");
> >   c = gsub(c, /\342\200\223/,"-");
> >   c = gsub(c, /\342\200\246/,"...");
> >   c = gsub(c, /\303\242\342\202\254\342\204\242/,"'");
> >   c = gsub(c, /\303\242\342\202\254\302\235/,'"');
> >   c = gsub(c, /\303\242\342\202\254\305\223/,'"');
> >   c = gsub(c, /\303\242\342\202\254"/,'-');
> >   c = gsub(c, /\342\202\254\313\234/,'"');
> >   c = gsub(c, /“/, '"');
> >   return c;
> > }
> >
> > function cleanFields(row){
> >   var fieldsToClean = ['title', 'description'];
> >   for(i =0; i< fieldsToClean.length; i++){
> > var old_text = String(row.get(fieldsToClean[i]));
> > row.put(fieldsToClean[i], fixQuotes(old_text) );
> >   }
> >   return row;
> > }
> >
> > My understanding is that this must be a very common problem. It also
> > occurs with human names which contain these chars. What is an appropriate way
> > to get the correct text indexed and searchable? The fieldtype where
> > this is stored is as follows:
> >
> >   <fieldType ...>
> >     <analyzer>
> >       <tokenizer ... />
> >       <filter ... />
> >       <filter ... />
> >       <filter ... />
> >       <filter ... />
> >       <filter ... protected="protwords.txt"/>
> >       <filter class="solr.SynonymFilterFactory"
> >               synonyms="synonyms.txt"
> >               ignoreCase="true"
> >               expand="true" />
> >       <filter class="solr.StopFilterFactory"
> >               words="stopwords_en.txt"
> >               ignoreCase="true" />
> >       <filter class="solr.StopFilterFactory"
> >               words="stopwords_en.txt"
> >               ignoreCase="true" />
> >       <filter class="solr.WordDelimiterFilterFactory"
> >               generateWordParts="1"
> >               generateNumberParts="1"
> >               catenateWords="1"
> >               catenateNumbers="1"
> >               catenateAll="0"
> >               preserveOriginal="1" />
> >     </analyzer>
> >   </fieldType>
> >
> >
> > *Pranav Prakash*
> >
> > "temet nosce"
>


Re: Solr Swap Function doesn't work when using Solr Cloud Beta

2012-09-20 Thread sam fang
Hi Hoss,

Thanks for your quick reply.

Below is my solr.xml configuration, and I have already set persistent to true.

<solr persistent="true">
  <cores ...>
    <core name="test1" ... />
    <core name="test1-ondeck" ... />
  </cores>
</solr>

For the test1 and test1-ondeck content, I just copied from
example/solr/collection1.

Then I published 1 record to test1 and queried; it's OK now.

INFO: [test1] webapp=/solr path=/select
params={distrib=false&wt=javabin&rows=10&version=2&fl=id,score&df=text&NOW=1348195088691&shard.url=host1:18000/solr/test1/&start=0&q=*:*&isShard=true&fsv=true}
hits=1 status=0 QTime=1
Sep 20, 2012 10:38:08 PM org.apache.solr.core.SolrCore execute
INFO: [test1] webapp=/solr path=/select
params={ids=SOLR1000&distrib=false&wt=javabin&rows=10&version=2&df=text&NOW=1348195088691&shard.url=
host1:18000/solr/test1/&q=*:*&isShard=true} status=0 QTime=1
Sep 20, 2012 10:38:08 PM org.apache.solr.core.SolrCore execute
INFO: [test1] webapp=/solr path=/select params={q=*:*&wt=python} status=0
QTime=20


Then I used the core admin console page to swap, and clicked reload for test1
and test1-ondeck. If I keep refreshing the query page, it sometimes gives 1
record and sometimes gives 0 records.
And I found the shard.url is different from the one logged for searches before
the swap. It's shard.url=host1:18000/solr/test1-ondeck/| host1:18000/solr/test1/.

Below returns 0:
Sep 20, 2012 10:41:32 PM org.apache.solr.core.SolrCore execute
INFO: [test1] webapp=/solr path=/select
params={fl=id,score&df=text&NOW=1348195292608&shard.url=host1:18000/solr/test1-ondeck/|
host1:18000/solr/test1/&start=0&q=*:*&distrib=false&isShard=true&wt=javabin&fsv=true&rows=10&version=2}
hits=0 status=0 QTime=0
Sep 20, 2012 10:41:32 PM org.apache.solr.core.SolrCore execute
INFO: [test1] webapp=/solr path=/select params={q=*:*&wt=python} status=0
QTime=14

Below returns 1:
Sep 20, 2012 10:42:31 PM org.apache.solr.core.SolrCore execute
INFO: [test1-ondeck] webapp=/solr path=/select
params={fl=id,score&df=text&NOW=1348195351293&shard.url=
host1:18000/solr/test1-ondeck/|
host1:18000/solr/test1/&start=0&q=*:*&distrib=false&isShard=true&wt=javabin&fsv=true&rows=10&version=2}
hits=1 status=0 QTime=1
Sep 20, 2012 10:42:31 PM org.apache.solr.core.SolrCore execute
INFO: [test1-ondeck] webapp=/solr path=/select
params={df=text&NOW=1348195351293&shard.url=
host1:18000/solr/test1-ondeck/|
host1:18000/solr/test1/&q=*:*&ids=SOLR1000&distrib=false&isShard=true&wt=javabin&rows=10&version=2}
status=0 QTime=1
Sep 20, 2012 10:42:31 PM org.apache.solr.core.SolrCore execute
INFO: [test1] webapp=/solr path=/select params={q=*:*&wt=python} status=0
QTime=9

Thanks a lot,
Sam

On Thu, Sep 20, 2012 at 8:27 PM, Chris Hostetter
wrote:

> : In Solr 3.6, the core swap function works fine. After switching to Solr 4.0
> : Beta, I found it doesn't work well.
>
> can you elaborate on what exactly you mean by "doesn't work well" ? ..
> what does your solr.xml file look like? what command did you run to do the
> swap? what results did you get from those commands?  what exactly did you
> observe after the swap and how did you observe it?
>
> : I tried to swap two cores, but it still returns old core data when doing the
> : search. After restarting Tomcat, which contains Solr, it messes up when
> : doing the search; it seems to use something like oldcoreshard|newcoreshard
> : to do the search.
> : Anyone hit this issue?
>
> how did you "do the search" ? is it possible you were just seeing your
> browser cache the results?  Do you have persistent="true" in your solr.xml
> file? w/o that, changes made via the CoreAdmin commands won't be saved to
> disk.
>
> I just tested using both 4.0-BETA and the HEAD of the 4x branch and
> couldn't see any problems using SWAP  (i tested using 'java
> -Dsolr.solr.home=multicore/ -jar start.jar' and indexing some trivial
> docs, and then tested again after modifying the solr.xml to use
> persistent="true")
>
>
> -Hoss
>


Re: MMapDirectory

2012-09-20 Thread Lance Norskog
The Solr caches are thrown away on each hard commit. The document cache could 
be conserved across commits. Documents in segments that still exist would be 
saved. Documents in segments that are removed would be thrown away.

Perhaps the document cache should be pushed down into Lucene, to handle this 
well?

- Original Message -
| From: "Mikhail Khludnev" 
| To: solr-user@lucene.apache.org
| Sent: Thursday, September 20, 2012 11:19:58 AM
| Subject: Re: MMapDirectory
| 
| My limited understanding, confirmed by a profiler though, is that doing
| mmap IO costs you copying bytes from mmapped virtual memory into the JVM
| heap. Just look into java.nio.DirectByteBuffer.get(byte[], int, int). It
| has happened several times to me - we saw a hotspot in the profiler on
| mmapped IO (yep, just in copying bytes!!), cached the data in heap, and
| the hotspot moved after that.
| A good example of a heap cache for mmapped data is the term infos cache
| with its configurable interval.
| Overall, the question is absolutely worth thinking about.
| 
| On Thu, Sep 20, 2012 at 9:39 PM, Erick Erickson
| wrote:
| 
| > So I just had a curiosity question pop up and wanted to check it
| > out.
| > Solr has the documentCache, designed to hold stored fields while
| > various parts of a requestHandler do their tricks, keeping the
| > stored
| > content from having to be re-fetched from disk. When using
| > MMapDirectory, is this even something to worry about?
| >
| > It seems like documentCache wouldn't be all that useful, but then I
| > don't have a deep understanding here. I can imagine scenarios where
| > it
| > would be more efficient i.e. it's targeted to the documents
| > actually
| > being accessed rather than random places on disk in the fdt/fdx
| > files
| >
| > Thanks,
| > Erick
| >
| 
| 
| 
| --
| Sincerely yours
| Mikhail Khludnev
| Tech Lead
| Grid Dynamics
| 
| 
|  
| 


Re: Solr Swap Function doesn't work when using Solr Cloud Beta

2012-09-20 Thread Chris Hostetter
: In Solr 3.6, the core swap function works fine. After switching to Solr 4.0
: Beta, I found it doesn't work well.

can you elaborate on what exactly you mean by "doesn't work well" ? .. 
what does your solr.xml file look like? what command did you run to do the 
swap? what results did you get from those commands?  what exactly did you 
observe after the swap and how did you observe it?

: I tried to swap two cores, but it still returns old core data when doing the
: search. After restarting Tomcat, which contains Solr, it messes up when doing the
: search; it seems to use something like oldcoreshard|newcoreshard to do the search.
: Anyone hit this issue?

how did you "do the search" ? is it possible you were just seeing your 
browser cache the results?  Do you have persistent="true" in your solr.xml 
file? w/o that, changes made via the CoreAdmin commands won't be saved to 
disk.

I just tested using both 4.0-BETA and the HEAD of the 4x branch and 
couldn't see any problems using SWAP  (i tested using 'java 
-Dsolr.solr.home=multicore/ -jar start.jar' and indexing some trivial 
docs, and then tested again after modifying the solr.xml to use 
persistent="true")


-Hoss


Re: Backup strategy for SolrCloud

2012-09-20 Thread Upayavira
What sorts of failures are you thinking of? Power loss? Index
corruption? Server overload?

Could you keep somewhat remote replicas of each shard, but not behind
your load balancer?

Then, should all your customer facing nodes go down, those replicas
would be elected leaders. When you bring the customer facing ones back,
they would just pull their indexes from your remote replicas, and you'd
be good to go once more.

Upayavira

On Thu, Sep 20, 2012, at 10:30 PM, jimtronic wrote:
> I'm thinking about catastrophic failure and recovery. If, for some
> reason,
> the cluster should go down or become unusable and I simply want to bring
> it
> back up as quickly as possible, what's the best way to accomplish that? 
> 
> Maybe I'm thinking about this incorrectly? Is this not a concern?
> 
> 
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Backup-strategy-for-SolrCloud-tp4009291p4009297.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr4 how to make it do this?

2012-09-20 Thread george123
I have been thinking about this some more.

So my scenario of search is as follows.

A visitor types in

3 bed 2 bath condo new york

Now my schema has bed, bath, property type, city. The data going in is
denormalised csv files, so column headings are the fields.

For this type of data the search is basically an exact field match, e.g.
bed:3 or property type:condo.

I have been experimenting with Solr's autocomplete functionality.

And I think I can somehow create a field (Auto_complete) that is basically a
concatenation. I will use a rough Excel formula to explain.

Column A,B,C,D = Bed, Bath, Type, City

A1=3, B1=2, C1=condo, D1=New York.

In E1 I would write =concatenate(A1," ","bed"," ",B1," ","bath"," ",C1,"
",D1) 
the value being "3 bed 2 bath condo new york"

Now, if I can generate this in the Solr index somehow (I know I can do it in
my CSV files beforehand, but Solr might be easier) and hook my autocomplete
function into this field (Auto_complete), then when a user types a more
natural search term it should be picked up, and I don't have to try to
pre-process that search term beforehand.
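If I end up doing it on the CSV side after all, it's only a few lines anyway -
something along these lines (column names as in the example above, file names
made up, untested):

  import csv

  with open("listings.csv") as src, open("listings_ac.csv", "w", newline="") as dst:
      reader = csv.DictReader(src)
      writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["Auto_complete"])
      writer.writeheader()
      for row in reader:
          # e.g. "3 bed 2 bath condo New York"
          row["Auto_complete"] = "%s bed %s bath %s %s" % (
              row["Bed"], row["Bath"], row["Type"], row["City"])
          writer.writerow(row)

But I'd still prefer to do it inside Solr if there is a clean way.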

Thoughts?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr4-how-to-make-it-do-this-tp4008574p4009304.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Backup strategy for SolrCloud

2012-09-20 Thread Markus Jelsma
If reindexing from raw XML files is feasible (less than 30 minutes) it would be
the easiest option. The problem with recovering from old snapshots is that you
have to remove bad indices from all cores, as well as possibly stale (or
in-progress recovery) indices, replace them with your snapshot, and modify the
index.properties file for each core to point it to your backed-up snapshot.
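(For reference, index.properties is just a tiny properties file in each core's
data directory; if I remember correctly it boils down to a single line along
the lines of index=index.<timestamp>, naming the index directory the core
should open, so after copying the snapshot in you point that value at the
restored directory.)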

This is a manual job and is prone to errors if you want to fix it all as
soon as possible.
 
-Original message-
> From:jimtronic 
> Sent: Thu 20-Sep-2012 23:34
> To: solr-user@lucene.apache.org
> Subject: RE: Backup strategy for SolrCloud
> 
> I'm thinking about catastrophic failure and recovery. If, for some reason,
> the cluster should go down or become unusable and I simply want to bring it
> back up as quickly as possible, what's the best way to accomplish that? 
> 
> Maybe I'm thinking about this incorrectly? Is this not a concern?
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Backup-strategy-for-SolrCloud-tp4009291p4009297.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


RE: Backup strategy for SolrCloud

2012-09-20 Thread jimtronic
I'm thinking about catastrophic failure and recovery. If, for some reason,
the cluster should go down or become unusable and I simply want to bring it
back up as quickly as possible, what's the best way to accomplish that? 

Maybe I'm thinking about this incorrectly? Is this not a concern?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Backup-strategy-for-SolrCloud-tp4009291p4009297.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: some general solr 4.0 questions

2012-09-20 Thread Petersen, Robert
That is a great idea to run the updates thru the LB also!  I like it!

Thanks for the replies guys


-Original Message-
From: jimtronic [mailto:jimtro...@gmail.com] 
Sent: Thursday, September 20, 2012 1:46 PM
To: solr-user@lucene.apache.org
Subject: Re: some general solr 4.0 questions

I've got a setup like yours -- lots of cores and replicas, but no need for 
shards -- and here's what I've found so far:

1. Zookeeper is tiny. I would think network I/O is going to be the biggest 
concern.

2. I think this is more about high availability than performance. I've been 
experimenting with taking down parts of my setup to see what happens. When 
zookeeper goes down, the solr instances still serve requests. It appears, 
however, that updating and replication stop. I want to make frequent updates so 
this is a big concern for me.

3. On ec2, I launch a server which is configured to register itself with my 
zookeeper box upon launch. When they are ready I add them to my load balancer. 
Theoretically, zookeeper would help further balance them, but right now I find 
those queries to be too slow. Since the load balancer is already distributing 
the load, I'm adding the parameter "distrib=false" to my queries. This forces 
the request to stay on the box the load balancer chose.

4. This is interesting. I started down this path of wanting to maintain a 
master, but I've moved towards a system where all of my update requests go 
through my load balancer. Since zookeeper dynamically elects a leader, no 
matter which box gets the update the leader gets it anyway. This is very nice 
for me because I want all my solr instances to be identical.

Since there's not a lot of documentation on this yet, I hope other people share 
their findings, too.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/some-general-solr-4-0-questions-tp4009267p4009286.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Backup strategy for SolrCloud

2012-09-20 Thread Walter Underwood
He explained why in the message. Because it is faster to bring up a new host 
from a snapshot.

I presume that he doesn't need the full cluster running all the time.

wunder

On Sep 20, 2012, at 2:19 PM, Markus Jelsma wrote:

> Hi,
> 
> Why do you want to back up? With enough machines and a decent replication 
> factor (3 or higher) there is usually little need to back it up. If you have 
> the space it's better to launch a second cluster in another DC.
> 
> You can also choose to increase the number of maxCommitsToKeep but it'll take 
> a lot of space and I/O if you have a frequent auto-commit enabled. Another 
> options is to keep generating raw Solr XML files, you can easily load 
> millions of documents in 10-15 minutes.
> 
> Cheers,
> 
> -Original message-
>> From:jimtronic 
>> Sent: Thu 20-Sep-2012 23:04
>> To: solr-user@lucene.apache.org
>> Subject: Backup strategy for SolrCloud
>> 
>> I'm trying to determine my options for backing up data from a SolrCloud
>> cluster.
>> 
>> For me, bringing up my cluster from scratch can take several hours. It's way
>> faster to take snapshots of the index periodically and then use one of these
>> when booting a new instance. Since I use static xml files and delta-imports,
>> everything catches up quickly.
>> 
>> Sorry if this is a dumb question, but where do I pull the snapshots from?
>> Zookeeper? Any box in the cluster? The leader?
>> 
>> Thanks!
>> Jim
>> 






RE: Backup strategy for SolrCloud

2012-09-20 Thread Markus Jelsma
Hi,

Why do you want to back up? With enough machines and a decent replication 
factor (3 or higher) there is usually little need to back it up. If you have 
the space it's better to launch a second cluster in another DC.

You can also choose to increase the number of maxCommitsToKeep but it'll take a 
lot of space and I/O if you have a frequent auto-commit enabled. Another 
option is to keep generating raw Solr XML files; you can easily load millions 
of documents in 10-15 minutes.
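Loading those files is trivial to script as well; e.g. something along these
lines (made-up paths and the default port, untested):

  import glob
  from urllib.request import Request, urlopen

  base = "http://localhost:8983/solr/update"
  for path in glob.glob("/data/solr-xml/*.xml"):
      with open(path, "rb") as f:
          # POST each raw Solr XML file to the update handler
          req = Request(base, data=f.read(),
                        headers={"Content-Type": "text/xml; charset=utf-8"})
          urlopen(req).read()
  # commit once at the end
  urlopen(base + "?commit=true").read()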

Cheers,


 
 
-Original message-
> From:jimtronic 
> Sent: Thu 20-Sep-2012 23:04
> To: solr-user@lucene.apache.org
> Subject: Backup strategy for SolrCloud
> 
> I'm trying to determine my options for backing up data from a SolrCloud
> cluster.
> 
> For me, bringing up my cluster from scratch can take several hours. It's way
> faster to take snapshots of the index periodically and then use one of these
> when booting a new instance. Since I use static xml files and delta-imports,
> everything catches up quickly.
> 
> Sorry if this is a dumb question, but where do I pull the snapshots from?
> Zookeeper? Any box in the cluster? The leader?
> 
> Thanks!
> Jim
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Backup-strategy-for-SolrCloud-tp4009291.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: deleting a single value from multivalued field

2012-09-20 Thread jimtronic
Just added this today. 

https://issues.apache.org/jira/browse/SOLR-3862



--
View this message in context: 
http://lucene.472066.n3.nabble.com/deleting-a-single-value-from-multivalued-field-tp4009092p4009292.html
Sent from the Solr - User mailing list archive at Nabble.com.


Backup strategy for SolrCloud

2012-09-20 Thread jimtronic
I'm trying to determine my options for backing up data from a SolrCloud
cluster.

For me, bringing up my cluster from scratch can take several hours. It's way
faster to take snapshots of the index periodically and then use one of these
when booting a new instance. Since I use static xml files and delta-imports,
everything catches up quickly.

Sorry if this is a dumb question, but where do I pull the snapshots from?
Zookeeper? Any box in the cluster? The leader?

Thanks!
Jim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Backup-strategy-for-SolrCloud-tp4009291.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problems with SolrEnitityProcessor + frange filterQuery

2012-09-20 Thread Jack Krupansky
Sorry, but it looks like the SolrEntityProcessor does a raw split on commas 
of its "fq" parameter, with no provision for escaping.


You should be able to combine the fq into the query parameter as a nested 
query which does not have the split issue.
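In other words, something along these lines (untested; note the inner quotes
have to be XML-escaped inside the attribute):

query="status:1 AND NOT priority:\-1 AND _query_:&quot;{!frange l=3000 u=5000}max(sum(suser_count), sum(user_count))&quot;"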


-- Jack Krupansky

-Original Message- 
From: Dirceu Vieira

Sent: Thursday, September 20, 2012 4:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Problems with SolrEnitityProcessor + frange filterQuery

Hi guys,

Has anybody got any idea about that?
I'm really open for any suggestions

Thanks!

Dirceu

On Thu, Sep 20, 2012 at 11:58 AM, Dirceu Vieira  wrote:


Hi,

I'm attempting to write a filter query for my SolrEntityProcessor using
{frange} over a function.
It works fine when I'm testing it on the admin, but once I move it into my
data-config.xml the query blows up because of the commas in the function.
The problem is that fq parameter can be a comma separated list, which
means that if I have commas within my query, it'll try to split it into
multiple filter queries.

Does anybody know of a way of escaping the comma or another way I can work
around that?

I've been using SolrEntityProcessor to import filtered data from a core to
another, here's the queries:

query="status:1 AND NOT priority:\-1"
fq="{!frange l=3000 u=5000}max(sum(suser_count), sum(user_count))"

I'm using Solr-4.0.0-BETA.



Best regards,

--
Dirceu Vieira Júnior
---
+47 9753 2473
dirceuvjr.blogspot.com
twitter.com/dirceuvjr





--
Dirceu Vieira Júnior
---
+47 9753 2473
dirceuvjr.blogspot.com
twitter.com/dirceuvjr 



Re: some general solr 4.0 questions

2012-09-20 Thread jimtronic
I've got a setup like yours -- lots of cores and replicas, but no need for
shards -- and here's what I've found so far:

1. Zookeeper is tiny. I would think network I/O is going to be the biggest
concern.

2. I think this is more about high availability than performance. I've been
experimenting with taking down parts of my setup to see what happens. When
zookeeper goes down, the solr instances still serve requests. It appears,
however, that updating and replication stop. I want to make frequent updates
so this is a big concern for me.

3. On ec2, I launch a server which is configured to register itself with my
zookeeper box upon launch. When they are ready I add them to my load
balancer. Theoretically, zookeeper would help further balance them, but
right now I find those queries to be too slow. Since the load balancer is
already distributing the load, I'm adding the parameter "distrib=false" to
my queries. This forces the request to stay on the box the load balancer
chose.

4. This is interesting. I started down this path of wanting to maintain a
master, but I've moved towards a system where all of my update requests go
through my load balancer. Since zookeeper dynamically elects a leader, no
matter which box gets the update the leader gets it anyway. This is very
nice for me because I want all my solr instances to be identical.

Since there's not a lot of documentation on this yet, I hope other people
share their findings, too.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/some-general-solr-4-0-questions-tp4009267p4009286.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: some general solr 4.0 questions

2012-09-20 Thread Otis Gospodnetic
I'll answer the other easy ones ;)

#1 yes, no need for a ton of RAM and tons of cores.

#2 it's not the overhead, it's that zookeeper is sensitive to not
hearing from nodes and marking them dead, at least in the Hadoop and
HBase world.

#3 yes, the external LB would simply spread the query load over all
your Solr 4.0 nodes

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Thu, Sep 20, 2012 at 3:37 PM, Erik Hatcher  wrote:
> I'll answer the easy one:
>
> #4 - yes!   In fact, it would seem wise in many of these straightforward 
> cases like yours to leave standard master/slave as-is for the time being even 
> when upgrading to Solr 4.  No need to make life more complicated.  Now, if 
> you did want to have NRT where updates are pushed to the replicas as they 
> come in, then that's when the SolrCloud capabilities will come into play.
>
> But, if it ain't broke, don't fix it.
>
> Erik
>
> On Sep 20, 2012, at 14:51 , Petersen, Robert wrote:
>
>> Hello solr user group,
>>
>> I am evaluating the new Solr 4.0 beta with an eye to how to fit it into our 
>> current solr setup.  Our current setup is running on solr 3.6.1 and uses 12 
>> slaves behind a load balancer and a master which we index into, and they all 
>> have three cores (now referred to as collections in 4.0 eh?) for three 
>> disparate types of indexes.  All machines are configured with dual quad xeon 
>> cpus and 64GB main memory.  We've worked hard to keep our index sizes small 
>> despite holding millions of documents, so we have no need to shard any of 
>> the indexes.  Everything is working very well at this time.
>>
>> So to move to solr 4.0, I imagine we'd set -DnumShards=1 and spin up 11 
>> replicas, but I'm worried about the statement "For production, it's 
>> recommended that you run an external zookeeper ensemble rather than having 
>> Solr run embedded zookeeper servers."  That means we'd need at least three 
>> more machines dedicated to just running zookeeper.   So here are my 
>> questions:
>>
>>
>> 1.Could the zookeeper servers be smaller commodity servers?  Ie They 
>> wouldn't need 64GB of memory and huge CPUs right?
>>
>> 2.Is the overhead of running embedded zookeeper really great enough to 
>> require the external ensemble?  Our configuration will be pretty static, I 
>> don't anticipate having to change the zookeeper cluster once it is set up 
>> unless a machine completely dies or something.
>>
>> 3.Can we still use our external load balancer hardware to distribute 
>> queries to the solr 4.0 replicas as we do now with our slave farm?
>>
>> 4.Can solr 4.0 still run in a master- slave configuration if we don't 
>> want to use zookeeper or some of the other cloud features?
>>
>>
>> Thanks,
>>
>> Robert (Robi) Petersen
>> Senior Software Engineer
>> Site Search Specialist
>>
>>
>


Re: Problems with SolrEnitityProcessor + frange filterQuery

2012-09-20 Thread Dirceu Vieira
Hi guys,

Has anybody got any idea about that?
I'm really open for any suggestions

Thanks!

Dirceu

On Thu, Sep 20, 2012 at 11:58 AM, Dirceu Vieira  wrote:

> Hi,
>
> I'm attempting to write a filter query for my SolrEntityProcessor using
> {frange} over a function.
> It works fine when I'm testing it on the admin, but once I move it into my
> data-config.xml the query blows up because of the commas in the function.
> The problem is that fq parameter can be a comma separated list, which
> means that if I have commas within my query, it'll try to split it into
> multiple filter queries.
>
> Does anybody know of a way of escaping the comma or another way I can work
> around that?
>
> I've been using SolrEntityProcessor to import filtered data from a core to
> another, here's the queries:
>
> query="status:1 AND NOT priority:\-1"
> fq="{!frange l=3000 u=5000}max(sum(suser_count), sum(user_count))"
>
> I'm using Solr-4.0.0-BETA.
>
>
>
> Best regards,
>
> --
> Dirceu Vieira Júnior
> ---
> +47 9753 2473
> dirceuvjr.blogspot.com
> twitter.com/dirceuvjr
>
>


-- 
Dirceu Vieira Júnior
---
+47 9753 2473
dirceuvjr.blogspot.com
twitter.com/dirceuvjr


Re: Regarding Search

2012-09-20 Thread Åsmund Tokheim
Hi

You gave us quite little information to go on, but I can list some probable
reasons why your search doesn't match any documents. In schema.xml check
that:
- you have specified fields for custid, familyname, usrname
- that those fields have the attribute indexed="true"
- that they are not of type string (see
http://stackoverflow.com/questions/7175619/apache-solr-string-or-text)

Assuming you haven't made any changes to the default request handler in
solrconfig.xml, you need to specify which fields you want to match your query
with. You can do this during a query by changing your q parameter
to "usrname:admin familyname:admin". This will match any documents where
the text "admin" is in either the usrname or familyname fields. Depending
on the field type you used in schema.xml the search might be case
sensitive, so check that you use the correct case as well.

Hope this helps

Åsmund

On Thu, Sep 20, 2012 at 2:39 PM, darshan  wrote:

> HI Fellows,
>
> I had added the following fields in my data-config.xml to
> implement Data Import Handler
>
> <dataConfig>
>   <dataSource
>      url="jdbc:postgresql://192.168.1.46:5432/evergreen"
>      user="postgres"
>      ... />
>
>   <document>
>     <entity ...>
>       <field ... />
>       <field ... />
>       ...
>     </entity>
>   </document>
> </dataConfig>
>
> When I perform the steps of the full-import example at
> http://wiki.apache.org/solr/DataImportHandler
>
> I can successfully index my database, BUT the issue is that after
> visiting the admin search page and searching for any text, I don't receive
> any output and the response is always:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">2</int>
>     <lst name="params">
>       <str name="indent">on</str>
>       <str name="start">0</str>
>       <str name="q">admin</str>
>       <str name="version">2.2</str>
>       <str name="rows">10</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="0" start="0"/>
> </response>
>
> Please guide me on this; I am a newbie to Apache Solr.
>
>
>
> Thanks,
>
> Darshan
>
>
>
>
>
>


Re: some general solr 4.0 questions

2012-09-20 Thread Erik Hatcher
I'll answer the easy one:

#4 - yes!   In fact, it would seem wise in many of these straightforward cases 
like yours to leave standard master/slave as-is for the time being even when 
upgrading to Solr 4.  No need to make life more complicated.  Now, if you did 
want to have NRT where updates are pushed to the replicas as they come in, then 
that's when the SolrCloud capabilities will come into play.  

But, if it ain't broke, don't fix it.

Erik

On Sep 20, 2012, at 14:51 , Petersen, Robert wrote:

> Hello solr user group,
> 
> I am evaluating the new Solr 4.0 beta with an eye to how to fit it into our 
> current solr setup.  Our current setup is running on solr 3.6.1 and uses 12 
> slaves behind a load balancer and a master which we index into, and they all 
> have three cores (now referred to as collections in 4.0 eh?) for three 
> disparate types of indexes.  All machines are configured with dual quad xeon 
> cpus and 64GB main memory.  We've worked hard to keep our index sizes small 
> despite holding millions of documents, so we have no need to shard any of the 
> indexes.  Everything is working very well at this time.
> 
> So to move to solr 4.0, I imagine we'd set -DnumShards=1 and spin up 11 
> replicas, but I'm worried about the statement "For production, it's 
> recommended that you run an external zookeeper ensemble rather than having 
> Solr run embedded zookeeper servers."  That means we'd need at least three 
> more machines dedicated to just running zookeeper.   So here are my questions:
> 
> 
> 1.Could the zookeeper servers be smaller commodity servers?  Ie They 
> wouldn't need 64GB of memory and huge CPUs right?
> 
> 2.Is the overhead of running embedded zookeeper really great enough to 
> require the external ensemble?  Our configuration will be pretty static, I 
> don't anticipate having to change the zookeeper cluster once it is set up 
> unless a machine completely dies or something.
> 
> 3.Can we still use our external load balancer hardware to distribute 
> queries to the solr 4.0 replicas as we do now with our slave farm?
> 
> 4.Can solr 4.0 still run in a master- slave configuration if we don't 
> want to use zookeeper or some of the other cloud features?
> 
> 
> Thanks,
> 
> Robert (Robi) Petersen
> Senior Software Engineer
> Site Search Specialist
> 
> 



some general solr 4.0 questions

2012-09-20 Thread Petersen, Robert
Hello solr user group,

I am evaluating the new Solr 4.0 beta with an eye to how to fit it into our 
current solr setup.  Our current setup is running on solr 3.6.1 and uses 12 
slaves behind a load balancer and a master which we index into, and they all 
have three cores (now referred to as collections in 4.0 eh?) for three 
disparate types of indexes.  All machines are configured with dual quad xeon 
cpus and 64GB main memory.  We've worked hard to keep our index sizes small 
despite holding millions of documents, so we have no need to shard any of the 
indexes.  Everything is working very well at this time.

So to move to solr 4.0, I imagine we'd set -DnumShards=1 and spin up 11 
replicas, but I'm worried about the statement "For production, it's recommended 
that you run an external zookeeper ensemble rather than having Solr run 
embedded zookeeper servers."  That means we'd need at least three more machines 
dedicated to just running zookeeper.   So here are my questions:


1.Could the zookeeper servers be smaller commodity servers?  Ie They 
wouldn't need 64GB of memory and huge CPUs right?

2.Is the overhead of running embedded zookeeper really great enough to 
require the external ensemble?  Our configuration will be pretty static, I 
don't anticipate having to change the zookeeper cluster once it is set up 
unless a machine completely dies or something.

3.Can we still use our external load balancer hardware to distribute 
queries to the solr 4.0 replicas as we do now with our slave farm?

4.Can solr 4.0 still run in a master- slave configuration if we don't want 
to use zookeeper or some of the other cloud features?


Thanks,

Robert (Robi) Petersen
Senior Software Engineer
Site Search Specialist




Re: Best way to index Solr XML from w/in the same servlet container

2012-09-20 Thread Chris Hostetter

: I've created a custom process in Solr that has a Zookeeper Watcher
: configured to pull Solr XML files from a znode. When I receive a file I can
: send the file to /update and get it indexed, but that seems inefficient. I
: could use SolrJ, but I believe that is still sending an HTTP request to
: /update. Is there a better way to do this, or is SolrJ running w/in the
: same servlet container the most efficient way to index SolrJ from w/in the
: same servlet container that is running Solr?

you could use the stream.file option to have the UpdateRequestHandler pull
the data from a local file path, but i'm not sure that would really be
much faster.  Alternatively you could call out directly to the XMLLoader
to parse it into SolrInputDocument objects and send them directly to the
UpdateRequestProcessorChain - depends on how much of the abstraction you
want to bypass (ie: if you want to be able to parse multiple types of
files, you're probably better off talking to the request handler)

Actually -- using a LocalSolrQueryRequest object where you set up a
ContentStream pointed at the local file might be the method with the least
amount of overhead and the most flexibility.



-Hoss


Re: MMapDirectory

2012-09-20 Thread Mikhail Khludnev
My limited understanding, confirmed by a profiler though, is that doing mmap
IO costs you copying bytes from mmapped virtual memory into the JVM heap. Just
look into java.nio.DirectByteBuffer.get(byte[], int, int). It has happened
several times to me - we saw a hotspot in the profiler on mmapped IO (yep, just
in copying bytes!!), cached the data in heap, and the hotspot moved after that.
A good example of a heap cache for mmapped data is the term infos cache with
its configurable interval.
Overall, the question is absolutely worth thinking about.

On Thu, Sep 20, 2012 at 9:39 PM, Erick Erickson wrote:

> So I just had a curiosity question pop up and wanted to check it out.
> Solr has the documentCache, designed to hold stored fields while
> various parts of a requestHandler do their tricks, keeping the stored
> content from having to be re-fetched from disk. When using
> MMapDirectory, is this even something to worry about?
>
> It seems like documentCache wouldn't be all that useful, but then I
> don't have a deep understanding here. I can imagine scenarios where it
> would be more efficient i.e. it's targeted to the documents actually
> being accessed rather than random places on disk in the fdt/fdx
> files
>
> Thanks,
> Erick
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: Split XML configuration

2012-09-20 Thread Michael Della Bitta
Ah, I just upgraded us to 3.6, and abandoned xi:include in favor of
symlinks, so I didn't know whether it was fixed or not.

Another thing I just thought of is if you want your config files to be
available from the web UI, the xi:include directives won't be
resolved, so you'll just see the literal file. Unless this has changed
as well.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Thu, Sep 20, 2012 at 1:27 PM, Chris Hostetter
 wrote:
>
> : "xi:include" directives work in Solr config files, but in most (all?)
> : versions of Solr, they require absolute paths, which makes portable
> : configuration slightly more sticky. Still, a very viable solution.
>
> Huh?
>
> There were bugs in xinclude parsing up to Solr 1.4 that caused relative
> paths to be resolved in a very wonky way (relative to the CWD), so that
> absolute paths were recommended for the sake of clarity/sanity, but these
> bugs have been fixed since Solr 3.1...
>
> https://issues.apache.org/jira/browse/SOLR-1656
>
>
>
> -Hoss


MMapDirectory

2012-09-20 Thread Erick Erickson
So I just had a curiosity question pop up and wanted to check it out.
Solr has the documentCache, designed to hold stored fields while
various parts of a requestHandler do their tricks, keeping the stored
content from having to be re-fetched from disk. When using
MMapDirectory, is this even something to worry about?

It seems like documentCache wouldn't be all that useful, but then I
don't have a deep understanding here. I can imagine scenarios where it
would be more efficient i.e. it's targeted to the documents actually
being accessed rather than random places on disk in the fdt/fdx
files

Thanks,
Erick


Re: Split XML configuration

2012-09-20 Thread Chris Hostetter

: "xi:include" directives work in Solr config files, but in most (all?)
: versions of Solr, they require absolute paths, which makes portable
: configuration slightly more sticky. Still, a very viable solution.

Huh?  

There were bugs in xinclude parsing up to Solr 1.4 that caused relative 
paths to be resolved in a very wonky way (relative to the CWD), so that 
absolute paths were recommended for the sake of clarity/sanity, but these 
bugs have been fixed since Solr 3.1...

https://issues.apache.org/jira/browse/SOLR-1656



-Hoss


4.0.snapshot to 4.0.beta index migration

2012-09-20 Thread vybe3142
Hi
We have a bunch of data that was indexed using a 4.0 snapshot build of Solr.

We'd like to migrate to the 4.0 Beta version. Is there a recommended way to
migrate the indices, or is reindexing the best option?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/4-0-snapshot-to-4-0-beta-index-migration-tp4009247.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: deleting a single value from multivalued field

2012-09-20 Thread Jack Krupansky
There isn’t a mechanism to update or delete only a subset of a multivalued 
field. You would have to supply the full list of values you want to have in 
the multivalued field.


You may want to offer it as a suggested improvement.

-- Jack Krupansky

-Original Message- 
From: deniz

Sent: Thursday, September 20, 2012 1:45 AM
To: solr-user@lucene.apache.org
Subject: deleting a single value from multivalued field

I am working with Solr 4.0 Beta and have a structure like this:




I want to update the str field, both adding and removing some values... adding
is okay, but for deleting a value I couldn't find a way to do the
reverse...

basically i want to do the one below:


From

1

 val1
 val2
 val3



After deleting val3

1

 val1
 val2




any ideas how to do this? I have tried remove and delete statements in the
update curl request but no success so far...



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/deleting-a-single-value-from-multivalued-field-tp4009092.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Prevent Log and other math functions from returning "Infinity" and erroring out

2012-09-20 Thread Amit Nithian
Is there any reason why the log function shouldn't be modified to
always take 1 + the number being requested to be logged? The reason I ask is
that I am taking the log of the value output by another function which
could return 0. For testing, I modified it to return 1, which works, but I
would rather have the log function simply add 1.

Of course I could do something like log(sum(...)), but that seems a bit
much, OR just create my own modified log function in my code, but I was
wondering if there would be any objections to filing an issue and a
patch to keep math functions like this from returning "Infinity"?
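(To be concrete, the workaround I mean is wrapping the inner function, e.g.
log(sum(1,myfunc(...))) - where myfunc stands in for whatever function or field
might produce 0 - rather than a bare log(myfunc(...)).)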

Thanks
Amit


RE: Wildcard searches don't work

2012-09-20 Thread Alex Cougarman
Thanks, Erick. That really helped us in learning about tokens and how the 
Analyzer works. Thank you!

Warm regards,
Alex 


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 19 September 2012 3:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Wildcard searches don't work

Take a look at admin/analysis on the text_general type. You'll see that 
StandardTokenizer is breaking the input strings up into individual tokens on 
the colons and hyphens, so 2010-01-27T00:00:00Z becomes the tokens
2010 01 27T00 00 00Z

admin/analysis should be your first reflex when you encounter things like this 
...

Best
Erick


On Wed, Sep 19, 2012 at 7:00 AM, Ahmet Arslan  wrote:
>> We're having difficulty with some wildcard searches in Solr 4.0Beta. 
>> We're using a copyField to write a "tdate" to a "text_general" field. 
>> We are using the default definition for the "text_general" field 
>> type.
>>
>> > indexed="true" stored="true" />
>> > type="text_general" indexed="true" stored="true" />
>>
>> > dest="date_text"/>
>>
>> Here's the sample data it holds:
>>
>> 2010-01-27T00:00:00Z
>> 2010-01-28T00:00:00Z
>> 2010-01-31T00:00:00Z
>>
>> We run these queries and they return the expected results:
>>
>> date_text:"2010*"
>> date_text:"2010-*"
>> date_text:"2010-01*"
>> date_text:"2010-01-*"
>>
>> However, when we run these, they return nothing. What are we doing 
>> wrong?
>>
>> date_text:"*-01-27"
>> date_text:"2010-*-27"
>> date_text:"2010-01-27*"
>
> I think in your case you need to use string type instead of text_general.
>
> 


Correct way of storing longitude and latitude

2012-09-20 Thread Spadez
Hi,

Sorry for all the questions today, but I paid a third-party coder to develop
a schema for me, and now that I have more of an understanding myself I have a
question.

The aim is to do spatial searching, so in my schema I have this:




My site does seem to submit via JSON to lat_lng_0_coordinate and
latlng_1_coordinate, but nothing gets submitted to latlng.

Furthermore, there is no other mention of "latlng" in my schema.xml, so it's
not like there is a combine or join function in there as far as I can see.

So, my question is, does latlng have a purpose or has the coder put it in in
error?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Correct-way-of-storing-longitude-and-latitude-tp4009228.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.0 - disappointing results sharding on 1 machine

2012-09-20 Thread Yonik Seeley
Depends on where the bottlenecks are I guess.

On a single system, increasing shards decreases throughput  (this
isn't specific to Solr).  The increased parallelism *can* decrease
latency to the degree that the parts that were parallelized outweigh
the overhead.

Going from one shard to two shards is also the most extreme case since
the unsharded case has no distributed overhead whatsoever.

What's the average CPU load during your tests?
How are you testing (i.e. how many requests are in progress at the same time?)
In your unsharded case, what's taking up the bulk of the time?

-Yonik
http://lucidworks.com


On Thu, Sep 20, 2012 at 9:39 AM, Tom Mortimer  wrote:
> Hi all,
>
> After reading 
> http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ 
> , I thought I'd do my own experiments. I used 2M docs from wikipedia, indexed 
> in Solr 4.0 Beta on a standard EC2 large instance. I compared an unsharded 
> and 2-shard configuration (the latter set up with SolrCloud following the 
> http://wiki.apache.org/solr/SolrCloud example). I wrote a simple python 
> script to randomly throw queries from a hand-compiled list at Solr. The only 
> "extra" I had turned on was facets (on document category).
>
> To my surprise, the performance of the 2-shard configuration is almost 
> exactly half that of the unsharded index -
>
> unsharded
> 4983912891 results in 24920 searches; 0 errors
> 70.02 mean qps
> 0.35s mean query time, 2.25s max, 0.00s min
> 90%   of qtimes <= 0.83s
> 99%   of qtimes <= 1.42s
> 99.9% of qtimes <= 1.68s
>
> 2-shard
> 4990351660 results in 24501 searches; 0 errors
> 34.07 mean qps
> 0.66s mean query time, 694.20s max, 0.01s min
> 90%   of qtimes <= 1.19s
> 99%   of qtimes <= 2.12s
> 99.9% of qtimes <= 2.95s
>
> All caches were set to 4096 items, and performance looks ok in both cases 
> (hit ratios close to 1.0, 0 evictions). I gave the single VM -Xmx1G and each 
> shard VM -Xmx500M.
>
> I must be doing something stupid - surely this result is unexpected? Does 
> anybody have any thoughts where it might be going wrong?
>
> cheers,
> Tom
>


Re: Solr 4.0 - disappointing results sharding on 1 machine

2012-09-20 Thread Tom Mortimer
Before anyone asks, these results were obtained warm.

On 20 Sep 2012, at 14:39, Tom Mortimer  wrote:

> Hi all,
> 
> After reading 
> http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ 
> , I thought I'd do my own experiments. I used 2M docs from wikipedia, indexed 
> in Solr 4.0 Beta on a standard EC2 large instance. I compared an unsharded 
> and 2-shard configuration (the latter set up with SolrCloud following the 
> http://wiki.apache.org/solr/SolrCloud example). I wrote a simple python 
> script to randomly throw queries from a hand-compiled list at Solr. The only 
> "extra" I had turned on was facets (on document category).
> 
> To my surprise, the performance of the 2-shard configuration is almost 
> exactly half that of the unsharded index - 
> 
> unsharded
> 4983912891 results in 24920 searches; 0 errors
> 70.02 mean qps
> 0.35s mean query time, 2.25s max, 0.00s min
> 90%   of qtimes <= 0.83s
> 99%   of qtimes <= 1.42s
> 99.9% of qtimes <= 1.68s
> 
> 2-shard
> 4990351660 results in 24501 searches; 0 errors
> 34.07 mean qps
> 0.66s mean query time, 694.20s max, 0.01s min
> 90%   of qtimes <= 1.19s
> 99%   of qtimes <= 2.12s
> 99.9% of qtimes <= 2.95s
> 
> All caches were set to 4096 items, and performance looks ok in both cases 
> (hit ratios close to 1.0, 0 evictions). I gave the single VM -Xmx1G and each 
> shard VM -Xmx500M.
> 
> I must be doing something stupid - surely this result is unexpected? Does 
> anybody have any thoughts where it might be going wrong?
> 
> cheers,
> Tom
> 



solrcloud and csv import hangs

2012-09-20 Thread dan sutton
Hi,

I'm using Solr 4.0-BETA and trying to import a CSV file as follows:

curl http://localhost:8080/solr//update -d overwrite=false -d
commit=true -d stream.contentType='text/csv;charset=utf-8' -d
stream.url=file:///dir/file.csv

I have 2 tomcat servers running on different machines and a separate
zookeeper quorum (3  zoo servers, 2 on same machine).  This is a 1
shard core, replicated to the other machine.

It seems that for a 255K line file I have 170 docs on the server that
issued the command, but on the other, the index seems to grow
unbounded?

Has anyone seen this, or been successful in using the CSV import
with solrcloud?

Cheers,
Dan


Re: "&" char in querystring

2012-09-20 Thread Jack Krupansky
Ah... you are probably not "encoding" the & and % in your URL, so they are 
being eaten when the URL is parsed. Use % followed by the 2-digit hex ASCII 
character code. & should be %26 and % should be %25.
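Most client libraries will do that encoding for you. A quick sanity check in
Python, for example:

  from urllib.parse import quote_plus
  print(quote_plus("johnson & johnson"))   # johnson+%26+johnson
  print(quote_plus("0,5%"))                # 0%2C5%25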


-- Jack Krupansky

-Original Message- 
From: Gustav

Sent: Thursday, September 20, 2012 10:13 AM
To: solr-user@lucene.apache.org
Subject: Re: "&" char in querystring

Hello Jack,
My fieldtype is configured as follows:

   
 
   
   
   
 
   

What other filter could I use to preserve the "&" char?



Another problem that came up is that when I search for ?q="0,5%" it gives an
error:
HTTP Status 400 - missing query string

Probably because of the "%" char, is there any way to escape it?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/char-in-querystring-tp4009174p4009191.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: indexing issue

2012-09-20 Thread Jack Krupansky
You probably are using a "text" field which is tokenizing the input when 
this data should probably be a "string" (or "text" with the 
KeywordAnalyzer.)


-- Jack Krupansky

-Original Message- 
From: zainu

Sent: Thursday, September 20, 2012 5:49 AM
To: solr-user@lucene.apache.org
Subject: indexing issue

Dear fellows,
I have a field in Solr with the value '8E0061123-8E1'. Now when I search '8E*',
it does return all values starting with '8E', which is totally right, but it
returns nothing when I search '8E0*'. I guess it is not indexing 8E0 or so.
I want to search with all combinations like '8E', '8E0', '8E00', '8E006',
etc. But currently it returns a result only when I type 8E or the complete
'8E0061123-8E1'... any idea??



--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-issue-tp4009122.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: "&" char in querystring

2012-09-20 Thread Gustav
Hello Jack, 
My fieldtype is configured as follows:


  



  


What other filter could I use to preserve the "&" char?



Another problem that came up is that when I search for ?q="0,5%" it gives an
error:
HTTP Status 400 - missing query string

Probably because of the "%" char, is there any way to escape it?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/char-in-querystring-tp4009174p4009191.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr Write workload

2012-09-20 Thread John, Phil (CSS)
But even with XA log, am I correct in thinking that the writes themselves will 
be mostly sequential?
 
Regards,
 
Phil.



From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: Thu 20/09/2012 14:09
To: solr-user@lucene.apache.org
Subject: Re: Solr Write workload



Hi,
Right, documents are buffered in jvm heap according to ramBufferSizeMB
setting before getting indexed.
But xa log doesn't do that I don't think.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Sep 20, 2012 8:11 AM, "John, Phil (CSS)"  wrote:

> Hi,
>
> We're in the process of finalising the specification for our Solr cluster
> and just wanted to double check something:
>
> What is the major IO/write workload type in Solr?
>
> From what I understand, the main workload appears to be largely sequential
> appends to segments, rather than heavily biased towards random writes.
>
> Is that largely correct?
>
> The reason I'm asking is that we're looking at SSDs (primarily for read
> performance), but by having them in a RAID array we will lose TRIM support,
> which won't be as much of an issue if random writes are fairly low.
>
> Thanks,
>
> Phil.
>
>
> This email and any attachment to it are confidential.  Unless you are the
> intended recipient, you may not use, copy or disclose either the message or
> any information contained in the message. If you are not the intended
> recipient, you should delete this email and notify the sender immediately.
>
> Any views or opinions expressed in this email are those of the sender
> only, unless otherwise stated.  All copyright in any Capita material in
> this email is reserved.
>
> All emails, incoming and outgoing, may be recorded by Capita and monitored
> for legitimate business purposes.
>
> Capita exclude all liability for any loss or damage arising or resulting
> from the receipt, use or transmission of this email to the fullest extent
> permitted by law.
>




Re: Compond File Format Advice needed - On migration to 3.6.1

2012-09-20 Thread Jack Krupansky
Seriously, if you are having trouble finding the build file, I would suggest 
that you do a lot more homework reading and studying the available Solr and 
Lucene materials online before asking for further assistance.


Start with:
http://lucene.apache.org/solr/
http://lucene.apache.org/solr/versioncontrol.html
http://wiki.apache.org/solr/HowToConfigureEclipse

-- Jack Krupansky

-Original Message- 
From: Sujatha Arun

Sent: Thursday, September 20, 2012 2:34 AM
To: solr-user@lucene.apache.org
Subject: Re: Compond File Format Advice needed - On migration to 3.6.1

Thanks Jack . Yes, this seems so !!

However, I would like to fix this at the code level by setting the noCFSRatio to
1.0. But in Solr 3.6.1 I am not able to find the build.xml file.
I suppose the build process has changed since 1.3; can you throw some
light on how I can build the source code after this change?

In 1.3, I used to change the code in the src files and compile and build
from the same directory as the build.xml file; however, all files seem to
be jarred now. Any pointers?

Regards
Sujatha

On Thu, Sep 20, 2012 at 5:36 AM, Jack Krupansky 
wrote:



You may simply be encountering the situation where the merge size is
greater than 10% of the index size, as per this comment in the code:

/** If a merged segment will be more than this percentage
*  of the total size of the index, leave the segment as
*  non-compound file even if compound file is enabled.
*  Set to 1.0 to always use CFS regardless of merge
*  size.  Default is 0.1. */
public void setNoCFSRatio(double noCFSRatio) {

Unfortunately there currently is no way for you to set the ratio higher in
Solr.

LogMergePolicy has the same issue.

There should be some wiki doc for this, but I couldn't find any.
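
For reference, the knobs being discussed sit in solrconfig.xml; a minimal 3.6.x
sketch (the merge-policy values shown are just the Lucene defaults) looks roughly
like this:

  <indexDefaults>
    <useCompoundFile>true</useCompoundFile>
    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
    </mergePolicy>
  </indexDefaults>

Even with useCompoundFile set to true, any merged segment bigger than roughly 10%
of the total index is still written as non-compound, because noCFSRatio defaults
to 0.1 and is not exposed as a configuration option.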

-- Jack Krupansky

-Original Message- From: Sujatha Arun
Sent: Tuesday, September 18, 2012 10:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Compound File Format Advice needed - On migration to 3.6.1


anybody?

On Tue, Sep 18, 2012 at 10:42 PM, Sujatha Arun 
wrote:

 Hi ,


The default index file creation format in 3.6.1 [migrating from 1.3],
in spite of setting useCompoundFile to true, seems to be to create
non-compound files due to LUCENE-2790.

I have tried the following, but everything seems to create non-compound
files:

   - set the compound file format to true
   - used the TieredMergePolicy, and did not change maxMergeAtOnce or
   segmentsPerTier
   - switched back to LogByteSizeMergePolicy, but this also creates
   non-compound files

We are in a situation where we have several cores and hence several
indexes, and do not want to run into a "too many open files" error. What
can be done to switch to the compound file format from the beginning, or
will this TieredMergePolicy eventually lead us to too many open files?

Regards
Sujatha








Re: "&" char in querystring

2012-09-20 Thread Jack Krupansky
Use a field type whose analyzer preserves the &. What field type are you 
using?
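
For illustration, a field type along these lines (the type and field names are
made up) keeps "&" in the token stream, because the whitespace tokenizer, unlike
the standard tokenizer, does not strip punctuation:

  <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="title" type="text_ws" indexed="true" stored="true"/>

Also check how the query reaches Solr: in a raw URL an unescaped & is treated as
a parameter separator, so q="johnson & johnson" gets cut off at the & unless it
is encoded as %26, which may well be what is actually truncating the query to
"Johnson" here.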


-- Jack Krupansky

-Original Message- 
From: Gustav

Sent: Thursday, September 20, 2012 9:05 AM
To: solr-user@lucene.apache.org
Subject: "&" char in querystring

Good morning everyone!
Again, I need your help, Lucene community!
I have a query string just like this: q="johnson & johnson", and when I use
debugQuery=true I realize that the Solr parser breaks the string exactly at
the "&" char, changing my query to q="Johnson". I would like to know, is
there any way to avoid this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/char-in-querystring-tp4009174.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Solr 4.0 - disappointing results sharding on 1 machine

2012-09-20 Thread Tom Mortimer
Hi all,

After reading 
http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ , 
I thought I'd do my own experiments. I used 2M docs from wikipedia, indexed in 
Solr 4.0 Beta on a standard EC2 large instance. I compared an unsharded and 
2-shard configuration (the latter set up with SolrCloud following the 
http://wiki.apache.org/solr/SolrCloud example). I wrote a simple python script 
to randomly throw queries from a hand-compiled list at Solr. The only "extra" I 
had turned on was facets (on document category).

To my surprise, the performance of the 2-shard configuration is almost exactly 
half that of the unsharded index - 

unsharded
4983912891 results in 24920 searches; 0 errors
70.02 mean qps
0.35s mean query time, 2.25s max, 0.00s min
90%   of qtimes <= 0.83s
99%   of qtimes <= 1.42s
99.9% of qtimes <= 1.68s

2-shard
4990351660 results in 24501 searches; 0 errors
34.07 mean qps
0.66s mean query time, 694.20s max, 0.01s min
90%   of qtimes <= 1.19s
99%   of qtimes <= 2.12s
99.9% of qtimes <= 2.95s

All caches were set to 4096 items, and performance looks ok in both cases (hit 
ratios close to 1.0, 0 evictions). I gave the single VM -Xmx1G and each shard 
VM -Xmx500M.

I must be doing something stupid - surely this result is unexpected? Does 
anybody have any thoughts where it might be going wrong?

cheers,
Tom



Re: ID reference field - Needed but not searchable or retrievable

2012-09-20 Thread Tom Mortimer
Hi James,

If you don't want this field to be included in user searches, just omit it from 
the search configuration (e.g. if using eDisMax parser, don't put it in the qf 
list). To keep it out of search results, exclude it from the fl list. See

http://wiki.apache.org/solr/CommonQueryParameters
and
http://wiki.apache.org/solr/ExtendedDisMax
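
As a rough sketch (the handler name and field names below are assumed), the
relevant defaults in solrconfig.xml would look something like:

  <requestHandler name="/search" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <!-- uniqueid deliberately left out of both lists -->
      <str name="qf">title description</str>
      <str name="fl">title,description,score</str>
    </lst>
  </requestHandler>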

The overhead from storing it will most likely be very small, and as Michael
points out it means you could potentially reference documents both ways.

Not sure about the JSON question, but in XML
<delete><query>uniqueID:7</query></delete> would remove the whole doc, not
just the uniqueID field.

Tom


On 20 Sep 2012, at 13:38, Spadez  wrote:

> Hi.
> 
> My SQL database assigns a uniqueID to each item. I want to keep this
> uniqueID associated with the items that are in Solr even though I won't ever
> need to display them or have them searchable. I do however want to be able
> to target specific items in Solr with it, for updating or deleting the
> record.
> 
> Right now I have this in my schema:
> 
> 
> However, since I don't want it searchable or stored it should be this:
> 
> 
> Firstly, is this the correct way of doing this? I saw mention of an "ignore"
> attribute that can be added.
> 
> Secondly, if I wanted to do updates to the fields using JSON by targeting
> the uniqueID, can I still do something like this:
> "delete": { "uniqueid":"7" },   /* delete entry uniqueID=7 */
> 
> Thank you for any help you can give. Hope I explained it well enough.
> 
> James
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/ID-reference-field-Needed-but-not-searchable-or-retrievable-tp4009162.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: ramBufferSizeMB

2012-09-20 Thread Otis Gospodnetic
Hi,

And there is a wonderful report in SPM for Solr that shows how your index
changes over time in terms of size, index files, segments, indexed docs,
deleted docs... very useful for understanding what's going on at that
level.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Sep 20, 2012 7:49 AM, "Erick Erickson"  wrote:

> > Is it correct that a segment file is ready for merging after a commit has
> > been done (e.g. using the autoCommit property), so I will see merges of
> 100
> > and up documents (and the index writer continues writing into a new
> segment
> > file)?
>
> Yes, merging won't happen until after a segment is closed. How big the
> segments
> are depends on the MergePolicy, of which there are several. Here's a great
> blog explaining that...
>
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> Best
> Erick
>
> On Thu, Sep 20, 2012 at 5:17 AM, "Trym R. Møller"  wrote:
> > Hi
> >
> > Thanks a lot for your answer, Erick!
> >
> > I changed the value of the autoSoftCommit property and it had the
> expected
> > effect. It can be noted that this is per Core, so I get four getReader
> calls
> > when my Solr contains four cores per autoSoftCommit interval.
> >
> > Is it correct that a segment file is ready for merging after a commit has
> > been done (e.g. using the autoCommit property), so I will see merges of
> 100
> > and up documents (and the index writer continues writing into a new
> segment
> > file)?
> >
> > It looks like the segments are being merged into 6 MB files and when
> enough
> > into 60MB files and these again into 3,5GB files.
> >
> > Best regards Trym
> >
> > Den 19-09-2012 14:49, Erick Erickson skrev:
> >
> >> I _think_ the getReader calls are being triggered by the autoSoftCommit
> >> being
> >> at one second. If so, this is probably OK. But bumping that up would
> nail
> >> whether that's the case...
> >>
> >> About RamBufferSizeMB. This has nothing to do with the size of the
> >> segments!
> >> It's just how much memory is consumed before the RAMBuffer is flushed to
> >> the _currently open_ segment. So until a hard commit happens, the
> >> currently
> >> open segment will continue to grow as successive RAMBuffers are flushed.
> >>
> >> bq:  I expected that my Lucene index segment files would be a bit
> >> bigger than 1KB
> >>
> >> Is this a typo? The 512 is specifying MB..
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, Sep 19, 2012 at 6:01 AM, "Trym R. Møller" 
> wrote:
> >>>
> >>> Hi
> >>>
> >>> Using SolrCloud I have added the following to solrconfig.xml (actually
> >>> the
> >>> node in zookeeper)
> >>>  512
> >>>
> >>> After that I expected that my Lucene index segment files would be a bit
> >>> bigger than 1KB as I'm indexing very small documents
> >>> Enabling the infoStream I see a lot of "flush at getReader" (one
> segment
> >>> of
> >>> the infoStream file pasted below)
> >>>
> >>> 1. Where can I look for why documents are flushed so frequently?
> >>> 2. Does it have anything to do with "getReader" and can I do anything
> so
> >>> Solr doesn't need to get a new reader so often?
> >>>
> >>> Any comments are most welcome.
> >>>
> >>> Best regards Trym
> >>>
> >>> Furthermore I have specified
> >>> 
> >>>   18
> >>> 
> >>> 
> >>>   1000
> >>> 
> >>>
> >>>
> >>> IW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flush at
> >>> getReader
> >>> DW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]:
> pool-12-thread-1
> >>> startFullFlush
> >>> DW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: anyChanges?
> >>> numDocsInRam=7 deletes=false hasTickets:false
> pendingChangesInFullFlush:
> >>> false
> >>> DWFC 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]:
> >>> addFlushableState
> >>> DocumentsWriterPerThread [pendingDeletes=gen=0, segment=_kc,
> >>> aborting=false,
> >>> numDocsInRAM=7, deleteQueue=DWDQ: [ generation: 1 ]]
> >>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flush
> postings
> >>> as
> >>> segment _kc numDocs=7
> >>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: new segment
> has
> >>> 0
> >>> deleted docs
> >>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: new segment
> has
> >>> no
> >>> vectors; norms; no docValues; prox; freqs
> >>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]:
> >>> flushedFiles=[_kc_Lucene40_0.frq, _kc.fnm, _kc_Lucene40_0.tim,
> >>> _kc_nrm.cfs,
> >>> _kc.fdx, _kc.fdt, _kc_Lucene40_0.prx, _kc_nrm.cfe, _kc_Lucene40_0.tip]
> >>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushed
> >>> codec=Lucene40
> >>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushed:
> >>> segment=_kc ramUsed=0,095 MB newFlushedSize(includes docstores)=0,003
> MB
> >>> docs/MB=2.283,058
> >>>
> >
>


Re: Solr Write workload

2012-09-20 Thread Otis Gospodnetic
Hi,
Right, documents are buffered in the JVM heap according to the ramBufferSizeMB
setting before being flushed to the index.
But the transaction log doesn't buffer like that, I don't think.
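
In solrconfig.xml terms (a rough Solr 4.x sketch; the directory value is just the
usual placeholder), the two live in different places:

  <indexConfig>
    <!-- in-heap indexing buffer, flushed to a new segment once full -->
    <ramBufferSizeMB>100</ramBufferSizeMB>
  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- transaction log: appended to on each update rather than buffered -->
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>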

Otis
--
Performance Monitoring - http://sematext.com/spm
On Sep 20, 2012 8:11 AM, "John, Phil (CSS)"  wrote:

> Hi,
>
> We're in the process of finalising the specification for our Solr cluster
> and just wanted to double check something:
>
> What is the major IO/write workload type in Solr?
>
> From what I understand, the main workload appears to be largely sequential
> appends to segments, rather than heavily biased towards random writes.
>
> Is that largely correct?
>
> The reason I'm asking is that we're looking at SSDs (primarily for read
> performance), but by having them in a RAID array we will lose TRIM support,
> which won't be as much of an issue if random writes are fairly low.
>
> Thanks,
>
> Phil.
>
>
> This email and any attachment to it are confidential.  Unless you are the
> intended recipient, you may not use, copy or disclose either the message or
> any information contained in the message. If you are not the intended
> recipient, you should delete this email and notify the sender immediately.
>
> Any views or opinions expressed in this email are those of the sender
> only, unless otherwise stated.  All copyright in any Capita material in
> this email is reserved.
>
> All emails, incoming and outgoing, may be recorded by Capita and monitored
> for legitimate business purposes.
>
> Capita exclude all liability for any loss or damage arising or resulting
> from the receipt, use or transmission of this email to the fullest extent
> permitted by law.
>


"&" char in querystring

2012-09-20 Thread Gustav
Good morning everyone!
Again, I need your help, Lucene community!
I have a query string just like this: q="johnson & johnson", and when I use
debugQuery=true I realize that the Solr parser breaks the string exactly at
the "&" char, changing my query to q="Johnson". I would like to know, is
there any way to avoid this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/char-in-querystring-tp4009174.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Split XML configuration

2012-09-20 Thread Michael Della Bitta
Hi, Simone:

"xi:include" directives work in Solr config files, but in most (all?)
versions of Solr, they require absolute paths, which makes portable
configuration slightly more sticky. Still, a very viable solution.
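
A minimal sketch of what that looks like in solrconfig.xml (the file names and
paths here are invented):

  <config xmlns:xi="http://www.w3.org/2001/XInclude">
    <!-- absolute paths, as noted above -->
    <xi:include href="/opt/solr/conf/includes/request-handlers.xml"/>
    <xi:include href="/opt/solr/conf/includes/cache-settings.xml"/>
    <!-- ... rest of solrconfig.xml ... -->
  </config>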

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Thu, Sep 20, 2012 at 7:10 AM, Finotti Simone  wrote:
> Hi,
>
> is it possible to split schema.xml and solrconfig.xml configurations? My 
> configurations are getting quite large and I'd like to be able to partition 
> them logically in multiple files.
>
> thank you in advance,
> S


Re: ID reference field - Needed but not searchable or retrievable

2012-09-20 Thread Michael Della Bitta
Hi, James,

If you don't store or index this value, it won't exist in Solr.

If you want to be able to find these records by the unique id, you
need to index it. If you want to find the corresponding DB record from
a Solr document you brought up by other means, you'll need to store
the unique id.
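
Concretely, assuming a string field named "uniqueid", something like this in
schema.xml is enough:

  <field name="uniqueid" type="string" indexed="true" stored="true" required="true"/>
  <uniqueKey>uniqueid</uniqueKey>

Deleting or updating by that key then works as usual, e.g.
<delete><id>7</id></delete> in the XML update format.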

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Thu, Sep 20, 2012 at 8:38 AM, Spadez  wrote:
> Hi.
>
> My SQL database assigns a uniqueID to each item. I want to keep this
> uniqueID associated with the items that are in Solr even though I won't ever
> need to display them or have them searchable. I do however want to be able
> to target specific items in Solr with it, for updating or deleting the
> record.
>
> Right now I have this in my schema:
> 
>
> However, since I don't want it searchable or stored it should be this:
> 
>
> Firstly, is this the correct way of doing this? I saw mention of an "ignore"
> attribute that can be added.
>
> Secondly, if I wanted to do updates to the fields using JSON by targeting
> the uniqueID, can I still do something like this:
> "delete": { "uniqueid":"7" },   /* delete entry uniqueID=7 */
>
> Thank you for any help you can give. Hope I explained it well enough.
>
> James
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/ID-reference-field-Needed-but-not-searchable-or-retrievable-tp4009162.html
> Sent from the Solr - User mailing list archive at Nabble.com.


ID reference field - Needed but not searchable or retrievable

2012-09-20 Thread Spadez
Hi.

My SQL database assigns a uniqueID to each item. I want to keep this
uniqueID associated with the items that are in Solr even though I won't ever
need to display them or have them searchable. I do however want to be able
to target specific items in Solr with it, for updating or deleting the
record.

Right now I have this in my schema:


However, since I don't want it searchable or stored it should be this:


Firstly, is this the correct way of doing this? I saw mention of an "ignore"
attribute that can be added.

Secondly, if I wanted to do updates to the fields using JSON by targeting
the uniqueID, can I still do something like this:
"delete": { "uniqueid":"7" },   /* delete entry uniqueID=7 */

Thank you for any help you can give. Hope I explained it well enough.

James



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ID-reference-field-Needed-but-not-searchable-or-retrievable-tp4009162.html
Sent from the Solr - User mailing list archive at Nabble.com.


Regarding Search

2012-09-20 Thread darshan
Hi fellows,

I had added the following fields in my data-config.xml to
implement Data Import Handler



  

  

   

  

  

  

  

  



When I perform steps of Full import Example at
http://wiki.apache.org/solr/DataImportHandler

I can successfully index my database, but the issue is that after
visiting the admin page and searching for any text, I don't receive any
results and the response is always:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">admin</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

Please guide me on this; I am a newbie to Apache Solr.

 

Thanks,

Darshan

 

 



Re: SOLR memory usage jump in JVM

2012-09-20 Thread Erick Erickson
Yeah, I sent a note to the web folks there about the images.

I'll leave the rest to people who really _understand_ all that stuff

On Thu, Sep 20, 2012 at 8:31 AM, Bernd Fehling
 wrote:
> Hi Erik,
>
> thanks for the link.
> Now if we could see the images in that article that would be great :-)
>
>
> By the way, one cause for the memory jumps was located as "killer search" 
> from a user.
> The interesting part is that the verbose gc.log showed a "hiccup" in the GC.
> Which means that during a GC run right after CMS-concurrent-sweep-start but 
> before
> CMS-concurrent-sweep there is a new GC launched which interferes with the 
> running one.
> Any switches for this to serialize GC?
>
>
> Regards
> Bernd
>
>
> Am 20.09.2012 13:51, schrieb Erick Erickson:
>> Here's a wonderful writeup about GC and memory in Solr/Lucene:
>>
>> http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/
>>
>> Best
>> Erick
>>
>> On Thu, Sep 20, 2012 at 5:49 AM, Robert Muir  wrote:
>>> On Thu, Sep 20, 2012 at 3:09 AM, Bernd Fehling
>>>  wrote:
>>>
 By the way while looking for upgrading to JDK7, the release notes say 
 under section
 "known issues" about the "PorterStemmer" bug:
 "...The recommended workaround is to specify -XX:-UseLoopPredicate on the 
 command line."
 Is this still not fixed, or won't fix?
>>>
>>> How in the world can we fix it?
>>>
>>> Oracle released a broken java version: there's nothing we can do about
>>> that. Go take it up with them.
>>>
>>> --
>>> lucidworks.com


Re: SOLR memory usage jump in JVM

2012-09-20 Thread Bernd Fehling
Hi Erik,

thanks for the link.
Now if we could see the images in that article that would be great :-)


By the way, one cause for the memory jumps was located as "killer search" from 
a user.
The interesting part is that the verbose gc.log showed a "hiccup" in the GC.
Which means that during a GC run right after CMS-concurrent-sweep-start but 
before
CMS-concurrent-sweep there is a new GC launched which interferes with the 
running one.
Any switches for this to serialize GC?


Regards
Bernd


Am 20.09.2012 13:51, schrieb Erick Erickson:
> Here's a wonderful writeup about GC and memory in Solr/Lucene:
> 
> http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/
> 
> Best
> Erick
> 
> On Thu, Sep 20, 2012 at 5:49 AM, Robert Muir  wrote:
>> On Thu, Sep 20, 2012 at 3:09 AM, Bernd Fehling
>>  wrote:
>>
>>> By the way while looking for upgrading to JDK7, the release notes say under 
>>> section
>>> "known issues" about the "PorterStemmer" bug:
>>> "...The recommended workaround is to specify -XX:-UseLoopPredicate on the 
>>> command line."
>>> Is this still not fixed, or won't fix?
>>
>> How in the world can we fix it?
>>
>> Oracle released a broken java version: there's nothing we can do about
>> that. Go take it up with them.
>>
>> --
>> lucidworks.com


Solr Write workload

2012-09-20 Thread John, Phil (CSS)
Hi,
 
We're in the process of finalising the specification for our Solr cluster and 
just wanted to double check something:
 
What is the major IO/write workload type in Solr?
 
From what I understand, the main workload appears to be largely sequential
appends to segments, rather than heavily biased towards random writes.
 
Is that largely correct?
 
The reason I'm asking is that we're looking at SSDs (primarily for read 
performance), but by having them in a RAID array we will lose TRIM support, 
which won't be as much of an issue if random writes are fairly low.
 
Thanks,
 
Phil.


This email and any attachment to it are confidential.  Unless you are the 
intended recipient, you may not use, copy or disclose either the message or any 
information contained in the message. If you are not the intended recipient, 
you should delete this email and notify the sender immediately.

Any views or opinions expressed in this email are those of the sender only, 
unless otherwise stated.  All copyright in any Capita material in this email is 
reserved.

All emails, incoming and outgoing, may be recorded by Capita and monitored for 
legitimate business purposes. 

Capita exclude all liability for any loss or damage arising or resulting from 
the receipt, use or transmission of this email to the fullest extent permitted 
by law.


Re: indexing issue

2012-09-20 Thread Erick Erickson
Not enough info to go on here, what is your fieldType?

But the first place to look is admin/analysis to see how the
text is tokenized.
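
If the goal is prefix matching over the whole value, one option (sketched here
with made-up names) is a type that keeps the entire value as a single token, so
wildcard queries like 8E0* behave predictably:

  <fieldType name="code" class="solr.TextField">
    <analyzer>
      <!-- KeywordTokenizerFactory emits the whole field value as one token -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="part_number" type="code" indexed="true" stored="true"/>

Wildcard terms may bypass analysis, so either lowercase the prefix yourself when
querying (part_number:8e0*) or drop the lowercase filter.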

Best
Erick

On Thu, Sep 20, 2012 at 5:49 AM, zainu  wrote:
> Dear fellows,
> I have a field in Solr with the value '8E0061123-8E1'. Now when I search '8E*',
> it returns all values starting with '8E', which is totally right, but it
> returns nothing when I search '8E0*'. I guess it is not indexing 8E0 or so.
> I want to search with all combinations like '8E', '8E0', '8E00', '8E006',
> etc. But currently it returns a result only when I type 8E or the complete
> '8E0061123-8E1'... any idea??
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/indexing-issue-tp4009122.html
> Sent from the Solr - User mailing list archive at Nabble.com.


indexing issue

2012-09-20 Thread zainu
Dear fellows,
I have a field in Solr with the value '8E0061123-8E1'. Now when I search '8E*',
it returns all values starting with '8E', which is totally right, but it
returns nothing when I search '8E0*'. I guess it is not indexing 8E0 or so.
I want to search with all combinations like '8E', '8E0', '8E00', '8E006',
etc. But currently it returns a result only when I type 8E or the complete
'8E0061123-8E1'... any idea??



--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-issue-tp4009122.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR memory usage jump in JVM

2012-09-20 Thread Erick Erickson
Here's a wonderful writeup about GC and memory in Solr/Lucene:

http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/

Best
Erick

On Thu, Sep 20, 2012 at 5:49 AM, Robert Muir  wrote:
> On Thu, Sep 20, 2012 at 3:09 AM, Bernd Fehling
>  wrote:
>
>> By the way while looking for upgrading to JDK7, the release notes say under 
>> section
>> "known issues" about the "PorterStemmer" bug:
>> "...The recommended workaround is to specify -XX:-UseLoopPredicate on the 
>> command line."
>> Is this still not fixed, or won't fix?
>
> How in the world can we fix it?
>
> Oracle released a broken java version: there's nothing we can do about
> that. Go take it up with them.
>
> --
> lucidworks.com


Re: ramBufferSizeMB

2012-09-20 Thread Erick Erickson
> Is it correct that a segment file is ready for merging after a commit has
> been done (e.g. using the autoCommit property), so I will see merges of 100
> and up documents (and the index writer continues writing into a new segment
> file)?

Yes, merging won't happen until after a segment is closed. How big the segments
are depends on the MergePolicy, of which there are several. Here's a great
blog explaining that...

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
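
To tie that back to solrconfig.xml, the settings involved look roughly like this
in a 4.x config (the numbers are only illustrative):

  <indexConfig>
    <!-- how much RAM is filled before the buffer is flushed -->
    <ramBufferSizeMB>512</ramBufferSizeMB>
  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <!-- hard commit: closes the currently open segment, making it a merge candidate -->
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <!-- soft commit: only opens a new searcher, hence the getReader calls -->
      <maxTime>1000</maxTime>
    </autoSoftCommit>
  </updateHandler>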

Best
Erick

On Thu, Sep 20, 2012 at 5:17 AM, "Trym R. Møller"  wrote:
> Hi
>
> Thanks a lot for your answer, Erick!
>
> I changed the value of the autoSoftCommit property and it had the expected
> effect. It can be noted that this is per Core, so I get four getReader calls
> when my Solr contains four cores per autoSoftCommit interval.
>
> Is it correct that a segment file is ready for merging after a commit has
> been done (e.g. using the autoCommit property), so I will see merges of 100
> and up documents (and the index writer continues writing into a new segment
> file)?
>
> It looks like the segments are being merged into 6 MB files and when enough
> into 60MB files and these again into 3,5GB files.
>
> Best regards Trym
>
> Den 19-09-2012 14:49, Erick Erickson skrev:
>
>> I _think_ the getReader calls are being triggered by the autoSoftCommit
>> being
>> at one second. If so, this is probably OK. But bumping that up would nail
>> whether that's the case...
>>
>> About RamBufferSizeMB. This has nothing to do with the size of the
>> segments!
>> It's just how much memory is consumed before the RAMBuffer is flushed to
>> the _currently open_ segment. So until a hard commit happens, the
>> currently
>> open segment will continue to grow as successive RAMBuffers are flushed.
>>
>> bq:  I expected that my Lucene index segment files would be a bit
>> bigger than 1KB
>>
>> Is this a typo? The 512 is specifying MB..
>>
>> Best
>> Erick
>>
>> On Wed, Sep 19, 2012 at 6:01 AM, "Trym R. Møller"  wrote:
>>>
>>> Hi
>>>
>>> Using SolrCloud I have added the following to solrconfig.xml (actually
>>> the
>>> node in zookeeper)
>>>  512
>>>
>>> After that I expected that my Lucene index segment files would be a bit
>>> bigger than 1KB as I'm indexing very small documents
>>> Enabling the infoStream I see a lot of "flush at getReader" (one segment
>>> of
>>> the infoStream file pasted below)
>>>
>>> 1. Where can I look for why documents are flushed so frequently?
>>> 2. Does it have anything to do with "getReader" and can I do anything so
>>> Solr doesn't need to get a new reader so often?
>>>
>>> Any comments are most welcome.
>>>
>>> Best regards Trym
>>>
>>> Furthermore I have specified
>>> 
>>>   18
>>> 
>>> 
>>>   1000
>>> 
>>>
>>>
>>> IW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flush at
>>> getReader
>>> DW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: pool-12-thread-1
>>> startFullFlush
>>> DW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: anyChanges?
>>> numDocsInRam=7 deletes=false hasTickets:false pendingChangesInFullFlush:
>>> false
>>> DWFC 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]:
>>> addFlushableState
>>> DocumentsWriterPerThread [pendingDeletes=gen=0, segment=_kc,
>>> aborting=false,
>>> numDocsInRAM=7, deleteQueue=DWDQ: [ generation: 1 ]]
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flush postings
>>> as
>>> segment _kc numDocs=7
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: new segment has
>>> 0
>>> deleted docs
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: new segment has
>>> no
>>> vectors; norms; no docValues; prox; freqs
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]:
>>> flushedFiles=[_kc_Lucene40_0.frq, _kc.fnm, _kc_Lucene40_0.tim,
>>> _kc_nrm.cfs,
>>> _kc.fdx, _kc.fdt, _kc_Lucene40_0.prx, _kc_nrm.cfe, _kc_Lucene40_0.tip]
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushed
>>> codec=Lucene40
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushed:
>>> segment=_kc ramUsed=0,095 MB newFlushedSize(includes docstores)=0,003 MB
>>> docs/MB=2.283,058
>>>
>


Re: how to set Automatic dataimport in solr 4.0

2012-09-20 Thread Erick Erickson
Well, from the bullet points on the Wiki page:
Planned to be included in Solr_4.1

The JIRA referenced points to a Jar that Marko kindly provides,
you can try that.

Best
Erick


On Wed, Sep 19, 2012 at 10:22 PM, rayvicky  wrote:
> dataimport.properties
> #Thu Sep 20 10:11:09 CST 2012
> interval=1
> port=8081
> server=localhost
> doc.id=
> params=/select?qt\=/dataimport&command\=delta-import&clean\=false&commit\=true
> webapp=solr
> syncEnabled=1
> last_index_time=2012-09-20 10\:11\:04
> doc.last_index_time=2012-09-20 10\:11\:04
> syncCores=
>
> It can't do automatic dataimport.
> Who can tell me why?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-to-set-Automatic-dataimport-in-solr-4-0-tp4009075.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Split XML configuration

2012-09-20 Thread Finotti Simone
Hi,

is it possible to split schema.xml and solrconfig.xml configurations? My 
configurations are getting quite large and I'd like to be able to partition 
them logically in multiple files.

thank you in advance,
S


Problems with SolrEnitityProcessor + frange filterQuery

2012-09-20 Thread Dirceu Vieira
Hi,

I'm attempting to write a filter query for my SolrEntityProcessor using
{frange} over a function.
It works fine when I'm testing it on the admin, but once I move it into my
data-config.xml the query blows up because of the commas in the function.
The problem is that fq parameter can be a comma separated list, which means
that if I have commas within my query, it'll try to split it into multiple
filter queries.

Does anybody know a way of escaping the comma, or another way I can work
around that?

I've been using SolrEntityProcessor to import filtered data from a core to
another, here's the queries:

query="status:1 AND NOT priority:\-1"
fq="{!frange l=3000 u=5000}max(sum(suser_count), sum(user_count))"

I'm using Solr-4.0.0-BETA.



Best regards,

-- 
Dirceu Vieira Júnior
---
+47 9753 2473
dirceuvjr.blogspot.com
twitter.com/dirceuvjr


Re: SOLR memory usage jump in JVM

2012-09-20 Thread Robert Muir
On Thu, Sep 20, 2012 at 3:09 AM, Bernd Fehling
 wrote:

> By the way while looking for upgrading to JDK7, the release notes say under 
> section
> "known issues" about the "PorterStemmer" bug:
> "...The recommended workaround is to specify -XX:-UseLoopPredicate on the 
> command line."
> Is this still not fixed, or won't fix?

How in the world can we fix it?

Oracle released a broken java version: there's nothing we can do about
that. Go take it up with them.

-- 
lucidworks.com


Dynamically field selection for Solr Suggestion (Spellcheck) multiple term query

2012-09-20 Thread zbindigonzales
Hello everybody. I already posted this question on Stack Overflow but didn't
get an answer.

I am using the solr suggestion component with the following configuration:

schema.xml


 
   
   
 










solrconfig.xml



suggest
org.apache.solr.spelling.suggest.Suggester
org.apache.solr.spelling.suggest.tst.TSTLookup
spell
true





true
suggest
true
6
true
true
true
6
1000
true
100%


suggest
query



As you can see, there is a field "spell" which I am using for suggestion
queries. This works great, even for multiple-term queries.

But what I need is to search on selected fields only.
So for example I want valid suggestions only for the fields image_memo and
username. The user can dynamically add and remove fields to search.

I know that I could do something like this:

q=(image_memo:*search* OR image_username:*search*)

But this slows down dramatically if you have a lot of fields and a
multiple-term query.

Example: Searching in Field memo, username, field, field1 and field2 for
term, term1 and term2.

((memo:term OR username:term OR field:term OR field1:term OR field2:term)
AND (memo:term1 OR username:term1 OR field:term1 OR field1:term1 OR
field2:term1) AND (memo:term2 OR username:term2 OR field:term2 OR
field1:term2 OR field2:term2))

Is there any way to dynamically select the spell fields? Or is there a way
that I can search for specific fields only in a multivalued field?

I am using Apache Solr 4 Alpha.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dynamically-field-selection-for-Solr-Suggestion-Spellcheck-multiple-term-query-tp4009120.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ramBufferSizeMB

2012-09-20 Thread Trym R. Møller

Hi

Thanks a lot for your answer, Erick!

I changed the value of the autoSoftCommit property and it had the 
expected effect. It can be noted that this is per Core, so I get four 
getReader calls when my Solr contains four cores per autoSoftCommit 
interval.


Is it correct that a segment file is ready for merging after a commit 
has been done (e.g. using the autoCommit property), so I will see merges 
of 100 and up documents (and the index writer continues writing into a 
new segment file)?


It looks like the segments are being merged into 6 MB files and when 
enough into 60MB files and these again into 3,5GB files.


Best regards Trym

Den 19-09-2012 14:49, Erick Erickson skrev:

I _think_ the getReader calls are being triggered by the autoSoftCommit being
at one second. If so, this is probably OK. But bumping that up would nail
whether that's the case...

About RamBufferSizeMB. This has nothing to do with the size of the segments!
It's just how much memory is consumed before the RAMBuffer is flushed to
the _currently open_ segment. So until a hard commit happens, the currently
open segment will continue to grow as successive RAMBuffers are flushed.

bq:  I expected that my Lucene index segment files would be a bit
bigger than 1KB

Is this a typo? The 512 is specifying MB..

Best
Erick

On Wed, Sep 19, 2012 at 6:01 AM, "Trym R. Møller"  wrote:

Hi

Using SolrCloud I have added the following to solrconfig.xml (actually the
node in zookeeper)
 512

After that I expected that my Lucene index segment files would be a bit
bigger than 1KB as I'm indexing very small documents
Enabling the infoStream I see a lot of "flush at getReader" (one segment of
the infoStream file pasted below)

1. Where can I look for why documents are flushed so frequently?
2. Does it have anything to do with "getReader" and can I do anything so
Solr doesn't need to get a new reader so often?

Any comments are most welcome.

Best regards Trym

Furthermore I have specified

  18


  1000



IW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flush at getReader
DW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: pool-12-thread-1
startFullFlush
DW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: anyChanges?
numDocsInRam=7 deletes=false hasTickets:false pendingChangesInFullFlush:
false
DWFC 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: addFlushableState
DocumentsWriterPerThread [pendingDeletes=gen=0, segment=_kc, aborting=false,
numDocsInRAM=7, deleteQueue=DWDQ: [ generation: 1 ]]
DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flush postings as
segment _kc numDocs=7
DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: new segment has 0
deleted docs
DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: new segment has no
vectors; norms; no docValues; prox; freqs
DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]:
flushedFiles=[_kc_Lucene40_0.frq, _kc.fnm, _kc_Lucene40_0.tim, _kc_nrm.cfs,
_kc.fdx, _kc.fdt, _kc_Lucene40_0.prx, _kc_nrm.cfe, _kc_Lucene40_0.tip]
DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushed
codec=Lucene40
DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushed:
segment=_kc ramUsed=0,095 MB newFlushedSize(includes docstores)=0,003 MB
docs/MB=2.283,058





Re: what happens with slave during replication?

2012-09-20 Thread Bernd Fehling
Hi Alex,
during replication the slave is still available and serving requests but
as you can imagine the responses will be slower because of disk usage,
even with 15k rpm disks.

We have one master and two slaves. Master only for indexing, slaves for 
searching.
Only one slave is online the other is for backup. The backup gets replicated 
first.
After that the servers will be switched and the online becomes backup.
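
For reference, a bare-bones sketch of the replication handler setup that this is
based on (host name, port and file list are placeholders):

  <!-- on the master -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
  </requestHandler>

  <!-- on each slave -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:05:00</str>
    </lst>
  </requestHandler>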

Regards
Bernd

Am 20.09.2012 09:39, schrieb Alex:
> Hi All!
> I want to replicate my Solr server.
> At the beginning I want to have one master and one slave. Master would serve
> for indexing and slave (slaves in the future) would be used for
> searching. I was wondering if anybody could tell me what happens with slave 
> during replication. Is it unavailable? Could it serve searches? What
> happens if it has to replicate huge amount of data?
> 
> Regards,
> Alex


RE: what happens with slave during replication?

2012-09-20 Thread Harshvardhan Ojha
Hi Alex,

During replication your slave will still be available for searches, and it opens a
new searcher just after replication finishes. You won't get any downtime, but you
might not have a warmed cache at that moment. Please look into the cache
configuration for Solr.
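
The pieces to look at live in the <query> section of solrconfig.xml: the cache
autowarm counts and, optionally, a newSearcher warming query (the sizes and the
query below are just examples):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="64"/>

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">some popular query</str></lst>
    </arr>
  </listener>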

Regards
Harshvardhan OJha

-Original Message-
From: Alex [mailto:lot...@gmail.com] 
Sent: Thursday, September 20, 2012 1:09 PM
To: solr-user@lucene.apache.org
Subject: what happens with slave during replication?

Hi All!
I want to replicate my Solr server.
At the beginning I want to have one master and one slave. Master would serve
for indexing and slave (slaves in the future) would be used for searching. I 
was wondering if anybody could tell me what happens with slave during 
replication. Is it unavailable? Could it serve searches? 
What happens if it has to replicate huge amount of data?

Regards,
Alex


RE: Nodes cannot recover and become unavailable

2012-09-20 Thread Markus Jelsma
Hi - at first I didn't recreate the Zookeeper data, but I got it to work. I'll
check the removal of the LOG line.

thanks
 
-Original message-
> From:Sami Siren 
> Sent: Wed 19-Sep-2012 17:45
> To: solr-user@lucene.apache.org
> Subject: Re: Nodes cannot recover and become unavailable
> 
> also, did you re create the cluster after upgrading to a newer
> version? I believe there were some changes made to the
> clusterstate.json recently that are not backwards compatible.
> 
> --
>  Sami Siren
> 
> 
> 
> On Wed, Sep 19, 2012 at 6:21 PM, Sami Siren  wrote:
> > Hi,
> >
> > I am having troubles understanding the reason for that NPE.
> >
> > First you could try removing line #102 in HttpClientUtil so
> > that logging does not prevent creation of the http client in
> > SyncStrategy.
> >
> > --
> >  Sami Siren
> >
> > On Wed, Sep 19, 2012 at 5:29 PM, Markus Jelsma
> >  wrote:
> >> Hi,
> >>
> >> Since the 2012-09-17 11:10:41 build shards start to have trouble coming 
> >> back online. When i restart one node the slices on the other nodes are 
> >> throwing exceptions and cannot be queried. I'm not sure how to remedy the 
> >> problem but stopping a node or restarting it a few times seems to help it. 
> >> The problem is when i restart a node, and it happens, i must not restart 
> >> another node because that may trigger other slices becoming unavailable.
> >>
> >> Here are some parts of the log:
> >>
> >> 2012-09-19 14:13:18,149 ERROR [solr.cloud.RecoveryStrategy] - 
> >> [RecoveryThread] - : Recovery failed - trying again... core=oi_i
> >> 2012-09-19 14:13:25,818 WARN [solr.cloud.RecoveryStrategy] - 
> >> [main-EventThread] - : Stopping recovery for 
> >> zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
> >> 2012-09-19 14:13:44,497 WARN [solr.cloud.RecoveryStrategy] - [Thread-4] - 
> >> : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_jcore=oi_j
> >> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
> >> [RecoveryThread] - : Error while trying to recover. 
> >> core=oi_i:org.apache.solr.common.SolrException: We are not the leader
> >> at 
> >> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
> >> at 
> >> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
> >> at 
> >> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
> >> at 
> >> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
> >> at 
> >> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
> >>
> >> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
> >> [RecoveryThread] - : Recovery failed - trying again... core=oi_i
> >> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
> >> [RecoveryThread] - : Recovery failed - max retries exceeded. core=oi_i
> >> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - 
> >> [RecoveryThread] - : Recovery failed - I give up. core=oi_i
> >> 2012-09-19 14:14:00,333 WARN [solr.cloud.RecoveryStrategy] - 
> >> [RecoveryThread] - : Stopping recovery for 
> >> zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
> >>  ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : Sync request 
> >> error: java.lang.NullPointerException
> >>  ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : 
> >> http://nl10.host:8080/solr/oi_i/: Could not tell a replica to 
> >> recover:java.lang.NullPointerException
> >> at 
> >> org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
> >> at 
> >> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:102)
> >> at 
> >> org.apache.solr.client.solrj.impl.HttpSolrServer.(HttpSolrServer.java:155)
> >> at 
> >> org.apache.solr.client.solrj.impl.HttpSolrServer.(HttpSolrServer.java:128)
> >> at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:262)
> >> at 
> >> org.apache.solr.cloud.SyncStrategy.requestRecovery(SyncStrategy.java:272)
> >> at 
> >> org.apache.solr.cloud.SyncStrategy.syncToMe(SyncStrategy.java:203)
> >> at 
> >> org.apache.solr.cloud.SyncStrategy.syncReplicas(SyncStrategy.java:125)
> >> at org.apache.solr.cloud.SyncStrategy.sync(SyncStrategy.java:87)
> >> at 
> >> org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:169)
> >> at 
> >> org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
> >> at 
> >> org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
> >> at 
> >> org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:275)
> >> at 
> >> org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:326)
> >> at 
> >> org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:159)
> >> at 
> >> org.apache.solr.cloud.Leade

what happens with slave during replication?

2012-09-20 Thread Alex

Hi All!
I want to replicate my Solr server.
At the beginning I want to have one master and one slave. Master would
serve for indexing and slave (slaves in the future) would be used for 
searching. I was wondering if anybody could tell me what happens with 
slave during replication. Is it unavailable? Could it serve searches? 
What happens if it has to replicate huge amount of data?


Regards,
Alex


Re: SOLR memory usage jump in JVM

2012-09-20 Thread Bernd Fehling
That is the problem with a JVM: it is a virtual machine.
Ask 10 experts about good JVM settings and you get 15 answers. Maybe that is a
tradeoff of the flexibility of JVMs. There is always a right setting for any
application running on a JVM, but you just have to find it.
How about a Solr Wiki page about JVM settings for Solr?
The good, the bad and the ugly?
With a very short description of why to set each one (or not) and what it will affect?


By the way while looking for upgrading to JDK7, the release notes say under 
section
"known issues" about the "PorterStemmer" bug:
"...The recommended workaround is to specify -XX:-UseLoopPredicate on the 
command line."
Is this still not fixed, or won't fix?
So this could be a candidate for an entry about JVM settings on the wiki page.

Regards
Bernd



Am 19.09.2012 18:14, schrieb Rozdev29:
> I have used this setting to reduce gc pauses with CMS - java 6 u23
> 
> -XX:+ParallelRefProcEnabled
> 
> With this setting, jvm does gc of weakrefs with multiple threads and pauses 
> are low.
> 
> Please use this option only when you have multiple cores.
> 
> For me, CMS gives better results
> 
> Sent from my iPhone
> 
> On Sep 19, 2012, at 8:50 AM, Walter Underwood  wrote:
> 
>> Ooh, that is a nasty one. Is this JDK 7 only or also in 6?
>>
>> It looks like the "-XX:ConcGCThreads=1" option is a workaround, is that 
>> right?
>>
>> We've had some 1.6 JVMs behave in the same way that bug describes, but I 
>> haven't verified it is because of finalizer problems.
>>
>> wunder
>>
>> On Sep 19, 2012, at 5:43 AM, Erick Erickson wrote:
>>
>>> Two in one morning
>>>
>>> The JVM bug I'm familiar with is here:
>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034
>>>
>>> FWIW,
>>> Erick
>>>
>>> On Wed, Sep 19, 2012 at 8:20 AM, Shawn Heisey  wrote:
 On 9/18/2012 9:29 PM, Lance Norskog wrote:
>
> There is a known JVM garbage collection bug that causes this. It has to do
> with reclaiming Weak references, I think in WeakHashMap. Concurrent 
> garbage
> collection collides with this bug and the result is that old field cache
> data is retained after closing the index. The bug is more common with more
> processors doing GC simultaneously.
>
> The symptom is that when you run a monitor, the memory usage rises to a
> peak, drops to a floor, rises again in the classic sawtooth pattern. When
> the GC bug happens, the ceiling becomes the floor, and the sawtooth goes
> from the new floor to a new ceiling. The two sizes are the same. So, 2G to
> 5G, over and over, suddenly it is 5G to 8G, over and over.
>
> The bug is fixed in recent Java 7 releases. I'm sorry, but I cannot find
> the bug number.


 I think I ran into this when I was looking at memory usage on my SolrJ
 indexing program.  Under Java6, memory usage in jconsole (remotely via JMX)
 was fairly constant long-term (aside from the unavoidable sawtooth).  When 
 I
 ran it under Java 7u3, it would continually grow, slowly ... but if I
 measured it with jstat on the Linux commandline rather than remotely via
 jconsole under windows, memory usage was consistent over time, just like
 under java6 with the remote jconsole.  After looking at heap dumps and
 scratching my head a lot, I finally concluded that I did not have a memory
 leak, there was a problem with remote JMX monitoring in java7.  Glad to 
 hear
 I was not imagining it, and that it's fixed now.

 Thanks,
 Shawn

>>
>> --
>> Walter Underwood
>> wun...@wunderwood.org
>>
>>
>>
> 



Solr 3.6 observe connections in CLOSE_WAIT state

2012-09-20 Thread Alok Bhandari

Hello,

I am using Solr 3.6.0, and I have observed many connections in the CLOSE_WAIT state
after using the Solr server for some time. On further analysis and googling I
found that I need to close the idle connections from the client which is
connecting to Solr to query data; that does reduce the number of CLOSE_WAIT
connections, but still some connections remain in that state.

I am using 2 shards, and one observation is that if I don't use shards then I
get 0 CLOSE_WAIT connections. I need help with this, as we need to use
distributed search using shards.

Thanks 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-6-observe-connections-in-CLOSE-WAIT-state-tp4009097.html
Sent from the Solr - User mailing list archive at Nabble.com.