Re: How to view number of backups available

2014-08-01 Thread Shalin Shekhar Mangar
No, there is no such API. It will be a good improvement though. Mind
creating a Jira issue?


On Sat, Aug 2, 2014 at 8:35 AM, Ramana OpenSource <
ramanaopensou...@gmail.com> wrote:

> Hi All,
>
> I am using Replication backup command to create snapshot of my index.
>
> http://localhost:8983/solr/replication?command=backup&numberToKeep=2
>
> At any point, If I would like to know how many number of back ups
> available, do we have any API that supports this ?
>
> The close one i see is
> http://localhost:8983/solr/replication?command=details
>
> But the above URL gives overview of snapshots available. It doesn't say how
> many number of snapshots available.
>
> 
> [replication details response, backup section - XML tags stripped by the mail
> archive; surviving values: Sat Aug 02 08:33:37 IST 2014 / 24 / success /
> Sat Aug 02 08:33:37 IST 2014]
> 
> 
>
> Please suggest.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr vs ElasticSearch

2014-08-01 Thread Jack Krupansky
Elasticsearch and Solr are both based on Lucene, so a sizeable fraction of 
performance will be similar if not identical.


IOW, they are both using the same "search engine" under the hood.

Sure, the right "tires", "transmission", and "body" can make a big 
difference in performance as well, but the engine is the point to focus on.


Back when ES first came out, Solr was not so easily scalable and ES was 
"cool" because it had a cleaner JSON-based REST API.


But now, Solr has SolrCloud and supports JSON for both input documents and 
query results, so... the differences are a lot more muted.


I would say that ES does still have a cleaner REST API, but I'm not sure how 
much that really matters for most use cases. Clearly it matters to some 
people, but I suspect a lot of people are gravitating to ES solely because 
they hear people say "You've got to check out Elasticsearch!" rather than 
for some clear and obvious benefit in terms of features, performance, and 
scalability.


-- Jack Krupansky

-Original Message- 
From: Salman Akram

Sent: Friday, August 1, 2014 1:35 AM
To: Solr Group
Subject: Re: Solr vs ElasticSearch

I did see that earlier. My main concern is search
performance/scalability/throughput, which unfortunately that article didn't
address. Any benchmarks or comments about that?

We are already using SOLR but there has been a push to check Elasticsearch.
All the benchmarks I have seen are at least a few years old.


On Fri, Aug 1, 2014 at 4:59 AM, Otis Gospodnetic 
wrote:



Not super fresh, but more recent than the 2 links you sent:
http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Jul 31, 2014 at 10:33 PM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:

> This is quite an old discussion. Wanted to check any new comparisons
after
> SOLR 4 especially with regards to performance/scalability/throughput?
>
>
> On Tue, Jul 26, 2011 at 7:33 PM, Peter  wrote:
>
> > Have a look:
> >
> >
> >
>
http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage
> >
> >
http://karussell.wordpress.com/2011/05/12/elasticsearch-vs-solr-lucene/
> >
> > Regards,
> > Peter.
> >
> > --
> > View this message in context:
> >
>
http://lucene.472066.n3.nabble.com/Solr-vs-ElasticSearch-tp3009181p3200492.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> Regards,
>
> Salman Akram
>





--
Regards,

Salman Akram 



How to view number of backups available

2014-08-01 Thread Ramana OpenSource
Hi All,

I am using the Replication backup command to create a snapshot of my index.

http://localhost:8983/solr/replication?command=backup&numberToKeep=2

At any point, if I would like to know how many backups are
available, is there any API that supports this?

The closest one I see is
http://localhost:8983/solr/replication?command=details

But the above URL gives an overview of the snapshots available. It doesn't say
how many snapshots are available.


[replication details response, backup section - XML tags stripped by the mail
 archive; surviving values: Sat Aug 02 08:33:37 IST 2014 / 24 / success /
 Sat Aug 02 08:33:37 IST 2014]



Please suggest.
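
For reference, the same command=details check can be done from SolrJ; a minimal
sketch (Solr 4.x; the handler path and the "details"/"backup" response keys follow
the default /replication output), though it still only describes the most recent
snapshot rather than giving a count:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class ReplicationDetails {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("qt", "/replication");   // SolrJ routes the request to this handler path
        params.set("command", "details");
        QueryResponse rsp = new QueryRequest(params).process(solr);
        // "details" mirrors the XML shown above; the "backup" sub-section (when
        // present) only describes the latest snapshot, not how many exist.
        NamedList<?> details = (NamedList<?>) rsp.getResponse().get("details");
        System.out.println(details.get("backup"));
    }
}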


Re: Question on multi-threaded faceting

2014-08-01 Thread Vamsee Yarlagadda
I filed https://issues.apache.org/jira/browse/SOLR-6314 to track this issue
going forward.
Any ideas around this problem?

Thanks,
Vamsee


On Tue, Jul 29, 2014 at 4:00 PM, Vamsee Yarlagadda 
wrote:

> Hi,
>
> I am trying to work with multi-threaded faceting on SolrCloud and in
> the process I was hit by some issues.
>
> I am currently running the below upstream test on different SolrCloud
> configurations and I am getting a different result set per configuration.
>
> https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/request/TestFaceting.java#L654
>
> Setup:
> ** Indexed 50 docs into SolrCloud.*
>
> ** If the SolrCloud has only 1 shard, the facet field query has the below
> output (which matches the expected upstream test output - # facet fields ~ 50).*
>
>
>> $ curl  "
>> http://localhost:8983/solr/collection1/select?facet=true&fl=id&indent=true&q=id%3A*&facet.limit=-1&facet.threads=1000&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&rows=1&wt=xml
>> "
>>
>> [Solr XML response - markup stripped by the mail archive; the recoverable
>> content: responseHeader status 0, QTime 21; params echo the 50 facet.field
>> values (f0_ws .. f9_ws, each listed 5 times) plus facet.limit=-1,
>> facet.threads=1000, rows=1, wt=xml; facet_counts lists the 50 facet fields
>> with term-count pairs, in order: 25/25 (x5), 33/17 (x5), 37/13 (x5),
>> 40/10 (x5), 41/9 (x5), 42/8 (x5), 43/7 (x5), 44/6 (x5), 45/5 (x10)]
>
>
>
> ** Now, if I create a new collection with 2 shards (>1 

Re: Solr Memory question

2014-08-01 Thread Shawn Heisey
On 8/1/2014 3:17 PM, Ethan wrote:
> Our SolrCloud setup : 3 Nodes with Zookeeper, 2 running SolrCloud.
>
> Current dataset size is 97GB, JVM is 10GB, but 6GB is used(for less garbage
> collection time).  RAM is 96GB,
>
> Our softcommit is set to 2secs and hardcommit is set to 1 hour.
>
> We are suddenly seeing high disk and network IOs.  During search the leader
> usually logs one more query with it's node name and shard information -
>
> "{NOW=1406911121656&shard.url=
> chexjvassoms006.ch.expeso.com:52158/solr/Main..
> ids=-9223372036371158536,-9223372036373602680,-9223372036618637568,-9223372036371157736..&distrib=false&timeAllowed=2000&wt=javabin&isShard=true"
>
> The actually query didn't have any of this information.  This started just
> today and causing lot of latency issues.  We have had nodes go down several
> times today.

That query is from distributed search -- it's the query that actually
retrieves the documents from the shards after the results of the initial
query have been tabulated to determine which documents are needed.  The
"ids" parameter is what tells me this.

Do you know how long those autoSoftCommit operations take?  If you are
indexing frequently enough and the commits are taking longer than the
configured interval of two seconds, you may be having multiple commits
happening at the same time.  Soft commits are faster and use fewer
resources than hard commits, but they aren't even close to free --
they're going to hit the disk and memory very hard.

One thing to note:  An hour may be too long for the hard commit
interval.  Hard commits result in a new transaction log being started,
so on restart, Solr will replay all of the updates that occurred in the
last hour.  If your update rate is low, that might be acceptable, but if
the update rate is high, that could be a LOT of updates, making Solr
restarts *very* slow.
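
For illustration, a commit configuration along the lines usually recommended looks
something like this - the numbers are just an example of the shape, not a
prescription for your index:

<autoSoftCommit>
  <!-- cheap, frequent visibility commits; make sure each one finishes
       before the next is due -->
  <maxTime>2000</maxTime>
</autoSoftCommit>
<autoCommit>
  <!-- flush segments and roll the transaction log every minute, but do not
       open a new searcher here; visibility is handled by the soft commits -->
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>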

Thanks,
Shawn



Re: Solr Memory question

2014-08-01 Thread Ethan
4.5.0.

We are trying to free memory by deleting data from 2010. But that hasn't
helped so far.


On Fri, Aug 1, 2014 at 3:13 PM, Otis Gospodnetic  wrote:

> Which version of Solr?
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Fri, Aug 1, 2014 at 11:17 PM, Ethan  wrote:
>
> > Our SolrCloud setup : 3 Nodes with Zookeeper, 2 running SolrCloud.
> >
> > Current dataset size is 97GB, JVM is 10GB, but 6GB is used(for less
> garbage
> > collection time).  RAM is 96GB,
> >
> > Our softcommit is set to 2secs and hardcommit is set to 1 hour.
> >
> > We are suddenly seeing high disk and network IOs.  During search the
> leader
> > usually logs one more query with it's node name and shard information -
> >
> > "{NOW=1406911121656&shard.url=
> > chexjvassoms006.ch.expeso.com:52158/solr/Main..
> >
> >
> ids=-9223372036371158536,-9223372036373602680,-9223372036618637568,-9223372036371157736..&distrib=false&timeAllowed=2000&wt=javabin&isShard=true"
> >
> > The actually query didn't have any of this information.  This started
> just
> > today and causing lot of latency issues.  We have had nodes go down
> several
> > times today.
> >
> > Any of you faced similar issues before?
> >
> > E
> >
>


Re: Solr Memory question

2014-08-01 Thread Otis Gospodnetic
Which version of Solr?

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Aug 1, 2014 at 11:17 PM, Ethan  wrote:

> Our SolrCloud setup : 3 Nodes with Zookeeper, 2 running SolrCloud.
>
> Current dataset size is 97GB, JVM is 10GB, but 6GB is used(for less garbage
> collection time).  RAM is 96GB,
>
> Our softcommit is set to 2secs and hardcommit is set to 1 hour.
>
> We are suddenly seeing high disk and network IOs.  During search the leader
> usually logs one more query with it's node name and shard information -
>
> "{NOW=1406911121656&shard.url=
> chexjvassoms006.ch.expeso.com:52158/solr/Main..
>
> ids=-9223372036371158536,-9223372036373602680,-9223372036618637568,-9223372036371157736..&distrib=false&timeAllowed=2000&wt=javabin&isShard=true"
>
> The actually query didn't have any of this information.  This started just
> today and causing lot of latency issues.  We have had nodes go down several
> times today.
>
> Any of you faced similar issues before?
>
> E
>


Solr Memory question

2014-08-01 Thread Ethan
Our SolrCloud setup : 3 Nodes with Zookeeper, 2 running SolrCloud.

Current dataset size is 97GB, the JVM heap is 10GB, but only 6GB is used (to keep
garbage collection times low).  RAM is 96GB.

Our soft commit is set to 2 seconds and hard commit is set to 1 hour.

We are suddenly seeing high disk and network IOs.  During search the leader
usually logs one more query with its node name and shard information -

"{NOW=1406911121656&shard.url=
chexjvassoms006.ch.expeso.com:52158/solr/Main..
ids=-9223372036371158536,-9223372036373602680,-9223372036618637568,-9223372036371157736..&distrib=false&timeAllowed=2000&wt=javabin&isShard=true"

The actual query didn't have any of this information.  This started just
today and is causing a lot of latency issues.  We have had nodes go down several
times today.

Any of you faced similar issues before?

E


RE: Debug DirectSolrSpellChecker Suggestion Sort Order

2014-08-01 Thread Dyer, James
Query results default to score.  But spelling suggestions sort by edit 
distance, with frequency as a secondary sort.  

unie => unger = 2 edits
unie => unick = 2 edits
unie => united = 3 edits
unie => unique = 3 edits
... etc ...
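
For reference, that ordering can be influenced via the comparatorClass setting in the
spellchecker definition in solrconfig.xml; a hedged sketch (field and spellchecker
names invented):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">name_spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- "score" weighs edit distance (with frequency as the tie-breaker),
         "freq" sorts purely by document frequency -->
    <str name="comparatorClass">score</str>
  </lst>
</searchComponent>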

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] 
Sent: Friday, August 01, 2014 3:01 PM
To: 'solr-user@lucene.apache.org'
Subject: Debug DirectSolrSpellChecker Suggestion Sort Order

Everything that I read says that the default sort order is by Score, yet this 
appears to me to be sorted by frequency:



[spellcheck response - XML tags stripped by the mail archive; the surviving values:
 10 / 0 / 4 / 0 (numFound, startOffset, endOffset, origFreq), followed by the
 suggestions and their frequencies: unger 119, unick 16, united 16, unique 10,
 unity 7, unser 7, unyi 7, utke 5, uribe 3, uthe 3]

I've even set in solrconfig.xml:
score
Is there a way that I can debug my issue? I'm searching people names so ideally 
I'm hoping to get unyi higher in the list of suggestions.

Thanks,

Corey



Debug DirectSolrSpellChecker Suggestion Sort Order

2014-08-01 Thread Corey Gerhardt
Everything that I read says that the default sort order is by Score, yet this 
appears to me to be sorted by frequency:



[spellcheck response - XML tags stripped by the mail archive; the surviving values:
 10 / 0 / 4 / 0 (numFound, startOffset, endOffset, origFreq), followed by the
 suggestions and their frequencies: unger 119, unick 16, united 16, unique 10,
 unity 7, unser 7, unyi 7, utke 5, uribe 3, uthe 3]

I've even set in solrconfig.xml:
score
Is there a way that I can debug my issue? I'm searching people's names, so ideally 
I'm hoping to get unyi higher in the list of suggestions.

Thanks,

Corey


Re: Extend the Solr Terms Component to implement a customized Autosuggest

2014-08-01 Thread Erick Erickson
Ummm, 400k documents is _tiny_ by Solr/Lucene standards. I've seen 150M
docs fit in 16G on Solr. I put 11M docs on my laptop...

So I would _strongly_ advise that you don't worry about space at all as a
first approach and freely copy as many fields as you need to support your
use-case. Only after you've proved that this is untenable would I recommend
you develop custom code. You'll be in production much faster that way ;)
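
As a concrete illustration of the copy-field approach, the schema.xml side is just a
couple of lines per field (the names and the suggest field type here are made up):

<!-- a lowercased / edge-ngrammed variant used only for suggestions -->
<field name="title_suggest" type="text_suggest" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="title" dest="title_suggest"/>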

Of course this is irrelevant if each doc is "War and Peace", but...

Best,
Erick


On Thu, Jul 31, 2014 at 3:29 PM, Juan Pablo Albuja 
wrote:

> Good afternoon guys, I really appreciate if someone on the community can
> help me with the following issue:
>
> I need to implement a Solr autosuggest that supports:
>
> 1.   Get autosuggestion over multivalued fields
>
> 2.   Case - Insensitiveness
>
> 3.   Look for content in the middle for example I have the value
> "Hello World" indexed, and I need to get that value when the user types
> "wor"
>
> 4.   Filter by an additional field.
>
> I was using the terms component because with it I can satisfy 1 to 3, but
> for point 4 is not possible. I also was looking at faceting searches and
> Ngram.Edge-Ngrams, but the problem with those approaches is that I need to
> copy fields over to make them tokenized or apply grams to those, and I
> don't want to do that because I have more than 6 fields that needs
> autosuggest, my index is big I have more than 400k documents and I don't
> want to increase the size.
> I was trying to Extend the terms component in order to add an additional
> filter but it uses TermsEnum that is a vector over an specific field and I
> couldn't figure out how to filter it in a really efficient way.
> Do you guys have an idea on how can I satisfy my requirements in an
> efficient way? If there is another way without using the terms component
> for me is also awesome.
>
> Thanks
>
>
>
>
> Juan Pablo Albuja
> Senior Developer
>
>
>


Re: SolrCloud Scale Struggle

2014-08-01 Thread Shawn Heisey
On 8/1/2014 4:19 AM, anand.mahajan wrote:
> My current deployment : 
>  i) I'm using Solr 4.8 and have set up a SolrCloud with 6 dedicated machines
> - 24 Core + 96 GB RAM each.
>  ii)There are over 190M docs in the SolrCloud at the moment (for all
> replicas its consuming overall disk 2340GB which implies - each doc is at
> about 5-8kb in size.)
>  iii) The docs are split into 36 Shards - and 3 replica per shard (in all
> 108 Solr Jetty processes split over 6 Servers leaving about 18 Jetty JVMs
> running on each host)



> 2. Should I have been better served had I deployed a Single Jetty Solr
> instance per server with multiple cores running inside? The servers do start
> to swap out after a couple of days of Solr uptime - right now we reboot the
> entire cluster every 4 days.

Others have already mentioned the problems with autoCommit being far too
frequent, so I'll just echo their advice to increase the intervals.

You should DEFINITELY have exactly one jetty process per server.  One
Solr process can handle *many* shard replicas (cores).  With 18 per
server, that's a LOT of overhead (especially memory) that is not required.

> 3. The routing key is not able to effectively balance the docs on available
> shards - There are a few shards with just about 2M docs - and others over
> 11M docs. Shall I split the larger shards? But I do not have more nodes /
> hardware to allocate to this deployment. In such case would splitting up the
> large shards give better read-write throughput? 
>
> 4. To remain with the current hardware - would it help if I remove 1 replica
> each from a shard? But that would mean even when just 1 node goes down for a
> shard there would be only 1 live node left that would not serve the write
> requests.

Why not just let Solr automatically handle routing with the compositeId
router?  Chances are excellent that this will result in perfect shard
balancing.  Unless you want completely manual sharding (not controlled
at all by SolrCloud), don't complicate it by trying to influence the
routing.
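
For reference, a sketch of what compositeId routing looks like from the indexing side
(the ZooKeeper address, collection name and id values below are invented):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CompositeIdExample {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("cars");
        SolrInputDocument doc = new SolrInputDocument();
        // With the compositeId router the routing prefix is embedded in the id
        // itself: every id sharing the "ford_2014!" prefix hashes to the same
        // shard, while an id with no "!" prefix is hashed as a whole, which is
        // what spreads documents evenly across shards.
        doc.addField("id", "ford_2014!VIN1234567890");
        solr.add(doc);
        solr.commit();
        solr.shutdown();
    }
}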

I think that when you say "1 live node left that would not serve the
write requests" above, you may have a misconception about SolrCloud. 
*ALL* replicas have the same indexing load.  Although the replication
handler is required when you use SolrCloud, replication is *NOT* how the
data gets on all replicas.  Each update request makes its way to the
shard leader, then the shard leader sends that update request to all
replicas, and each one independently indexes the content.  Replication
only gets used when something goes wrong and a shard needs recovery.

> 5. Also, is there a way to control where the Split Shard replicas would go?
> Is there a pattern / rule that Solr follows when it creates replicas for
> split shards?
>
> 6. I read somewhere that creating a Core would cost the OS one thread and a
> file handle. Since a core repsents an index in its entirty would it not be
> allocated the configured number of write threads? (The dafault that is 8)
>
> 7. The Zookeeper cluster is deployed on the same boxes as the Solr instance
> - Would separating the ZK cluster out help?

I don't know for sure what rules are followed when creating the replicas
(cores).  SolrCloud will make sure that replicas end up on different
nodes, with a node being a Solr JVM, *NOT* a machine.  If one node has
fewer replicas already onboard than another, it will likely be preferred
...but I actually don't know if that logic is incorporated.

Each core probably does create a thread, but there will be far more than
one filehandle.  A Lucene index (there is one in every core) is normally
composed of dozens or hundreds of files, each of which will require a
file handle.  Each network connection also uses file handles.

If load is light, putting ZK on the same nodes/disks as Solr is not a
big deal.  If load is heavy, you will want the ZK database to have its
own dedicated disk spindle(s) ... but ZK's CPU requirements are usually
very small.  Completely separate servers are not usually required, but
if you can do that, you would be much better protected against
performance problems.

Thanks,
Shawn



Re: why solr commit with serval docs

2014-08-01 Thread Shawn Heisey
On 8/1/2014 3:22 AM, rulinma wrote:
> I use solrconfig.xml as follow:  
> 
> [solrconfig.xml updateHandler snippet - XML tags stripped by the mail archive;
> surviving values: ${solr.ulog.dir:} (updateLog dir); an autoCommit block with
> 15000 / 5000 / false (likely maxTime / maxDocs / openSearcher); a soft-commit
> block with 360 / 50]
>
> I use 2000 docs to commit once, but I query find that 2002,2003,2004 and so
> on, not 2000,4000,6000 increase. I don't know why?
> who can tell me ?

I cannot say for sure, but I would guess that the commit action is
initiated when the threshold is crossed ... but that it is not an atomic
operation that stops indexing right at that moment.  I bet that it takes
a few microseconds or milliseconds for the commit to begin, and in the
meantime, a few more documents get indexed.

Thanks,
Shawn



Re: Replacing a Solr index on the fly using SolrJ

2014-08-01 Thread Shawn Heisey
On 8/1/2014 3:21 AM, Sören Schneider wrote:
> I'm looking for a way to (programmatically) replace a Solr index
> on-the-fly using SolrJ, just as mentioned in Solr CoreAdmin Wiki[1]. I
> already managed to create a dump of the index on-the-fly.
>
> The intention is to use the dump transparently while rebuilding the
> "original" index to achieve that the index and all its files stay
> online. When the reindexing process is finished, the dump gets
> replaced vice versa with the recent index.
>
> If there's another way to solve this problem, please let me know.

This sounds like a job for core swapping -- build up a new index in
another core, then swap that core with the old one.  I assume from your
description that you are not using SolrCloud.  SolrCloud changes things,
for that you would need to use collection aliasing -- swapping cores in
SolrCloud mode is likely to cause some serious breakage.
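
For the SolrCloud case, collection aliasing is driven through the Collections API; a
hedged example of the call (the alias and collection names are invented), which
repoints the "live" alias at a freshly built collection:

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=live&collections=build_20140801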

Here's the "swap" method in my Core object:

/**
 * Swap one Solr core with another.
 *
 * @param other Core representing the other core to swap with.
 * @throws BuildException
 */
public final void swap(Core other) throws BuildException
{
CoreAdminRequest car = new CoreAdminRequest();
car.setCoreName(_name);
car.setOtherCoreName(other._name);
car.setAction(CoreAdminAction.SWAP);
try
{
car.process(_serverSolr);
updateDirectories();
other.updateDirectories();
Static.LOG.info(_prefix + _name + ": swapped with " +
other._name);
}
catch (Exception e)
{
throw new BuildException("Failed to swap cores on " +
_prefix + _name, e);
}
}

The _serverSolr object is an instance of HttpSolrServer, set to
http://host:port/solr ... I also have a _querySolr object that is set to
http://host:port/solr/corename ... that one gets used for
queries/updates.  The updateDirectories method looks up the new
instanceDir and dataDir and populates class members with the correct info.
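
For context, a hedged sketch of how those two handles might be constructed (the URL
and core name are invented):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class CoreHandles {
    // Names mirror the fields described above; "live" is a placeholder core name.
    HttpSolrServer _serverSolr = new HttpSolrServer("http://localhost:8983/solr");
    HttpSolrServer _querySolr  = new HttpSolrServer("http://localhost:8983/solr/live");
}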

I'm using the old solr.xml format.  Below is how I have defined two
cores that get swapped.  I used more generic directory names so that I
won't ever have a "live" core that points to a directory that says
"build" ... that would be really confusing:




Thanks,
Shawn



Stand alone Solr - no zookeeper?

2014-08-01 Thread Joel Cohen
Hi,

We're in the development phase of a new application and the current dev
team mindset leans towards running Solr (4.9) in AWS without Zookeeper. The
theory is that we can add nodes quickly to our load balancer
programmatically and get a dump of the indexes from another node and copy
them over to the new one. A RESTful API would handle other applications
talking to Solr without the need for each of them to have to use SolrJ.
Data ingestion happens nightly in bulk by way of ActiveMQ which each server
subscribes to and pulls its own copy of the indexes. Incremental updates
are very few during the day, but we would have some mechanism of getting a
new server to 'catch up' to the live servers before making it active in the
load balancer.

The only thing so far that I see as a hurdle here is the data set size vs.
heap size. If the index grows too large, then we have to increase the heap
size, which could lead to longer GC times. Servers could pop in and out of
the load balancer if they are unavailable for too long when a major GC
happens.

Current stats:
11 Gb of data (and growing)
4 Gb java heap
4 CPU, 16 Gb RAM nodes (maybe more needed?)

All thoughts are welcomed.

Thanks.
-- 
*Joel Cohen*
Devops Engineer

*GrubHub Inc.*
*jco...@grubhub.com *
646-527-7771
1065 Avenue of the Americas
15th Floor
New York, NY 10018

grubhub.com | seamless.com


Re: SolrCloud Scale Struggle

2014-08-01 Thread anand.mahajan
Thanks for the reply Shalin.

1. I'll try increasing the softCommit interval and the autoSoftCommit too.
One mistake I made that I realized just now is that I am using /solr/select
and expecting it to do an NRT - for NRT search it's got to be the /select/get
handler that needs to be used. Please confirm.

2. Also, on the number of shards - I made 36 (even with 6 machines) as I was
hoping I'd get more hardware and I'll be able to distribute existing shards
on the new boxes. That has not happened yet. But even with current
deployment - less number of shards would mean more docs per shard and would
that now slow down search queries?

3. Increasing the commit interval would mean more RAM usage and could that
make the situation bad? as there is already less RAM in there compared to
the total doc size (with all fields stored)  [FYI - ramBufferSizeMB and
maxBufferedDocs are set to default - 100MB and 1000 respectively]

4. I read DataStax Enterprise edition could be an answer here? Is there an
easy way to migrate to DSE - and something that would not cause too many
code changes? (I had a discussion with the DSE folks a few weeks ago and
they mentioned migration would be a breeze from Solr to DSE and there would
not be 'any' code changes required too on the ingestion and search code.
(Perhaps I was talking to the Sales guy maybe?))  - With DSE - the data
would sit in Cassendra and the search will still be with Solr plugged into
DSE. but would that work with a 6 Node cluster?  (Sorry if I'm deviating
here a bit from the core problem i'm trying to fix - but if DSE could work
with a very minimal time and effort requirement - i wont mind trying it
out.)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Scale-Struggle-tp4150592p4150619.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Scale Struggle

2014-08-01 Thread Shalin Shekhar Mangar
Increasing autoCommit doesn't increase RAM consumption. It just means that
more items would be in transaction log and that node restart/recovery will
be slower.


On Fri, Aug 1, 2014 at 7:10 PM, anand.mahajan  wrote:

> Oops - my bad - Its autoSoftCommit that is set after every doc and not an
> autoCommit.
>
> Following snippet from the solrconfig -
>
> [solrconfig commit snippet - XML tags stripped by the mail archive; the surviving
> values are 1, true and 1]
>
> Shall I increase the autoCommit time as well? But would that mean more RAM
> is consumed by all instances running on the box?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-Scale-Struggle-tp4150592p4150615.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Shalin Shekhar Mangar.


RE: SolrCloud Scale Struggle

2014-08-01 Thread Doug Turnbull
Auto soft committing that frequently can also dramatically impact
performance. Perhaps not nearly as much as a hard commit, but I would
still consider increasing it.

Also hard commits every 10 seconds at that volume is quite frequent.

I'd consider doing soft commits every 10 seconds and do hard commits
every so many minutes (10 or so?) and tweak from there.

I might also look at your segment merge strategy. With a large number
of small docs, I've seen solr spend a lot of CPU time just merging
segments. You'll probably want to tweak this to be far less aggressive.
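
For reference, the merge settings live under <indexConfig> in solrconfig.xml; a hedged
sketch of the knobs being referred to (the values are placeholders, not recommendations):

<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- raising segmentsPerTier (and maxMergeAtOnce) makes merging less
         aggressive: less CPU spent merging, at the cost of more segments
         per index -->
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>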

Sent from my Windows Phone

From: anand.mahajan
Sent: 8/1/2014 9:40 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud Scale Struggle
Oops - my bad - Its autoSoftCommit that is set after every doc and not an
autoCommit.

Following snippet from the solrconfig -


[solrconfig commit snippet - XML tags stripped by the mail archive; the surviving
 values are 1, true and 1]

Shall I increase the autoCommit time as well? But would that mean more RAM
is consumed by all instances running on the box?



--
View this message in context:
http://lucene.472066.n3.nabble.com/SolrCloud-Scale-Struggle-tp4150592p4150615.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Scale Struggle

2014-08-01 Thread anand.mahajan
Oops - my bad - It's autoSoftCommit that is set after every doc and not an
autoCommit. 

Following snippet from the solrconfig - 

 
[solrconfig commit snippet - XML tags stripped by the mail archive; the surviving
 values are 1, true and 1]

Shall I increase the autoCommit time as well? But would that mean more RAM
is consumed by all instances running on the box?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Scale-Struggle-tp4150592p4150615.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Scale Struggle

2014-08-01 Thread Shalin Shekhar Mangar
Comments inline:


On Fri, Aug 1, 2014 at 3:49 PM, anand.mahajan  wrote:

> Hello all,
>
> Struggling to get this going with SolrCloud -
>
> Requirement in brief :
>  - Ingest about 4M Used Cars listings a day and track all unique cars for
> changes
>  - 4M automated searches a day (during the ingestion phase to check if a
> doc
> exists in the index (based on values of 4-5 key fields) or it is a new one
> or an updated version)
>  - Of the 4 M - About 3M Updates to existing docs (for every non-key value
> change)
>  - About 1M inserts a day (I'm assuming these many new listings come in
> every day)
>  - Daily Bulk CSV exports of inserts / updates in last 24 hours of various
> snapshots of the data to various clients
>
> My current deployment :
>  i) I'm using Solr 4.8 and have set up a SolrCloud with 6 dedicated
> machines
> - 24 Core + 96 GB RAM each.
>  ii)There are over 190M docs in the SolrCloud at the moment (for all
> replicas its consuming overall disk 2340GB which implies - each doc is at
> about 5-8kb in size.)
>  iii) The docs are split into 36 Shards - and 3 replica per shard (in all
> 108 Solr Jetty processes split over 6 Servers leaving about 18 Jetty JVMs
> running on each host)
>  iv) There are 60 fields per doc and all fields are stored at the moment
>  :(
> (The backend is only Solr at the moment)
>  v) The current shard/routing key is a combination of Car Year, Make and
> some other car level attributes that help classify the cars
> vi) We are mostly using the default Solr config as of now - no heavy
> caching
> as the search is pretty random in nature
> vii) Autocommit is on - with maxDocs = 1
>
> Current throughput & Issues :
> With the above mentioned deployment the daily throughout is only at about
> 1.5M on average (Inserts + Updates) - falling way short of what is
> required.
> Search is slow - Some queries take about 15 seconds to return - and since
> insert is dependent on at least one Search that degrades the write
> throughput too. (This is not a Solr issue - but the app demands it so)
>
> Questions :
>
> 1. Autocommit with maxDocs = 1 - is that a goof up and could that be
> slowing
> down indexing? Its a requirement that all docs are available as soon as
> indexed.
>

The autoCommit every 1 document is definitely what's causing those
problems. You should set autoSoftCommit with a small time value, say a
minute or two and set autoCommit by both docs and time value (say 1
docs, 10 minutes). Lucene/Solr do a bunch of work on commit and reducing
them will increase throughput.

Also see
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/


>
> 2. Should I have been better served had I deployed a Single Jetty Solr
> instance per server with multiple cores running inside? The servers do
> start
> to swap out after a couple of days of Solr uptime - right now we reboot the
> entire cluster every 4 days.
>

That's a lot of jetty JVMs. I think you have oversharded your cluster. If
they're all sharing the same disk then it's even worse. We typically deploy
3-4 jetty JVMs per physical machine with 4-8GB of heap (depending on
sorting/faceting requirements). Your write traffic is actually quite small
so you probably don't need 36 shards.

You are swapping because the data is too much relative to the available
RAM. You are trying to serve 2340GB of index with 576GB of RAM under
near-real-time environment. You basically need more RAM.


>
> 3. The routing key is not able to effectively balance the docs on available
> shards - There are a few shards with just about 2M docs - and others over
> 11M docs. Shall I split the larger shards? But I do not have more nodes /
> hardware to allocate to this deployment. In such case would splitting up
> the
> large shards give better read-write throughput?
>

Your throughput problems are because of committing after every document and
less RAM. Splitting shards won't help here.

As far as the uneven shards go, there's really only one option which is
custom sharding where you shard according to your own rules but you already
have a running cluster and changing the routing option on an existing
collection is not possible.


>
> 4. To remain with the current hardware - would it help if I remove 1
> replica
> each from a shard? But that would mean even when just 1 node goes down for
> a
> shard there would be only 1 live node left that would not serve the write
> requests.


It will eventually. The thing is that if there are just two replicas and
one goes down, then the other waits a while before it will elect itself as
the leader. This is to avoid data loss.


>


> 5. Also, is there a way to control where the Split Shard replicas would go?
> Is there a pattern / rule that Solr follows when it creates replicas for
> split shards?
>

Unfortunately no. The split command doesn't accept createNodeSet param but
it is probably a good improvement. The leader of the split shard will
remain on the same node as the leader of the parent shard.

Re: Auto suggest with adding accents

2014-08-01 Thread benjelloun
hello,
on the new suggester, when the field is multiValued="true", it's not working.


I need to try the patch from "LUCENE-3842" to test autocomplete, but I don't know
how.
I have Solr 4.7.2, not the source code.
Can someone help?

Best regards,
Anass BENJELLOUN



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379p4150609.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCloud Scale Struggle

2014-08-01 Thread anand.mahajan
Hello all,

Struggling to get this going with SolrCloud - 

Requirement in brief :
 - Ingest about 4M Used Cars listings a day and track all unique cars for
changes
 - 4M automated searches a day (during the ingestion phase to check if a doc
exists in the index (based on values of 4-5 key fields) or it is a new one
or an updated version)
 - Of the 4 M - About 3M Updates to existing docs (for every non-key value
change)
 - About 1M inserts a day (I'm assuming these many new listings come in
every day)
 - Daily Bulk CSV exports of inserts / updates in last 24 hours of various
snapshots of the data to various clients

My current deployment : 
 i) I'm using Solr 4.8 and have set up a SolrCloud with 6 dedicated machines
- 24 Core + 96 GB RAM each.
 ii)There are over 190M docs in the SolrCloud at the moment (for all
replicas its consuming overall disk 2340GB which implies - each doc is at
about 5-8kb in size.)
 iii) The docs are split into 36 Shards - and 3 replica per shard (in all
108 Solr Jetty processes split over 6 Servers leaving about 18 Jetty JVMs
running on each host)
 iv) There are 60 fields per doc and all fields are stored at the moment  :( 
(The backend is only Solr at the moment)
 v) The current shard/routing key is a combination of Car Year, Make and
some other car level attributes that help classify the cars
vi) We are mostly using the default Solr config as of now - no heavy caching
as the search is pretty random in nature 
vii) Autocommit is on - with maxDocs = 1

Current throughput & Issues :
With the above mentioned deployment the daily throughput is only at about
1.5M on average (Inserts + Updates) - falling way short of what is required.
Search is slow - Some queries take about 15 seconds to return - and since
insert is dependent on at least one Search that degrades the write
throughput too. (This is not a Solr issue - but the app demands it so)

Questions :

1. Autocommit with maxDocs = 1 - is that a goof up and could that be slowing
down indexing? It's a requirement that all docs are available as soon as
indexed.

2. Should I have been better served had I deployed a Single Jetty Solr
instance per server with multiple cores running inside? The servers do start
to swap out after a couple of days of Solr uptime - right now we reboot the
entire cluster every 4 days.

3. The routing key is not able to effectively balance the docs on available
shards - There are a few shards with just about 2M docs - and others over
11M docs. Shall I split the larger shards? But I do not have more nodes /
hardware to allocate to this deployment. In such case would splitting up the
large shards give better read-write throughput? 

4. To remain with the current hardware - would it help if I remove 1 replica
each from a shard? But that would mean even when just 1 node goes down for a
shard there would be only 1 live node left that would not serve the write
requests.

5. Also, is there a way to control where the Split Shard replicas would go?
Is there a pattern / rule that Solr follows when it creates replicas for
split shards?

6. I read somewhere that creating a Core would cost the OS one thread and a
file handle. Since a core represents an index in its entirety, would it not be
allocated the configured number of write threads? (The default is 8)

7. The Zookeeper cluster is deployed on the same boxes as the Solr instance
- Would separating the ZK cluster out help?

Sorry for the long thread - I thought of asking these all at once rather
than posting separate ones.

Thanks,
Anand



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Scale-Struggle-tp4150592.html
Sent from the Solr - User mailing list archive at Nabble.com.


why solr commit with serval docs

2014-08-01 Thread rulinma
I use solrconfig.xml as follows:

[solrconfig.xml updateHandler snippet - XML tags stripped by the mail archive;
 surviving values: ${solr.ulog.dir:} (updateLog dir); an autoCommit block with
 15000 / 5000 / false (likely maxTime / maxDocs / openSearcher); a soft-commit
 block with 360 / 50]

I commit once every 2000 docs, but when I query I see counts like 2002, 2003, 2004
and so on, not increments of 2000 (2000, 4000, 6000). I don't know why.
Who can tell me?
Thanks!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/why-solr-commit-with-serval-docs-tp4150583.html
Sent from the Solr - User mailing list archive at Nabble.com.


Replacing a Solr index on the fly using SolrJ

2014-08-01 Thread Sören Schneider

Hi,

I'm looking for a way to (programmatically) replace a Solr index 
on-the-fly using SolrJ, just as mentioned in Solr CoreAdmin Wiki[1]. I 
already managed to create a dump of the index on-the-fly.


The intention is to use the dump transparently while rebuilding the 
"original" index to achieve that the index and all its files stay 
online. When the reindexing process is finished, the dump gets replaced 
vice versa with the recent index.


If there's another way to solve this problem, please let me know.

Thanks and cheers
Sören

[1] https://wiki.apache.org/solr/CoreAdmin


Re: Solr vs ElasticSearch

2014-08-01 Thread Charlie Hull

On 01/08/2014 09:53, Alexandre Rafalovitch wrote:

Thank you Charlie, very informative even if non-scientific.

About the aggregations, are they very different from:
http://heliosearch.org/solr-facet-functions/ (obviously not yet
production ready)?


They're the same sort of thing. The ES significant terms aggregation is 
particularly cool for spotting anomalies (look up Mark Harwood's blogs 
and presentations on the subject). I think the new analytic capabilities 
in Solr, Heliosearch and ES have some awesome potential.


Cheers

Charlie





Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Fri, Aug 1, 2014 at 3:44 PM, Charlie Hull  wrote:

On 01/08/2014 06:43, Alexandre Rafalovitch wrote:


Maybe Charlie Hull can answer that:
https://twitter.com/FlaxSearch/status/494859596117602304 . He seems to
think that - at least in some cases - Solr is faster.



I'll try to expand on the tweet.

Firstly, this is a totally unscientific comparison - we'd like to have time
to develop a proper public demonstration of some of the performance
differences we've found, which hopefully we will soon...so this is far more
anecdotal than statistical! Our eventual intention is to publicise any
differences so the wider community can tell us if we've done something
wrong, or maybe improve one or both engines. Don't get me wrong, we *like*
the fact there are two cool search server projects built on Lucene!

I can think of three recent projects where we've compared the two - we
wanted to be sure we were using the best fit for our clients:
1. a search over 40-50 million news stories with relatively complex
filtering requirements - Although ES promised more granular filtering it was
a lot slower to do it. We chose Solr.
2. a pretty standard intranet search over a few million items that might
require some clever visualisation in a future phase. No real difference in
speed, we chose ES.
3. a search over 700k items in the recruitment space with some geolocation
filtering - ES seemed to be faster at indexing, but Solr was a lot faster
for searching, and probably will be equivalent at indexing once we do some
tuning. We chose Solr.

Others have told me that if your documents are rich, choose Solr: if however
you have a large number of more simple documents, choose ES as the scaling
is less painful. If you like old-school XML config, choose Solr: if you're a
bearded hipster running a startup in Shoreditch choose ES. The aggregations
in ES are *way* cool.

YMMV, of course. The *only* sensible way to choose is to try both with your
data and requirements. Benchmarks are all very well, but they don't
necessarily apply to your situation.

Cheers

Charlie




I am also doing a talk and a book on Solr vs. ElasticSearch, but I am
not really planning to address those issues either, only the feature
comparisons.

Regards,
 Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Fri, Aug 1, 2014 at 12:35 PM, Salman Akram
 wrote:


I did see that earlier. My main concern is search
performance/scalability/throughput which unfortunately that article
didn't
address. Any benchmarks or comments about that?

We are already using SOLR but there has been a push to check
elasticsearch.
All the benchmarks I have seen are at least few years old.


On Fri, Aug 1, 2014 at 4:59 AM, Otis Gospodnetic wrote:




Not super fresh, but more recent than the 2 links you sent:

http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Jul 31, 2014 at 10:33 PM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:


This is quite an old discussion. Wanted to check any new comparisons


after


SOLR 4 especially with regards to performance/scalability/throughput?


On Tue, Jul 26, 2011 at 7:33 PM, Peter  wrote:


Have a look:







http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage





http://karussell.wordpress.com/2011/05/12/elasticsearch-vs-solr-lucene/



Regards,
Peter.

--
View this message in context:





http://lucene.472066.n3.nabble.com/Solr-vs-ElasticSearch-tp3009181p3200492.html


Sent from the Solr - User mailing list archive at Nabble.com.





--
Regards,

Salman Akram







--
Regards,

Salman Akram




--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk



--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Search on Date Field

2014-08-01 Thread Pbbhoge
Thanks Jack.

It works for me.


Regards
Pradip



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-on-Date-Field-tp4150076p4150573.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr vs ElasticSearch

2014-08-01 Thread Alexandre Rafalovitch
Thank you Charlie, very informative even if non-scientific.

About the aggregations, are they very different from:
http://heliosearch.org/solr-facet-functions/ (obviously not yet
production ready)?

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Fri, Aug 1, 2014 at 3:44 PM, Charlie Hull  wrote:
> On 01/08/2014 06:43, Alexandre Rafalovitch wrote:
>>
>> Maybe Charlie Hull can answer that:
>> https://twitter.com/FlaxSearch/status/494859596117602304 . He seems to
>> think that - at least in some cases - Solr is faster.
>
>
> I'll try to expand on the tweet.
>
> Firstly, this is a totally unscientific comparison - we'd like to have time
> to develop a proper public demonstration of some of the performance
> differences we've found, which hopefully we will soon...so this is far more
> anecdotal than statistical! Our eventual intention is to publicise any
> differences so the wider community can tell us if we've done something
> wrong, or maybe improve one or both engines. Don't get me wrong, we *like*
> the fact there are two cool search server projects built on Lucene!
>
> I can think of three recent projects where we've compared the two - we
> wanted to be sure we were using the best fit for our clients:
> 1. a search over 40-50 million news stories with relatively complex
> filtering requirements - Although ES promised more granular filtering it was
> a lot slower to do it. We chose Solr.
> 2. a pretty standard intranet search over a few million items that might
> require some clever visualisation in a future phase. No real difference in
> speed, we chose ES.
> 3. a search over 700k items in the recruitment space with some geolocation
> filtering - ES seemed to be faster at indexing, but Solr was a lot faster
> for searching, and probably will be equivalent at indexing once we do some
> tuning. We chose Solr.
>
> Others have told me that if your documents are rich, choose Solr: if however
> you have a large number of more simple documents, choose ES as the scaling
> is less painful. If you like old-school XML config, choose Solr: if you're a
> bearded hipster running a startup in Shoreditch choose ES. The aggregations
> in ES are *way* cool.
>
> YMMV, of course. The *only* sensible way to choose is to try both with your
> data and requirements. Benchmarks are all very well, but they don't
> necessarily apply to your situation.
>
> Cheers
>
> Charlie
>
>
>>
>> I am also doing a talk and a book on Solr vs. ElasticSearch, but I am
>> not really planning to address those issues either, only the feature
>> comparisons.
>>
>> Regards,
>> Alex.
>> Personal: http://www.outerthoughts.com/ and @arafalov
>> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
>> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>>
>>
>> On Fri, Aug 1, 2014 at 12:35 PM, Salman Akram
>>  wrote:
>>>
>>> I did see that earlier. My main concern is search
>>> performance/scalability/throughput which unfortunately that article
>>> didn't
>>> address. Any benchmarks or comments about that?
>>>
>>> We are already using SOLR but there has been a push to check
>>> elasticsearch.
>>> All the benchmarks I have seen are at least few years old.
>>>
>>>
>>> On Fri, Aug 1, 2014 at 4:59 AM, Otis Gospodnetic wrote:
>>>
>>>
 Not super fresh, but more recent than the 2 links you sent:

 http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/

 Otis
 --
 Performance Monitoring * Log Analytics * Search Analytics
 Solr & Elasticsearch Support * http://sematext.com/


 On Thu, Jul 31, 2014 at 10:33 PM, Salman Akram <
 salman.ak...@northbaysolutions.net> wrote:

> This is quite an old discussion. Wanted to check any new comparisons after
> SOLR 4 especially with regards to performance/scalability/throughput?
>
>
> On Tue, Jul 26, 2011 at 7:33 PM, Peter  wrote:
>
>> Have a look:
>>
>>
>>
>

 http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage
>>
>>
>>
 http://karussell.wordpress.com/2011/05/12/elasticsearch-vs-solr-lucene/
>>
>>
>> Regards,
>> Peter.
>>
>> --
>> View this message in context:
>>
>

 http://lucene.472066.n3.nabble.com/Solr-vs-ElasticSearch-tp3009181p3200492.html
>>
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Regards,
>
> Salman Akram
>

>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Salman Akram
>
>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk


Re: Solr vs ElasticSearch

2014-08-01 Thread Charlie Hull

On 01/08/2014 06:43, Alexandre Rafalovitch wrote:

Maybe Charlie Hull can answer that:
https://twitter.com/FlaxSearch/status/494859596117602304 . He seems to
think that - at least in some cases - Solr is faster.


I'll try to expand on the tweet.

Firstly, this is a totally unscientific comparison - we'd like to have 
time to develop a proper public demonstration of some of the performance 
differences we've found, which hopefully we will soon...so this is far 
more anecdotal than statistical! Our eventual intention is to publicise 
any differences so the wider community can tell us if we've done 
something wrong, or maybe improve one or both engines. Don't get me 
wrong, we *like* the fact there are two cool search server projects 
built on Lucene!


I can think of three recent projects where we've compared the two - we 
wanted to be sure we were using the best fit for our clients:
1. a search over 40-50 million news stories with relatively complex 
filtering requirements - Although ES promised more granular filtering it 
was a lot slower to do it. We chose Solr.
2. a pretty standard intranet search over a few million items that might 
require some clever visualisation in a future phase. No real difference 
in speed, we chose ES.
3. a search over 700k items in the recruitment space with some 
geolocation filtering - ES seemed to be faster at indexing, but Solr was 
a lot faster for searching, and probably will be equivalent at indexing 
once we do some tuning. We chose Solr.


Others have told me that if your documents are rich, choose Solr: if 
however you have a large number of more simple documents, choose ES as 
the scaling is less painful. If you like old-school XML config, choose 
Solr: if you're a bearded hipster running a startup in Shoreditch choose 
ES. The aggregations in ES are *way* cool.


YMMV, of course. The *only* sensible way to choose is to try both with 
your data and requirements. Benchmarks are all very well, but they don't 
necessarily apply to your situation.


Cheers

Charlie



I am also doing a talk and a book on Solr vs. ElasticSearch, but I am
not really planning to address those issues either, only the feature
comparisons.

Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Fri, Aug 1, 2014 at 12:35 PM, Salman Akram
 wrote:

I did see that earlier. My main concern is search
performance/scalability/throughput which unfortunately that article didn't
address. Any benchmarks or comments about that?

We are already using SOLR but there has been a push to check elasticsearch.
All the benchmarks I have seen are at least few years old.


On Fri, Aug 1, 2014 at 4:59 AM, Otis Gospodnetic 
wrote:



Not super fresh, but more recent than the 2 links you sent:
http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Jul 31, 2014 at 10:33 PM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:


This is quite an old discussion. Wanted to check any new comparisons

after

SOLR 4 especially with regards to performance/scalability/throughput?


On Tue, Jul 26, 2011 at 7:33 PM, Peter  wrote:


Have a look:






http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage




http://karussell.wordpress.com/2011/05/12/elasticsearch-vs-solr-lucene/


Regards,
Peter.

--
View this message in context:




http://lucene.472066.n3.nabble.com/Solr-vs-ElasticSearch-tp3009181p3200492.html

Sent from the Solr - User mailing list archive at Nabble.com.





--
Regards,

Salman Akram







--
Regards,

Salman Akram



--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Auto suggest with adding accents

2014-08-01 Thread Alexandre Rafalovitch
Perhaps the actual suggester module is a better fit then:

http://blog.mikemccandless.com/2012/09/lucenes-new-analyzing-suggester.html
http://romiawasthy.blogspot.fi/2014/06/configure-solr-suggester.html

Also:
http://jayant7k.blogspot.com/2014/03/an-interesting-suggester-in-solr.html

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Fri, Aug 1, 2014 at 3:21 PM, Otis Gospodnetic
 wrote:
> Aha.  I don't know if Solr Suggester can do that.  Let's see what others
> say.  I know http://www.sematext.com/products/autocomplete/ could do that.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Fri, Aug 1, 2014 at 9:26 AM, benjelloun  wrote:
>
>> hello,
>>
>> you didnt enderstand well my problem i give you exemple:
>> the document contain the word "genève".
>> q="gene"  auto suggestion give "geneve"
>> q="genè" auto suggestion give "genève"
>>
>> but what i need is q="gene" auto suggestion give "genève" with accent like
>> correction of word.
>> i tried to add spellchecker to correct it but the maximum of character for
>> correction is 2
>> maybe there is other solution,
>> i give my schema of field:
>>
>> [schema.xml fieldType definition - most of the XML was stripped by the mail
>> archive; what survives shows positionIncrementGap="100", omitNorms="true", a
>> solr.StandardTokenizerFactory, a stop filter with ignoreCase="true", and a
>> commented-out pattern-replace filter with replacement="$2" in the index and
>> query analyzers]
>>
>> thanks best regards,
>> Anass BENJELLOUN
>>
>>
>>
>>
>> 2014-07-31 18:41 GMT+02:00 Otis Gospodnetic-5 [via Lucene] <
>> ml-node+s472066n4150410...@n3.nabble.com>:
>>
>> > You need to do the opposite.  Make sure accents are NOT removed at index
>> &
>> > query time.
>> >
>> > Otis
>> > --
>> > Performance Monitoring * Log Analytics * Search Analytics
>> > Solr & Elasticsearch Support * http://sematext.com/
>> >
>> >
>> >
>> > On Thu, Jul 31, 2014 at 5:49 PM, benjelloun <[hidden email]
>> > > wrote:
>> >
>> > > hi,
>> > >
>> > > q="gene"  it suggest "geneve"
>> > > ASCIIFoldingFilter work like isolate accent
>> > >
>> > > what i need to suggest is "genève"
>> > >
>> > > any idea?
>> > >
>> > > thanks
>> > > best reagards
>> > > Anass BENJELLOUN
>> > >
>> > >
>> > >
>> > > --
>> > > View this message in context:
>> > >
>> >
>> http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379p4150392.html
>> >
>> > > Sent from the Solr - User mailing list archive at Nabble.com.
>> > >
>> >
>> >
>> > --
>> >  If you reply to this email, your message will be added to the discussion
>> > below:
>> >
>> >
>> http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379p4150410.html
>> >  To unsubscribe from Auto suggest with adding accents, click here
>> > <
>> http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=4150379&code=YW5hc3MuYm5qQGdtYWlsLmNvbXw0MTUwMzc5fC0xMDQyNjMzMDgx
>> >
>> > .
>> > NAML
>> > <
>> http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml
>> >
>> >
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379p4150569.html
>> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto suggest with adding accents

2014-08-01 Thread Otis Gospodnetic
Aha.  I don't know if Solr Suggester can do that.  Let's see what others
say.  I know http://www.sematext.com/products/autocomplete/ could do that.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Aug 1, 2014 at 9:26 AM, benjelloun  wrote:

> hello,
>
> you didnt enderstand well my problem i give you exemple:
> the document contain the word "genève".
> q="gene"  auto suggestion give "geneve"
> q="genè" auto suggestion give "genève"
>
> but what i need is q="gene" auto suggestion give "genève" with accent like
> correction of word.
> i tried to add spellchecker to correct it but the maximum of character for
> correction is 2
> maybe there is other solution,
> i give my schema of field:
>
> [schema.xml fieldType definition - most of the XML was stripped by the mail
> archive; what survives shows positionIncrementGap="100", omitNorms="true", a
> solr.StandardTokenizerFactory, a stop filter with ignoreCase="true", and a
> commented-out pattern-replace filter with replacement="$2" in the index and
> query analyzers]
>
> thanks best regards,
> Anass BENJELLOUN
>
>
>
>
> 2014-07-31 18:41 GMT+02:00 Otis Gospodnetic-5 [via Lucene] <
> ml-node+s472066n4150410...@n3.nabble.com>:
>
> > You need to do the opposite.  Make sure accents are NOT removed at index
> &
> > query time.
> >
> > Otis
> > --
> > Performance Monitoring * Log Analytics * Search Analytics
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> >
> > On Thu, Jul 31, 2014 at 5:49 PM, benjelloun <[hidden email]
> > > wrote:
> >
> > > hi,
> > >
> > > q="gene"  it suggest "geneve"
> > > ASCIIFoldingFilter work like isolate accent
> > >
> > > what i need to suggest is "genève"
> > >
> > > any idea?
> > >
> > > thanks
> > > best reagards
> > > Anass BENJELLOUN
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > >
> >
> http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379p4150392.html
> >
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
> >
> > --
> >  If you reply to this email, your message will be added to the discussion
> > below:
> >
> >
> http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379p4150410.html
> >  To unsubscribe from Auto suggest with adding accents, click here
> > <
> http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=4150379&code=YW5hc3MuYm5qQGdtYWlsLmNvbXw0MTUwMzc5fC0xMDQyNjMzMDgx
> >
> > .
> > NAML
> > <
> http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml
> >
> >
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379p4150569.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Phrase Highlighter + Surround Query Parser

2014-08-01 Thread Salman Akram
We are having an issue with the phrase highlighter and the Surround Query Parser,
e.g. *"first thing" w/100 "you must"* brings correct results but also
highlights individual words of the phrase - "first" and "thing" are highlighted
where they appear separately as well.

Any idea how this can be fixed?


-- 
Regards,

Salman Akram


Re: Auto suggest with adding accents

2014-08-01 Thread benjelloun
hello,

You didn't understand my problem well, so here is an example:
the document contains the word "genève".
q="gene"  -> auto suggestion gives "geneve"
q="genè" -> auto suggestion gives "genève"

But what I need is for q="gene" to suggest "genève", with the accent, like a
correction of the word.
I tried to add a spellchecker to correct it, but the maximum number of characters
for a correction is 2.
Maybe there is another solution.
Here is my schema for the field:

[schema.xml fieldType definition - most of the XML was stripped by the mail
 archive; what survives shows positionIncrementGap="100", omitNorms="true", a
 solr.StandardTokenizerFactory, a stop filter with ignoreCase="true", and a
 commented-out pattern-replace filter with replacement="$2" in the index and
 query analyzers]

thanks best regards,
Anass BENJELLOUN




2014-07-31 18:41 GMT+02:00 Otis Gospodnetic-5 [via Lucene] <
ml-node+s472066n4150410...@n3.nabble.com>:

> You need to do the opposite.  Make sure accents are NOT removed at index &
> query time.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On Thu, Jul 31, 2014 at 5:49 PM, benjelloun <[hidden email]
> > wrote:
>
> > hi,
> >
> > q="gene"  it suggest "geneve"
> > ASCIIFoldingFilter work like isolate accent
> >
> > what i need to suggest is "genève"
> >
> > any idea?
> >
> > thanks
> > best reagards
> > Anass BENJELLOUN
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379p4150392.html
>
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
> --
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379p4150410.html
>  To unsubscribe from Auto suggest with adding accents, click here
> 
> .
> NAML
> 
>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379p4150569.html
Sent from the Solr - User mailing list archive at Nabble.com.