Solr limiting number of rows indexed to 21500 every time.

2015-01-12 Thread Pankaj Sonawane
Hi,

I am using the Solr DataImportHandler to index data from a database
table (Oracle). One of the columns contains a String representation of XML
(sample below).

<options>
  <option name="A">1</option>
  <option name="B">2</option>
  <option name="C">3</option>
  .
  .
  .
  <!-- can be 100-200 options -->
</options>

I want Solr to index each 'name' in the 'option' tags against its value.

E.g., the JSON for one row:

"docs": [ {
    "COL1": "F",
    "COL2": "ASDF",
    "COL3": "ATCC",
    "COL4": 29039757,
    "A_s": "1",
    "B_s": "2",
    "C_s": "3",
    .
    .
    .
} ]
// appending '_s' to the 'name' attribute to make dynamic fields
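
For readers following along, a data-config.xml for this setup would look roughly
like the sketch below. The entity name 'e1' and the SQL query are taken from the
exception further down in this message; the connection details and the transformer
comment are illustrative assumptions, not the poster's actual config.

    <dataConfig>
      <!-- connection details are placeholders -->
      <dataSource type="JdbcDataSource"
                  driver="oracle.jdbc.OracleDriver"
                  url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
                  user="user" password="password"/>
      <document>
        <entity name="e1"
                query="SELECT col1,col2,col3,col4,XMLSERIALIZE(col5 AS CLOB) AS col5 FROM tableName">
          <!-- col5 would then be parsed (e.g. by a transformer) into
               dynamic fields such as A_s, B_s, C_s -->
        </entity>
      </document>
    </dataConfig>

The '_s' suffix works because the stock example schema.xml already defines a
catch-all <dynamicField name="*_s" type="string" indexed="true" stored="true"/>.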


But while indexing data, every time only 21500 rows get indexed. After
that many records are indexed, I get the following exception:

1320927 [Thread-15] ERROR org.apache.solr.handler.dataimport.EntityProcessorBase – getNext() failed for query 'SELECT col1,col2,col3,col4,XMLSERIALIZE(col5 AS CLOB) AS col5 FROM tableName': org.apache.solr.handler.dataimport.DataImportHandlerException: java.sql.SQLRecoverableException: No more data to read from socket
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:63)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:378)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$600(JdbcDataSource.java:258)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:293)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:116)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.sql.SQLRecoverableException: No more data to read from socket
    at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1200)
    at oracle.jdbc.driver.T4CMAREngine.unmarshalCLR(T4CMAREngine.java:1865)
    at oracle.jdbc.driver.T4CMAREngine.unmarshalCLR(T4CMAREngine.java:1757)
    at oracle.jdbc.driver.T4CMAREngine.unmarshalCLR(T4CMAREngine.java:1750)
    at oracle.jdbc.driver.T4CClobAccessor.handlePrefetch(T4CClobAccessor.java:543)
    at oracle.jdbc.driver.T4CClobAccessor.unmarshalOneRow(T4CClobAccessor.java:197)
    at oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:916)
    at oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:835)
    at oracle.jdbc.driver.T4C8Oall.readRXD(T4C8Oall.java:664)
    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:328)
    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:186)
    at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:521)
    at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:194)
    at oracle.jdbc.driver.T4CStatement.fetch(T4CStatement.java:1074)
    at oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next(OracleResultSetImpl.java:369)
    at oracle.jdbc.driver.OracleResultSetImpl.next(OracleResultSetImpl.java:273)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:370)
    ... 12 more

1320928 [Thread-15] ERROR org.apache.solr.handler.dataimport.DocBuilder – Exception while processing: e1 document : SolrInputDocument(fields: []): org.apache.solr.handler.dataimport.DataImportHandlerException: java.sql.SQLRecoverableException: No more data to read from socket
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:63)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:378)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$600(JdbcDataSource.java:258)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:293)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:116)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
    at org.apache.solr.handler.dataimport.DocBuilder

Re: Frequent deletions

2015-01-12 Thread ig01
Hi,

Unfortunately this is the case: we do have hundreds of millions of documents
on one Solr instance/server. All our configs and schema use the default
configuration. Our index size is 180G; does that mean we need at least a
180G heap?

Thanks.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Frequent-deletions-tp4176689p4179122.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Beachhead

2015-01-12 Thread Shawn Heisey
On 1/12/2015 11:52 PM, William Bell wrote:
> Using Amazon Ec2, we are using one machine to replicate to other instances
> in the Region.
> 
> Instead of using 8GB of RAM, is there a way to replicate and use a LOT less
> memory?
> 
> Would like to use t2.medium...

Can you provide more details about your setup and describe the exact
problem you're having?

It's been a really long time since I actually used the master-slave
replication feature in Solr, but I don't recall any problems with it
being a major memory hog.

Thanks,
Shawn



Beachhead

2015-01-12 Thread William Bell
Using Amazon Ec2, we are using one machine to replicate to other instances
in the Region.

Instead of using 8GB of RAM, is there a way to replicate and use a LOT less
memory?

Would like to use t2.medium...

Thoughts?

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Understanding SolrCloud Restart Behavior - 4.6 onwards

2015-01-12 Thread KNitin
Hi

 I am trying to understand the process/node restart flow in a SolrCloud
cluster. What exact set of steps occurs (core/collection recovery, ZK
interaction, etc.) when a node is restarted?

I am looking to implement some business logic at a collection/node level
when solr is restarted.

Any pointers on which classes to look at would be really helpful.

Thanks
Nitin


Re: Unexplained leader initiated recovery after updates

2015-01-12 Thread Lindsay Martin
I have uncovered some additional details in the shard leader log:

2015-01-11 09:38:00.693 [qtp268575911-3617101] INFO org.apache.solr.update.processor.LogUpdateProcessor – [listings] webapp=/solr path=/update params={distrib.from=http://solr05.search.abebooks.com:8983/solr/listings/&update.distrib=TOLEADER&wt=javabin&version=2} {add=[14065572860 (1490024273004199936)]} 0 707
2015-01-11 09:38:00.913 [updateExecutor-1-thread-35734] ERROR org.apache.solr.update.StreamingSolrServers – error
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:196)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2015-01-11 09:38:00.917 [qtp268575911-3616964] WARN org.apache.solr.update.processor.DistributedUpdateProcessor – Error sending update
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:196)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
    at org.a

Re: get one document per value in multivalued field

2015-01-12 Thread vit
The field must be single-valued for grouping. That is why I do not consider
this option. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/get-one-document-per-value-in-multivalued-field-tp4179056p4179065.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: get one document per value in multivalued field

2015-01-12 Thread Shawn Heisey

On 1/12/2015 3:09 PM, vit wrote:

I use Solr 4.2.1.
My multivalued field is like this:
q=(category_id:(484986 520623 484339 519258 516227 486757) ..

How to construct a query which will show one top document per category_id
value?


This is a feature called grouping, or field collapsing.

https://wiki.apache.org/solr/FieldCollapsing
https://cwiki.apache.org/confluence/display/solr/Result+Grouping
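
With the query from this thread, that would be something like the following
(an untested sketch):

    q=category_id:(484986 520623 484339 519258 516227 486757)
      &group=true
      &group.field=category_id
      &group.limit=1

Note vit's caveat earlier in the thread: grouping requires the grouped field
to be single-valued, so this only works if category_id is (or is copied to)
a single-valued field.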

Thanks,
Shawn



get one document per value in multivalued field

2015-01-12 Thread vit
I use Solr 4.2.1.
My multivalued field is like this:
q=(category_id:(484986 520623 484339 519258 516227 486757) ..

How to construct a query which will show one top document per category_id
value?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/get-one-document-per-value-in-multivalued-field-tp4179056.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to configure Solr PostingsFormat block size

2015-01-12 Thread Chris Hostetter

: It looks like this is a good starting point:
: 
: http://wiki.apache.org/solr/SolrConfigXml#codecFactory

The default "SchemaCodecFactory" already supports defining a diff posting 
format per fieldType - but there isn't much in solr to let you "tweak" 
individual options on specific posting formats via configuration.

So what you'd need to do is write a small subclass of 
Lucene41PostingsFormat that called "super(yourMin, yourMax)" in it's 
constructor.
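
A minimal sketch of such a subclass (the class name and the 200/398 block
sizes are arbitrary illustrations; Lucene's defaults are 25 and 48, and
hooking the format up to a codec / SPI registration is left out):

    import org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat;

    public class BigBlockPostingsFormat extends Lucene41PostingsFormat {
        public BigBlockPostingsFormat() {
            // larger min/max term block sizes -> smaller in-memory term index
            super(200, 398);
        }
    }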




: On 01/12/2015 03:37 PM, Tom Burton-West wrote:
: > Hello all,
: > 
: > Our indexes have around 3 billion unique terms, so for Solr 3, we set
: > TermIndexInterval to about 8 times the default.  The net effect of this is
: > to reduce the size of the in-memory index by about 1/8th.  (For background
: > see for
: > http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again, )
: > 
: > We would like to do something similar for Solr4. The Lucene 4.10.2 JavaDoc
: > for setTermIndexInterval suggests how this can be done by setting the
: > minimum and maximum size for a block in Lucene code (
: > http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/index/IndexWriterConfig.html#setTermIndexInterval%28int%29
: > ):
: > "For example, Lucene41PostingsFormat implements the term index instead
: > based upon how terms share prefixes. To configure its parameters (the
: > minimum and maximum size for a block), you would instead use
: > Lucene41PostingsFormat.Lucene41PostingsFormat(int, int), which can also
: > be configured on a per-field basis."
: > 
: > How can we configure Solr to use different (i.e. non-default) minimum and
: > maximum block sizes?
: > 
: > Tom
: > 
: 
: 

-Hoss
http://www.lucidworks.com/


Re: How to configure Solr PostingsFormat block size

2015-01-12 Thread Michael Sokolov

It looks like this is a good starting point:

http://wiki.apache.org/solr/SolrConfigXml#codecFactory

-Mike

On 01/12/2015 03:37 PM, Tom Burton-West wrote:

Hello all,

Our indexes have around 3 billion unique terms, so for Solr 3, we set
TermIndexInterval to about 8 times the default.  The net effect of this is
to reduce the size of the in-memory index by about 1/8th.  (For background
see for
http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again, )

We would like to do something similar for Solr4. The Lucene 4.10.2 JavaDoc
for setTermIndexInterval suggests how this can be done by setting the
minimum and maximum size for a block in Lucene code (
http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/index/IndexWriterConfig.html#setTermIndexInterval%28int%29
):
"For example, Lucene41PostingsFormat implements the term index instead
based upon how terms share prefixes. To configure its parameters (the
minimum and maximum size for a block), you would instead use
Lucene41PostingsFormat.Lucene41PostingsFormat(int, int), which can also
be configured on a per-field basis."

How can we configure Solr to use different (i.e. non-default) minimum and
maximum block sizes?

Tom





How to configure Solr PostingsFormat block size

2015-01-12 Thread Tom Burton-West
Hello all,

Our indexes have around 3 billion unique terms, so for Solr 3, we set
TermIndexInterval to about 8 times the default.  The net effect of this is
to reduce the size of the in-memory index by about 1/8th.  (For background
see for
http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again, )

We would like to do something similar for Solr4. The Lucene 4.10.2 JavaDoc
for setTermIndexInterval suggests how this can be done by setting the
minimum and maximum size for a block in Lucene code (
http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/index/IndexWriterConfig.html#setTermIndexInterval%28int%29
):
"For example, Lucene41PostingsFormat implements the term index instead
based upon how terms share prefixes. To configure its parameters (the
minimum and maximum size for a block), you would instead use
Lucene41PostingsFormat.Lucene41PostingsFormat(int, int), which can also
be configured on a per-field basis."

How can we configure Solr to use different (i.e. non-default) minimum and
maximum block sizes?

Tom


Re: Custom plugin classloader issue

2015-01-12 Thread Mohmed Hussain
Thanks Chris, that worked. I loaded my Spring application context with the
plugin class loader. I had been trying to resolve this for a day, and you
resolved it in a minute :)

Thanks
-Hussain

On Mon, Jan 12, 2015 at 11:29 AM, Chris Hostetter 
wrote:

>
> :I am stuck at a strange issue, I have my custom Query Component that
> has
> : to load spring application context for some additional runtime filtering
> of
> : records.
> :I have included my jars as dependency in solrConfig.xml, SOLR is able
> to
> : load my plugin but Spring application fails to load with an error.
>
> when you instantiate your Spring context, you have to make it aware of the
> ClassLoader you get from the SolrResourceLoader so that Spring knows where
> to find the resources in your jar.
>
> first google result i found about specifying a classloader when
> instantiating spring...
>
>
> https://stackoverflow.com/questions/5660115/loading-spring-context-with-specific-classloader
>
>
> : This is how my solrConfig looks
> :   
> :
> : Following is the stack trace, any pointer of what is causing the issue
> :
> : org.springframework.beans.factory.BeanDefinitionStoreException:
> IOException
> : parsing XML document from class path resource
> : [com/myapp/spring/myapp-spring-master.xml]; nested exception is
> : java.io.FileNotFoundException: class path resource
> : [com/myapp/spring/myapp-spring-master.xml] cannot be opened because it
> does
> : not exist
> : at
> :
> com.myapp.spring.MyAppApplicationContextAware.loadApplicationContext(MyAppApplicationContextAware.java:69)
> : at
> :
> com.myapp.spring.MyAppApplicationContextAware.getApplicationContext(MyAppApplicationContextAware.java:48)
> : at
> :
> org.apache.solr.handler.component.CustomSecureQueryComponent.authenticateUser(CustomSecureQueryComponent.java:294)
> : at
> :
> org.apache.solr.handler.component.CustomSecureQueryComponent.doPrefetch(CustomSecureQueryComponent.java:67)
> :
> :
> : Thanks
> : -Hussain
> :
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Frequent deletions

2015-01-12 Thread Shawn Heisey
On 1/10/2015 11:46 PM, ig01 wrote:
> Thank you all for your response,
> The thing is that we have a 180G index while half of it is deleted documents.
> We tried to run an optimization in order to shrink the index size, but it
> crashes with 'out of memory' when the process reaches 120G.
> Is it possible to optimize parts of the index?
> Please advise what we can do in this situation.

If you are getting "OutOfMemoryError" exceptions from Java, that means
your heap isn't large enough to accomplish what you have asked the
program to do (between the configuration and what you have actually
requested).  You'll either need to allocate more memory to the heap, or
you need to change your config so less memory is required.

I see from a later reply that the 120GB size you have mentioned is your
Java heap.  Unless you've got hundreds of millions of documents on one
Solr instance/server (which would not be a good idea) and/or a serious
misconfiguration, I cannot imagine needing a heap that big for Solr.

The largest index on my dev Solr server has 98 million documents in
seven shards, with a total index size a little over 120GB (six shards
each 20GB and a seventh shard that's less than 1GB), and my heap size is
7 gigabytes.  There is a smaller index as well with 17 million docs in
three shards, that one is about 10GB on disk.  Unlike the production
servers, the dev server has all the index data contained on one server.

Here's a wiki page that covers things which cause large heap
requirements.  A later section also describes steps you can take to
reduce memory usage.

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

How many documents do you have on a single Solr server?  Can you use a
site like http://apaste.info to share your solrconfig.xml?  I don't know
if we'll need the schema, but it might be a good idea to share that as well.

Thanks,
Shawn



Re: Custom plugin classloader issue

2015-01-12 Thread Chris Hostetter

:I am stuck at a strange issue, I have my custom Query Component that has
: to load spring application context for some additional runtime filtering of
: records.
:I have included my jars as dependency in solrConfig.xml, SOLR is able to
: load my plugin but Spring application fails to load with an error.

when you instantiate your Spring context, you have to make it aware of the 
ClassLoader you get from the SolrResourceLoader so that Spring knows where 
to find the resources in your jar.

first google result i found about specifying a classloader when 
instantiating spring...

https://stackoverflow.com/questions/5660115/loading-spring-context-with-specific-classloader
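
In code, that boils down to something like this sketch (the resource name
comes from the stack trace below; the context type and how you obtain the
SolrCore are assumptions):

    import org.springframework.context.support.ClassPathXmlApplicationContext;

    // 'core' is the SolrCore handed to your component (e.g. via SolrCoreAware)
    ClassLoader solrLoader = core.getResourceLoader().getClassLoader();
    ClassPathXmlApplicationContext ctx = new ClassPathXmlApplicationContext();
    ctx.setClassLoader(solrLoader); // let Spring see the plugin's jars
    ctx.setConfigLocation("com/myapp/spring/myapp-spring-master.xml");
    ctx.refresh();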


: This is how my solrConfig looks
:   
: 
: Following is the stack trace, any pointer of what is causing the issue
: 
: org.springframework.beans.factory.BeanDefinitionStoreException: IOException
: parsing XML document from class path resource
: [com/myapp/spring/myapp-spring-master.xml]; nested exception is
: java.io.FileNotFoundException: class path resource
: [com/myapp/spring/myapp-spring-master.xml] cannot be opened because it does
: not exist
: at
: 
com.myapp.spring.MyAppApplicationContextAware.loadApplicationContext(MyAppApplicationContextAware.java:69)
: at
: 
com.myapp.spring.MyAppApplicationContextAware.getApplicationContext(MyAppApplicationContextAware.java:48)
: at
: 
org.apache.solr.handler.component.CustomSecureQueryComponent.authenticateUser(CustomSecureQueryComponent.java:294)
: at
: 
org.apache.solr.handler.component.CustomSecureQueryComponent.doPrefetch(CustomSecureQueryComponent.java:67)
: 
: 
: Thanks
: -Hussain
: 

-Hoss
http://www.lucidworks.com/


Re: Unexplained leader initiated recovery after updates

2015-01-12 Thread Lindsay Martin
Here are more details about our setup:

Zookeeper:
* 3 separate hosts in same rack as Solr cluster
* Zookeeper hosts do not run any other processes

Solr:
* total servers: 24 (plus 2 cold standbys in case of host failure)
* physical memory: 65931872 kB (62 GB)
* max JVM heap size: -Xmx10880m ( 10 GB)
* only one Solr per host

On the 'index' directory size front, I am seeing some differences in the
disk usage between leaders and replicas.

In 1 / 12 shards, there is no difference in size between the leader and
replica.

In 6 / 12 shards, there is a 1 G difference in size between the leader and
replica. Both have one index directory.

In 5 / 12 shards, the replica is a multiple of the leader size, due to
multiple index directories on disk.

For example, the shard 1 leader has a directory named
'index.20140624071707699' 30 G in size. The replica has two directories:
'index.20150108052156468' at 31G and 'index.20140624071556270' at 32G.

Thanks,

Lindsay

On 2015-01-09, 5:01 PM, "Shawn Heisey"  wrote:

>On 1/9/2015 4:54 PM, Lindsay Martin wrote:
>> I am experiencing a problem where Solr nodes go into recovery following
>>an update cycle.
>
>
>
>> For background, here are some details about our configuration:
>> * Solr 4.10.2 (problem also observed with Solr 4.6.1)
>> * 12 shards with 2 nodes per shard
>> * a single updater running in a separate subnet is posting updates
>>using the SolrJ CloudSolrServer client. Updates are triggered hourly.
>> * system is under continuous query load
>> * autoCommit is set to 821 seconds
>> * autoSoftCommit is set to 303 seconds
>
>I would suspect some kind of performance problem that likely results in
>the zkClientTimeout expiring.  I have a standard set of questions for
>performance problems.
>
>Questions about zookeeper:
>
>How many ZK nodes?  Is zookeeper on separate hardware?  If it's on the
>same hardware as Solr, is its database on the same disk spindles as the
>Solr index, or separate spindles?  Is zookeeper standalone or embedded
>in Solr?  If it's standalone, do you happen to know the java max heap
>for the zookeeper processes?
>
>Questions about Solr and the hardware:
>
>How many total Solr servers?  How much RAM is installed on each one?
>What is the max size of the Java heap?  Are you running more than one
>Solr (JVM/container) instance per machine?
>
>If you add up all the "index" directories on a server, how much disk
>space does it take?  Is the amount of disk space used similar on all of
>the servers?
>
>Thanks,
>Shawn
>



Custom plugin classloader issue

2015-01-12 Thread Mohmed Hussain
Hi All,
   I am stuck at a strange issue: I have a custom Query Component that has
to load a Spring application context for some additional runtime filtering of
records.
   I have included my jars as a dependency in solrConfig.xml. Solr is able to
load my plugin, but the Spring application fails to load with an error.

This is how my solrConfig looks
  

Following is the stack trace, any pointer of what is causing the issue

org.springframework.beans.factory.BeanDefinitionStoreException: IOException
parsing XML document from class path resource
[com/myapp/spring/myapp-spring-master.xml]; nested exception is
java.io.FileNotFoundException: class path resource
[com/myapp/spring/myapp-spring-master.xml] cannot be opened because it does
not exist
at
com.myapp.spring.MyAppApplicationContextAware.loadApplicationContext(MyAppApplicationContextAware.java:69)
at
com.myapp.spring.MyAppApplicationContextAware.getApplicationContext(MyAppApplicationContextAware.java:48)
at
org.apache.solr.handler.component.CustomSecureQueryComponent.authenticateUser(CustomSecureQueryComponent.java:294)
at
org.apache.solr.handler.component.CustomSecureQueryComponent.doPrefetch(CustomSecureQueryComponent.java:67)


Thanks
-Hussain


Re: Problem with getting node active

2015-01-12 Thread O. Klein
The updateLog element had been commented out. Problem solved.
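
(For anyone hitting the same symptom: the element in question is the updateLog
block in solrconfig.xml, which in the stock example config looks like this:

    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>

SolrCloud needs it enabled for recovery to work.)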



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-getting-node-active-tp4178942p4179013.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: filter on solr pivot data

2015-01-12 Thread Darniz
Thanks for the reply.

But a filter query like -[* TO *] will give me VINs which don't have a photo;
it might qualify a dealer to show up, but what if that dealer has another VIN
which has a photo?

My requirement is that I want to show the dealer only if all VINs have no photos.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/filter-on-solr-pivot-data-tp4178451p4179011.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Extending solr analysis in index time

2015-01-12 Thread Markus Jelsma
Hi - You mention having a list of important terms; in that case using payloads
would be the most straightforward approach, I suppose. You still need a custom
similarity and a custom query parser. Payloads work very well for us.

M
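
For reference, indexing payloads usually starts with a fieldType like the one
in the stock example schema, with the weight encoded after a delimiter (e.g.
'hello|2.0'); the custom similarity and query parser Markus mentions are still
needed to actually score with the payloads:

    <fieldtype name="payloads" stored="true" indexed="true" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
      </analyzer>
    </fieldtype>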

 
 
-Original message-
> From:Ahmet Arslan 
> Sent: Monday 12th January 2015 19:50
> To: solr-user@lucene.apache.org
> Subject: Re: Extending solr analysis in index time
> 
> Hi Ali,
> 
> Reading your example, if you could somehow replace idf component with your 
> "importance weight",
> I think your use case looks like TFIDFSimilarity. Tf component remains same.
> 
> https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
> 
> I also suggest you ask this in lucene mailing list. Someone familiar with 
> similarity package can give insight on this.
> 
> Ahmet
> 
> 
> 
> On Monday, January 12, 2015 6:54 PM, Jack Krupansky 
>  wrote:
> Could you clarify what you mean by "Lucene reverse index"? That's not a
> term I am familiar with.
> 
> -- Jack Krupansky
> 
> 
> On Mon, Jan 12, 2015 at 1:01 AM, Ali Nazemian  wrote:
> 
> > Dear Jack,
> > Thank you very much.
> > Yeah I was thinking of a function query for sorting, but I have two problems
> > in this case: 1) a function query does the processing at query time, which I
> > don't want; 2) I also want to have the score field for retrieving and showing
> > to users.
> >
> > Dear Alexandre,
> > Here is some more explanation about the business behind the question:
> > I am going to provide a field for each document, lets refer it as
> > "document_score". I am going to fill this field based on the information
> > that could be extracted from Lucene reverse index. Assume I have a list of
> > terms, called important terms and I am going to extract the term frequency
> > for each of the terms inside this list per each document. To be honest I
> > want to use the term frequency for calculating "document_score".
> > "document_score" should be storable since I am going to retrieve this field
> > for each document. I also want to do sorting on "document_score" if
> > preferred by the user.
> > I hope I did convey my point.
> > Best regards.
> >
> >
> > On Mon, Jan 12, 2015 at 12:53 AM, Jack Krupansky  > >
> > wrote:
> >
> > > Won't function queries do the job at query time? You can add or multiply
> > > the tf*idf score by a function of the term frequency of arbitrary terms,
> > > using the tf, mul, and add functions.
> > >
> > > See:
> > > https://cwiki.apache.org/confluence/display/solr/Function+Queries
> > >
> > > -- Jack Krupansky
> > >
> > > On Sun, Jan 11, 2015 at 10:55 AM, Ali Nazemian 
> > > wrote:
> > >
> > > > Dear Jack,
> > > > Hi,
> > > > I think you misunderstood my need. I dont want to change the default
> > > > scoring behavior of Lucene (tf-idf) I just want to have another field
> > to
> > > do
> > > > sorting for some specific queries (not all the search business),
> > however
> > > I
> > > > am aware of Lucene payload.
> > > > Thank you very much.
> > > >
> > > > On Sun, Jan 11, 2015 at 7:15 PM, Jack Krupansky <
> > > jack.krupan...@gmail.com>
> > > > wrote:
> > > >
> > > > > You would do that with a custom similarity (scoring) class. That's an
> > > > > expert feature. In fact a SUPER-expert feature.
> > > > >
> > > > > Start by completely familiarizing yourself with how TF*IDF
> > similarity
> > > > > already works:
> > > > >
> > > > >
> > > >
> > >
> > http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
> > > > >
> > > > > And to use your custom similarity class in Solr:
> > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements#OtherSchemaElements-Similarity
> > > > >
> > > > >
> > > > > -- Jack Krupansky
> > > > >
> > > > > On Sun, Jan 11, 2015 at 9:04 AM, Ali Nazemian  > >
> > > > > wrote:
> > > > >
> > > > > > Hi everybody,
> > > > > >
> > > > > > I am going to add some analysis to Solr at the index time. Here is
> > > > what I
> > > > > > am considering in my mind:
> > > > > > Suppose I have two different fields for Solr schema, field "a" and
> > > > field
> > > > > > "b". I am going to use the created reverse index in a way that some
> > > > terms
> > > > > > are considered as important ones and tell lucene to calculate a
> > value
> > > > > based
> > > > > > on these terms frequency per each document. For example let the
> > word
> > > > > > "hello" considered as important word with the weight of "2.0".
> > > Suppose
> > > > > the
> > > > > > term frequency for this word at field "a" is 3 and at field "b" is
> > 6
> > > > for
> > > > > > document 1. Therefor the score value would be 2*3+(2*6)^2. I want
> > to
> > > > > > calculate this score based on these fields and put it in the index
> > > for
> > > > > > retrieving. My question would be how can I do such thing? First I
> > did
> > > > > > consider using term component for calculating this value from
> > outside
> > > > and
> > > > > > p

Re: Extending solr analysis in index time

2015-01-12 Thread Ahmet Arslan
Hi Ali,

Reading your example, if you could somehow replace idf component with your 
"importance weight",
I think your use case looks like TFIDFSimilarity. Tf component remains same.

https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

I also suggest you ask this in lucene mailing list. Someone familiar with 
similarity package can give insight on this.

Ahmet



On Monday, January 12, 2015 6:54 PM, Jack Krupansky  
wrote:
Could you clarify what you mean by "Lucene reverse index"? That's not a
term I am familiar with.

-- Jack Krupansky


On Mon, Jan 12, 2015 at 1:01 AM, Ali Nazemian  wrote:

> Dear Jack,
> Thank you very much.
> Yeah I was thinking of a function query for sorting, but I have two problems
> in this case: 1) a function query does the processing at query time, which I
> don't want; 2) I also want to have the score field for retrieving and showing
> to users.
>
> Dear Alexandre,
> Here is some more explanation about the business behind the question:
> I am going to provide a field for each document, lets refer it as
> "document_score". I am going to fill this field based on the information
> that could be extracted from Lucene reverse index. Assume I have a list of
> terms, called important terms and I am going to extract the term frequency
> for each of the terms inside this list per each document. To be honest I
> want to use the term frequency for calculating "document_score".
> "document_score" should be storable since I am going to retrieve this field
> for each document. I also want to do sorting on "document_score" if
> preferred by the user.
> I hope I did convey my point.
> Best regards.
>
>
> On Mon, Jan 12, 2015 at 12:53 AM, Jack Krupansky  >
> wrote:
>
> > Won't function queries do the job at query time? You can add or multiply
> > the tf*idf score by a function of the term frequency of arbitrary terms,
> > using the tf, mul, and add functions.
> >
> > See:
> > https://cwiki.apache.org/confluence/display/solr/Function+Queries
> >
> > -- Jack Krupansky
> >
> > On Sun, Jan 11, 2015 at 10:55 AM, Ali Nazemian 
> > wrote:
> >
> > > Dear Jack,
> > > Hi,
> > > I think you misunderstood my need. I dont want to change the default
> > > scoring behavior of Lucene (tf-idf) I just want to have another field
> to
> > do
> > > sorting for some specific queries (not all the search business),
> however
> > I
> > > am aware of Lucene payload.
> > > Thank you very much.
> > >
> > > On Sun, Jan 11, 2015 at 7:15 PM, Jack Krupansky <
> > jack.krupan...@gmail.com>
> > > wrote:
> > >
> > > > You would do that with a custom similarity (scoring) class. That's an
> > > > expert feature. In fact a SUPER-expert feature.
> > > >
> > > > Start by completely familiarizing yourself with how TF*IDF
> similarity
> > > > already works:
> > > >
> > > >
> > >
> >
> http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
> > > >
> > > > And to use your custom similarity class in Solr:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements#OtherSchemaElements-Similarity
> > > >
> > > >
> > > > -- Jack Krupansky
> > > >
> > > > On Sun, Jan 11, 2015 at 9:04 AM, Ali Nazemian  >
> > > > wrote:
> > > >
> > > > > Hi everybody,
> > > > >
> > > > > I am going to add some analysis to Solr at the index time. Here is
> > > what I
> > > > > am considering in my mind:
> > > > > Suppose I have two different fields for Solr schema, field "a" and
> > > field
> > > > > "b". I am going to use the created reverse index in a way that some
> > > terms
> > > > > are considered as important ones and tell lucene to calculate a
> value
> > > > based
> > > > > on these terms frequency per each document. For example let the
> word
> > > > > "hello" considered as important word with the weight of "2.0".
> > Suppose
> > > > the
> > > > > term frequency for this word at field "a" is 3 and at field "b" is
> 6
> > > for
> > > > > document 1. Therefor the score value would be 2*3+(2*6)^2. I want
> to
> > > > > calculate this score based on these fields and put it in the index
> > for
> > > > > retrieving. My question would be how can I do such thing? First I
> did
> > > > > consider using term component for calculating this value from
> outside
> > > and
> > > > > put it back to Solr index, but it seems it is not efficient enough.
> > > > >
> > > > > Thank you very much.
> > > > > Best regards.
> > > > >
> > > > > --
> > > > > A.Nazemian
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > A.Nazemian
> > >
> >
>
>
>
> --
> A.Nazemian
>


Re: Why suggestions can be that slow?

2015-01-12 Thread FiMko
Erick, thanks for the answer. You're absolutely right!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-suggestions-can-be-that-slow-tp4178944p4179007.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Distributed unit tests and SSL doesn't have a valid keystore

2015-01-12 Thread Mark Miller
I'd have to do some digging. Hossman might know offhand. You might just
want to use @SuppressSSL on the tests :)

- Mark
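
That is, something along these lines (a sketch; the class name is illustrative):

    import org.apache.solr.BaseDistributedSearchTestCase;
    import org.apache.solr.SolrTestCaseJ4.SuppressSSL;

    @SuppressSSL // opt this class out of the randomized SSL setup
    public class MyDistribTest extends BaseDistributedSearchTestCase {
        @Override
        public void doTest() throws Exception {
            // test body goes here
        }
    }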

On Mon Jan 12 2015 at 8:45:11 AM Markus Jelsma 
wrote:

> Hi - in a small Maven project depending on Solr 4.10.3, running unit tests
> that extend BaseDistributedSearchTestCase randomly fail with "SSL doesn't
> have a valid keystore", and a lot of zombie threads. We have a
> solrtest.keystore file laying around, but where to put it?
>
> Thanks,
> Markus
>


Re: SolrCloud shard leader elections - Altering zookeeper sequence numbers

2015-01-12 Thread Erick Erickson
Just skimming, but the problem here that I ran into was with the
listeners. Each _Solr_ instance out there is listening to one of the
ephemeral nodes (the "one in front"). So deleting a node does _not_
change which ephemeral node the associated Solr instance is listening
to.

So, for instance, when you delete S2..n-01 and re-add it, S2 is
still looking at S1n-00 and will continue looking at
S1...n-00 until S1n-00 is deleted.

Deleting S2..n-01 will wake up S3 though, which should now be
looking at S1n-000. Now you have two Solr listeners looking at
the same ephemeral node. The key is that deleting S2...n-01 does
_not_ wake up S2, just any solr instance that has a watch on the
associated ephemeral node.

The code you want is in LeaderElector.checkIfIamLeader to understand
how it all works. Be aware that the sortSeqs call sorts the nodes by
1> sequence number
2> string comparison.

Which has the unfortunate characteristic of a secondary sort by
session ID. So two nodes with the same sequence number can sort before
or after each other depending on which one gets a session higher/lower
than the other.

This is quite tricky to get right, I once created a patch for 4.10.3
by applying these SOLR issues in order (some minor tweaks required):
SOLR-6115
SOLR-6512
SOLR-6577
SOLR-6513
SOLR-6517
SOLR-6670
SOLR-6691

Good luck!
Erick




On Mon, Jan 12, 2015 at 8:54 AM, Zisis Tachtsidis  wrote:
> SolrCloud uses ZooKeeper sequence flags to keep track of the order in which
> nodes register themselves as leader candidates. The node with the lowest
> sequence number wins as leader of the shard.
>
> What I'm trying to do is to keep the leader re-assignments to the minimum
> during a rolling restart. In this direction I change the zk sequence numbers
> on the SolrCloud nodes when all nodes of the cluster are up and active. I'm
> using Solr 4.10.0 and I'm aware of SOLR-6491 which has a similar purpose but
> I'm trying to do it from "outside", using the existing APIs without editing
> Solr source code.
>
> == TYPICAL SCENARIO ==
> Suppose we have 3 Solr instances S1,S2,S3. They are started in the same
> order and the zk sequences assigned have as follows
> S1:-n_00 (LEADER)
> S2:-n_01
> S3:-n_02
>
> In a rolling restart we'll get S2 as leader (after S1 shutdown), then S3
> (after S2 shutdown) and finally S1(after S3 shutdown), 3 changes in total.
>
> == MY ATTEMPT ==
> By using SolrZkClient and the Zookeeper multi API  I found a way to get rid
> of the old zknodes that participate in a shard's leader election and write
> new ones where we can assign the sequence number of our liking.
>
> S1:-n_00 (no code running here)
> S2:-n_04 (code deleting zknode -n_01 and creating
> -n_04)
> S3:-n_03 (code deleting zknode -n_02 and creating
> -n_03)
>
> In a rolling restart I'd expect to have S3 as leader (after S1 shutdown), no
> change (after S2 shutdown) and finally S1(after S3 shutdown), that is 2
> changes. This will be constant no matter how many servers are added in
> SolrCloud while in the first scenario the # of re-assignments equals the #
> of Solr servers.
>
> The problem occurs when S1 (LEADER) is shut down. The elections that take
> place still set S2 as leader, It's like ignoring the new sequence numbers.
> When I go to /solr/#/~cloud?view=tree the new sequence numbers are listed
> under "/collections" based on which S3 should have become the leader.
> Do you have any idea why the new state is not acknowledged during the
> elections? Is something cached? Or to put it bluntly do I have any chance
> down this path? If not what are my options? Is it possible to apply all
> patches under SOLR-6491 in isolation and continue from there?
>
> Thank you.
>
> Extra info which might help follows
> 1. Some logging related to leader elections after S1 has been shut down
> S2 - org.apache.solr.cloud.SyncStrategy Leader's attempt to sync with
> shard failed, moving to the next candidate
> S2 - org.apache.solr.cloud.ShardLeaderElectionContext We failed sync,
> but we have no versions - we can't sync in that case - we were active
> before, so become leader anyway
>
> S3 - org.apache.solr.cloud.LeaderElector Our node is no longer in line
> to be leader
>
> 2. And some sample code on how I perform the ZK re-sequencing
>// Read current zk nodes for a specific collection
> solrServer.getZkStateReader().getZkClient().getSolrZooKeeper().getChildren("/collections/core/leader_elect/shard1/election", true)
>// node deletion
>   Op.delete(path, -1)
>// node creation
>   Op.create(createPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
>// Perform operations
> solrServer.getZkStateReader().getZkClient().getSolrZooKeeper().multi(opsList);
>   solrServer.getZkStateReader().updateClusterState(true);
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.

RE: Determining the Number of Solr Shards

2015-01-12 Thread Andrew Butkus
We decided to downgrade to 20 shards again, as we kept having the query-time
spikes. If it were a memory issue, I would assume we would have the same
performance issues with 20 shards, so I think this is maybe a problem in Solr
rather than in our configuration / amount of RAM.


In any case, we have thought about adding some more servers to the SolrCloud we
have. Is there an easy way to add servers to the quorum without having to reshard
and re-index? I have looked at the Collections API and not discovered one yet
...

Thanks

Andy


-Original Message-
From: Jack Krupansky [mailto:jack.krupan...@gmail.com] 
Sent: 08 January 2015 22:17
To: solr-user@lucene.apache.org
Subject: Re: Determining the Number of Solr Shards

My final advice would be my standard proof of concept implementation advice
- test a configuration with 10% (or 5%) of the target data size and 10% (or
5%) of the estimated resource requirements (maybe 25% of the estimated RAM) and 
see how well it performs.

Take the actual index size and multiply by 10 (or 20 for a 5% load) to get a 
closer estimate of total storage required.

If a 10% load fails to perform well with 25% of the total estimated RAM, then
you can be sure that you'll have problems with 10x the data and only 4x the
RAM. Increase the RAM for that 10% load until you get acceptable performance for
both indexing and a full range of queries, and then use 10x that RAM for the
100% load. That's the OS memory for file caching, not the
total system RAM.
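
To make that concrete with this thread's numbers (the PoC measurements are
assumptions for illustration): a 10% proof of concept of 5,100 million
documents is ~510 million docs; if that PoC index comes out at ~10 TB on disk,
the full index is roughly 10 x 10 TB = 100 TB, matching the earlier estimate;
and if the PoC needs, say, 64 GB of OS file cache per node to perform
acceptably, budget roughly 10x that total cache capacity for the full load.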

-- Jack Krupansky

On Thu, Jan 8, 2015 at 4:55 PM, Nishanth S  wrote:

> Thanks guys for your inputs. I would be looking at around 100 TB of
> total index size with 5100 million documents for a period of 30
> days before we purge the indexes. I had estimated it slightly on the
> higher side of things, but that's where I feel we would be.
>
> Thanks,
> Nishanth
>
> On Wed, Jan 7, 2015 at 7:50 PM, Shawn Heisey  wrote:
>
> > On 1/7/2015 7:14 PM, Nishanth S wrote:
> > > Thanks Shawn and Walter.Yes those are 12,000 writes/second.Reads  
> > > for
> the
> > > moment would be in the 1000 reads/second. Guess finding out the 
> > > right number  of  shards would be my starting point.
> >
> > I don't think indexing 12000 docs per second would be too much for 
> > Solr to handle, as long as you architect the indexing application properly.
> > You would likely need to have several indexing threads or processes 
> > that index in parallel.  Solr is fully thread-safe and can handle 
> > several indexing requests at the same time.  If the indexing 
> > application is single-threaded, indexing speed will not reach its full 
> > potential.
> >
> > Be aware that indexing at the same time as querying will reduce the 
> > number of queries per second that you can handle.  In an environment 
> > where both reads and writes are heavy like you have described, more 
> > shards and/or more replicas might be required.
> >
> > For the query side ... even 1000 queries per second is a fairly 
> > heavy query rate.  You're likely to need at least a few replicas, 
> > possibly several, to handle that.  The type and complexity of the 
> > queries you do will make a big difference as well.  To handle that 
> > query level, I would still recommend only running one shard replica 
> > on each server.  If you have three shards and three replicas, that means 9 
> > Solr servers.
> >
> > How many documents will you have in total?  You said they are about 
> > 6KB each ... but depending on the fieldType definitions (and the 
> > analysis chain for TextField types), 6KB might be very large or fairly 
> > small.
> >
> > Do you have any idea how large the Solr index will be with all your 
> > documents?  Estimating that will require indexing a significant 
> > percentage of your documents with the actual schema and config that 
> > you will use in production.
> >
> > If I know how many documents you have, how large the full index will 
> > be, and can see an example of the more complex queries you will do, 
> > I can make *preliminary* guesses about the number of shards you 
> > might need.  I do have to warn you that it will only be a guess.  
> > You'll have to experiment to see what works best.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: How to apply SOLR-6024 to Solr 4.8

2015-01-12 Thread Shawn Heisey
On 1/12/2015 4:20 AM, Elran Dvir wrote:
> I am trying to apply SOLR-6024 patch to Solr 4.8.
> I have some compilation errors with it (detailed in Jira: 
> https://issues.apache.org/jira/browse/SOLR-6024).
> How can I change the patch to be applied to 4.8?

The compile errors seem to indicate that DocValues in Lucene underwent
some fairly significant low-level changes in either 4.9 or 4.10.

Figuring out how to fix it is going to require some significant
low-level internal Lucene knowledge, and ultimately it might not be
possible.  Upgrading Solr to 4.10.3 is probably going to be easier than
going down that rabbit hole.

I tracked down the lucene issue where at least one of the related
changes was made after 4.8 came out, but it would be very dangerous to
start applying those kinds of patches without the context of other
patches that came before and after.

Thanks,
Shawn



SolrCloud shard leader elections - Altering zookeeper sequence numbers

2015-01-12 Thread Zisis Tachtsidis
SolrCloud uses ZooKeeper sequence flags to keep track of the order in which
nodes register themselves as leader candidates. The node with the lowest
sequence number wins as leader of the shard.

What I'm trying to do is to keep the leader re-assignments to the minimum
during a rolling restart. In this direction I change the zk sequence numbers
on the SolrCloud nodes when all nodes of the cluster are up and active. I'm
using Solr 4.10.0 and I'm aware of SOLR-6491 which has a similar purpose but
I'm trying to do it from "outside", using the existing APIs without editing
Solr source code.

== TYPICAL SCENARIO ==
Suppose we have 3 Solr instances S1,S2,S3. They are started in the same
order and the zk sequences assigned have as follows
S1:-n_00 (LEADER)
S2:-n_01
S3:-n_02

In a rolling restart we'll get S2 as leader (after S1 shutdown), then S3
(after S2 shutdown) and finally S1(after S3 shutdown), 3 changes in total.

== MY ATTEMPT ==
By using SolrZkClient and the Zookeeper multi API  I found a way to get rid
of the old zknodes that participate in a shard's leader election and write
new ones where we can assign the sequence number of our liking. 

S1:-n_00 (no code running here)
S2:-n_04 (code deleting zknode -n_01 and creating
-n_04)
S3:-n_03 (code deleting zknode -n_02 and creating
-n_03)

In a rolling restart I'd expect to have S3 as leader (after S1 shutdown), no
change (after S2 shutdown) and finally S1(after S3 shutdown), that is 2
changes. This will be constant no matter how many servers are added in
SolrCloud while in the first scenario the # of re-assignments equals the #
of Solr servers.

The problem occurs when S1 (LEADER) is shut down. The elections that take
place still set S2 as leader, It's like ignoring the new sequence numbers.
When I go to /solr/#/~cloud?view=tree the new sequence numbers are listed
under "/collections" based on which S3 should have become the leader.
Do you have any idea why the new state is not acknowledged during the
elections? Is something cached? Or to put it bluntly do I have any chance
down this path? If not what are my options? Is it possible to apply all
patches under SOLR-6491 in isolation and continue from there?

Thank you. 

Extra info which might help follows
1. Some logging related to leader elections after S1 has been shut down
S2 - org.apache.solr.cloud.SyncStrategy Leader's attempt to sync with
shard failed, moving to the next candidate
S2 - org.apache.solr.cloud.ShardLeaderElectionContext We failed sync,
but we have no versions - we can't sync in that case - we were active
before, so become leader anyway

S3 - org.apache.solr.cloud.LeaderElector Our node is no longer in line
to be leader

2. And some sample code on how I perform the ZK re-sequencing
   // Read current zk nodes for a specific collection
   solrServer.getZkStateReader().getZkClient().getSolrZooKeeper().getChildren("/collections/core/leader_elect/shard1/election", true)
   // node deletion
   Op.delete(path, -1)
   // node creation
   Op.create(createPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
   // Perform operations
   solrServer.getZkStateReader().getZkClient().getSolrZooKeeper().multi(opsList);
   solrServer.getZkStateReader().updateClusterState(true);




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-shard-leader-elections-Altering-zookeeper-sequence-numbers-tp4178973.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Extending solr analysis in index time

2015-01-12 Thread Jack Krupansky
Could you clarify what you mean by "Lucene reverse index"? That's not a
term I am familiar with.

-- Jack Krupansky

On Mon, Jan 12, 2015 at 1:01 AM, Ali Nazemian  wrote:

> Dear Jack,
> Thank you very much.
> Yeah I was thinking of a function query for sorting, but I have two problems
> in this case: 1) a function query does the processing at query time, which I
> don't want; 2) I also want to have the score field for retrieving and showing
> to users.
>
> Dear Alexandre,
> Here is some more explanation about the business behind the question:
> I am going to provide a field for each document, lets refer it as
> "document_score". I am going to fill this field based on the information
> that could be extracted from Lucene reverse index. Assume I have a list of
> terms, called important terms and I am going to extract the term frequency
> for each of the terms inside this list per each document. To be honest I
> want to use the term frequency for calculating "document_score".
> "document_score" should be storable since I am going to retrieve this field
> for each document. I also want to do sorting on "document_score" if
> preferred by the user.
> I hope I did convey my point.
> Best regards.
>
>
> On Mon, Jan 12, 2015 at 12:53 AM, Jack Krupansky  >
> wrote:
>
> > Won't function queries do the job at query time? You can add or multiply
> > the tf*idf score by a function of the term frequency of arbitrary terms,
> > using the tf, mul, and add functions.
> >
> > See:
> > https://cwiki.apache.org/confluence/display/solr/Function+Queries
> >
> > -- Jack Krupansky
> >
> > On Sun, Jan 11, 2015 at 10:55 AM, Ali Nazemian 
> > wrote:
> >
> > > Dear Jack,
> > > Hi,
> > > I think you misunderstood my need. I dont want to change the default
> > > scoring behavior of Lucene (tf-idf) I just want to have another field
> to
> > do
> > > sorting for some specific queries (not all the search business),
> however
> > I
> > > am aware of Lucene payload.
> > > Thank you very much.
> > >
> > > On Sun, Jan 11, 2015 at 7:15 PM, Jack Krupansky <
> > jack.krupan...@gmail.com>
> > > wrote:
> > >
> > > > You would do that with a custom similarity (scoring) class. That's an
> > > > expert feature. In fact a SUPER-expert feature.
> > > >
> > > > Start by completely familiarizing yourself with how TF*IDF
> similarity
> > > > already works:
> > > >
> > > >
> > >
> >
> http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
> > > >
> > > > And to use your custom similarity class in Solr:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements#OtherSchemaElements-Similarity
> > > >
> > > >
> > > > -- Jack Krupansky
> > > >
> > > > On Sun, Jan 11, 2015 at 9:04 AM, Ali Nazemian  >
> > > > wrote:
> > > >
> > > > > Hi everybody,
> > > > >
> > > > > I am going to add some analysis to Solr at the index time. Here is
> > > what I
> > > > > am considering in my mind:
> > > > > Suppose I have two different fields for Solr schema, field "a" and
> > > field
> > > > > "b". I am going to use the created reverse index in a way that some
> > > terms
> > > > > are considered as important ones and tell lucene to calculate a
> value
> > > > based
> > > > > on these terms frequency per each document. For example let the
> word
> > > > > "hello" considered as important word with the weight of "2.0".
> > Suppose
> > > > the
> > > > > term frequency for this word at field "a" is 3 and at field "b" is
> 6
> > > for
> > > > > document 1. Therefor the score value would be 2*3+(2*6)^2. I want
> to
> > > > > calculate this score based on these fields and put it in the index
> > for
> > > > > retrieving. My question would be how can I do such thing? First I
> did
> > > > > consider using term component for calculating this value from
> outside
> > > and
> > > > > put it back to Solr index, but it seems it is not efficient enough.
> > > > >
> > > > > Thank you very much.
> > > > > Best regards.
> > > > >
> > > > > --
> > > > > A.Nazemian
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > A.Nazemian
> > >
> >
>
>
>
> --
> A.Nazemian
>
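
For what it's worth, Jack's tf/mul/add suggestion applied to Ali's own example
(weight 2.0 on "hello", tf of 3 in field a and 6 in field b, giving
2*3+(2*6)^2) would come out roughly as this untested sketch:

    sort=add(mul(2,tf(a,'hello')),pow(mul(2,tf(b,'hello')),2)) desc

This computes the score at query time, though, which is exactly what Ali said
he wants to avoid.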


ApacheCon 2015 at Austin, TX

2015-01-12 Thread CP Mishra
Hi,

I am planning to attend ApacheCon 2015 in Austin, TX (Apr 13-16) and am
wondering if there will be Lucene/Solr sessions there.

Anyone else planning to attend?

Thanks,
CP


Re: leader split-brain at least once a day - need help

2015-01-12 Thread Mark Miller
bq. ClusterState says we are the leader, but locally we don't think so

Generally this is due to some bug. One bug that can lead to it was recently
fixed in 4.10.3 I think. What version are you on?

- Mark

On Mon Jan 12 2015 at 7:35:47 AM Thomas Lamy  wrote:

> Hi,
>
> I found no big/unusual GC pauses in the log (at least manually; I found
> no free solution to analyze them that worked out of the box on a
> headless Debian wheezy box). Eventually I tried with -Xmx8G (was 64G
> before) on one of the nodes, after checking that allocation after 1 hour of
> run time was at about 2-3GB. That didn't move the time frame where a restart
> was needed, so I don't think Solr's JVM GC is the problem.
> We're trying to get all of our node's logs (zookeeper and solr) into
> Splunk now, just to get a better sorted view of what's going on in the
> cloud once a problem occurs. We're also enabling GC logging for
> zookeeper; maybe we were missing problems there while focussing on solr
> logs.
>
> Thomas
>
>
> On 08.01.15 at 16:33, Yonik Seeley wrote:
> > It's worth noting that those messages alone don't necessarily signify
> > a problem with the system (and it wouldn't be called "split brain").
> > The async nature of updates (and thread scheduling), along with
> > stop-the-world GC pauses that can change leadership, causes these
> > little windows of inconsistency that we detect and log.
> >
> > -Yonik
> > http://heliosearch.org - native code faceting, facet functions,
> > sub-facets, off-heap data
> >
> >
> > On Wed, Jan 7, 2015 at 5:01 AM, Thomas Lamy 
> wrote:
> >> Hi there,
> >>
> >> we are running a 3-server cloud serving a dozen
> >> single-shard/replicate-everywhere collections. The 2 biggest
> >> collections are ~15M docs, and about 13GiB / 2.5GiB in size. Solr is
> >> 4.10.2, ZK 3.4.5, Tomcat 7.0.56, Oracle Java 1.7.0_72-b14.
> >>
> >> 10 of the 12 collections (the small ones) get filled by DIH
> >> full-import once a day starting at 1am. The second biggest collection
> >> is updated using DIH delta-import every 10 minutes, and the biggest
> >> one gets bulk JSON updates with commits every 5 minutes.
> >>
> >> On a regular basis, we have a leader information mismatch:
> >> org.apache.solr.update.processor.DistributedUpdateProcessor; Request
> >> says it is coming from leader, but we are the leader
> >> or the opposite
> >> org.apache.solr.update.processor.DistributedUpdateProcessor;
> >> ClusterState says we are the leader, but locally we don't think so
> >>
> >> One of these pops up once a day at around 8am, sending either some
> >> cores into "recovery failed" state, or all cores of at least one
> >> cloud node into state "gone".
> >> This started out of the blue about 2 weeks ago, without any changes
> >> to software, data, or client behaviour.
> >>
> >> Most of the time, we get things going again by restarting solr on the
> >> current leader node, forcing a new election - can this be triggered
> >> while keeping solr (and the caches) up?
> >> But sometimes this doesn't help: we had an incident last weekend where
> >> our admins didn't restart in time, creating millions of entries in
> >> /solr/overseer/queue, which made zk close the connection, and leader
> >> re-election failed. I had to flush zk and re-upload the collection
> >> config to get solr up again (just like in
> >> https://gist.github.com/isoboroff/424fcdf63fa760c1d1a7).
> >>
> >> We have a much bigger cloud (7 servers, ~50GiB data in 8 collections,
> >> 1500 requests/s) up and running, which has not had these problems
> >> since upgrading to 4.10.2.
> >>
> >>
> >> Any hints on where to look for a solution?
> >>
> >> Kind regards
> >> Thomas
> >>
> >> --
> >> Thomas Lamy
> >> Cytainment AG & Co KG
> >> Nordkanalstrasse 52
> >> 20097 Hamburg
> >>
> >> Tel.: +49 (40) 23 706-747
> >> Fax: +49 (40) 23 706-139
> >> Sitz und Registergericht Hamburg
> >> HRA 98121
> >> HRB 86068
> >> Ust-ID: DE213009476
> >>
>
>
> --
> Thomas Lamy
> Cytainment AG & Co KG
> Nordkanalstrasse 52
> 20097 Hamburg
>
> Tel.: +49 (40) 23 706-747
> Fax: +49 (40) 23 706-139
>
> Sitz und Registergericht Hamburg
> HRA 98121
> HRB 86068
> Ust-ID: DE213009476
>
>


Re: Why suggestions can be that slow?

2015-01-12 Thread Erick Erickson
Don't build it on every invocation. You only need to build the
suggester when a new searcher
is opened, i.e. omit suggest.build=true

Best,
Erick
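
A minimal sketch of what that can look like in solrconfig.xml (the
parameters below are the documented SuggestComponent ones; the source field
name "title" is an assumption, and "buildOnCommit" keeps the dictionary
current without per-query builds):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_auto</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

With that in place, the query should not carry the build flag:

http://localhost:8983/solr/mycollection/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=externa

Note that buildOnCommit can itself get expensive under frequent commits; an
alternative is to leave it off and issue suggest.build=true once after bulk
loads.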

On Mon, Jan 12, 2015 at 7:31 AM, FiMko  wrote:
> Hi all,
>
> I'm experimenting with the Solr Suggester (the reference guide page; the
> link was stripped by the archive). I have configured the functionality as
> per that page. In my Solr collection I have 32607 documents. The
> SuggestComponent is configured to search suggestions through a field of
> type "text_auto" as described below:
>
> schema.xml (snippet stripped by the mail archive: a stored, single-valued
> field of type "text_auto" plus the suggest component definition)
>
> I'm able to receive the suggestions but Solr query execution time (QTime) is
> always above 900ms.
> The query:
> http://localhost:8983/solr/mycollection/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=externa
> Solr ver.: 4.8.1
> PC: Windows 7 Pro, 8GB, 3.2GHz.
>
> Any ideas or suggestions on how to profile the query execution are very
> welcome!
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Why-suggestions-can-be-that-slow-tp4178944.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Why suggestions can be that slow?

2015-01-12 Thread FiMko
It seems I know the answer. The example query from the page mentioned above:
http://localhost:8983/solr/techproducts/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&wt=json&suggest.q=elec
rebuilds the suggester index on every call. After I removed the
"suggest.build=true" parameter, the query now takes 5ms at most.

The page even warns: "note, however, that you would likely not want to build
the index on every query", so this was just a lack of attention on my part.
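
In other words, the fixed query is the same request minus the build flag:

http://localhost:8983/solr/techproducts/suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q=elec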



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-suggestions-can-be-that-slow-tp4178944p4178948.html
Sent from the Solr - User mailing list archive at Nabble.com.


Why suggestions can be that slow?

2015-01-12 Thread FiMko
Hi all,

I'm experimenting with the Solr Suggester (the reference guide page; the
link was stripped by the archive). I have configured the functionality as
per that page. In my Solr collection I have 32607 documents. The
SuggestComponent is configured to search suggestions through a field of
type "text_auto" as described below:

schema.xml (snippet stripped by the mail archive: a stored, single-valued
field of type "text_auto" plus the suggest component definition)

I'm able to receive the suggestions but Solr query execution time (QTime) is
always above 900ms.
The query:
http://localhost:8983/solr/mycollection/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=externa
Solr ver.: 4.8.1
PC: Windows 7 Pro, 8GB, 3.2GHz.

Any ideas or suggestions on how to profile the query execution are very
welcome!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-suggestions-can-be-that-slow-tp4178944.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problem with getting node active

2015-01-12 Thread O. Klein
I have 4 cores, of which 2 recover just fine and 2 others never really
become active - not even after deleting the index or changing
clusterstate.json.

So I created a new collection (1 shard, 2 replicas on Solr 4.5 with a
3-node zookeeper ensemble) and added 1 document to it. It never becomes
active, not even on the leader.

There are no exceptions in log.

How do I get zookeeper to see node as active again?
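
One way to check what zookeeper has actually recorded is the zkcli script
that ships with Solr (script path and zkhost below are illustrative and
depend on the install):

example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd get /clusterstate.json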



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-getting-node-active-tp4178942.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solrcloud nodes registering as 127.0.1.1

2015-01-12 Thread Michael Della Bitta
Another way of doing it is by setting the -Dhost=$hostname parameter when
you start Solr.
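
For example, with the Jetty-based example distribution (the IP is the one
from this thread; use whatever address the other nodes can reach):

java -Dhost=192.168.1.10 -jar start.jar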

Michael Della Bitta

Senior Software Engineer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions | g+: plus.google.com/appinions

w: appinions.com

On Mon, Jan 12, 2015 at 7:15 AM, Matteo Grolla 
wrote:

> Solved!
> ubuntu has an entry like this in /etc/hosts
>
> 127.0.1.1   
>
> to properly run solrcloud one must replace 127.0.1.1 with a real
> (ideally permanent) IP address
>
>
>
> On 12/Jan/2015, at 12:47, Matteo Grolla wrote:
>
> > Hi,
> >   hope someone can help me troubleshoot this issue.
> > I'm trying to setup a solrcloud cluster with
> >
> > -zookeeper on 192.168.1.8 (osx mac)
> > -solr1 on 192.168.1.10(virtualized ubuntu running on mac)
> > -solr2 on 192.168.1.3 (ubuntu on another pc)
> >
> > the problem is that both nodes register on zookeeper as 127.0.1.1 so
> they appear as the same node
> > here's a message from solr log
> >
> > 5962 [zkCallback-2-thread-1] INFO
> org.apache.solr.cloud.DistributedQueue  – LatchChildWatcher fired on path:
> /overseer/queue state: SyncConnected type NodeChildrenChanged
> > 5965
> [OverseerStateUpdate-93130698829725696-127.0.1.1:8983_solr-n_00]
> INFO  org.apache.solr.cloud.Overseer  – Update state numShards=2 message={
> >  "operation":"state",
> >  "core_node_name":"core_node1",
> >  "numShards":"2",
> >  "shard":"shard1",
> >  "roles":null,
> >  "state":"active",
> >  "core":"collection1",
> >  "collection":"collection1",
> >  "node_name":"127.0.1.1:8983_solr",
> >  "base_url":"http://127.0.1.1:8983/solr"}
> >
> >
> > I'm able to run the cluster if I change jetty.port on one of the nodes,
> > but I'd really like some help troubleshooting this issue.
> >
> > Thanks
>
>


Distributed unit tests and SSL doesn't have a valid keystore

2015-01-12 Thread Markus Jelsma
Hi - in a small Maven project depending on Solr 4.10.3, unit tests that
extend BaseDistributedSearchTestCase randomly fail with "SSL doesn't have a
valid keystore" and leave a lot of zombie threads. We have a
solrtest.keystore file lying around, but where should we put it?
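
One possible workaround sketch, assuming the 4.10.3 test framework's
@SuppressSSL annotation and the 4.x doTest() hook are available (this
disables the randomized SSL setup entirely rather than locating the
keystore):

import org.apache.solr.BaseDistributedSearchTestCase;
import org.apache.solr.SolrTestCaseJ4.SuppressSSL;

@SuppressSSL // skip randomized SSL so no keystore is needed
public class MyDistributedTest extends BaseDistributedSearchTestCase {
  @Override
  public void doTest() throws Exception {
    // existing distributed assertions unchanged
  }
}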

Thanks,
Markus


Re: leader split-brain at least once a day - need help

2015-01-12 Thread Thomas Lamy

Hi,

I found no big/unusual GC pauses in the log (at least manually; I found
no free tool to analyze the logs that worked out of the box on a
headless debian wheezy box). Eventually I tried -Xmx8G (it was 64G
before) on one of the nodes, after seeing that allocation after 1 hour
of run time was only about 2-3GB. That didn't move the time frame in
which a restart was needed, so I don't think Solr's JVM GC is the
problem.
We're trying to get all of our nodes' logs (zookeeper and solr) into
Splunk now, just to get a better sorted view of what's going on in the
cloud once a problem occurs. We're also enabling GC logging for
zookeeper; maybe we were missing problems there while focusing on solr
logs.
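
For reference, typical HotSpot GC-logging flags on Java 7 look like this
(log path illustrative; for zookeeper they can be added via the JVMFLAGS
environment variable that zkServer.sh picks up):

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/zookeeper/gc.log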


Thomas


On 08.01.15 at 16:33, Yonik Seeley wrote:

It's worth noting that those messages alone don't necessarily signify
a problem with the system (and it wouldn't be called "split brain").
The async nature of updates (and thread scheduling), along with
stop-the-world GC pauses that can change leadership, causes these
little windows of inconsistency that we detect and log.

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


On Wed, Jan 7, 2015 at 5:01 AM, Thomas Lamy  wrote:

Hi there,

we are running a 3-server cloud serving a dozen
single-shard/replicate-everywhere collections. The 2 biggest collections are
~15M docs, and about 13GiB / 2.5GiB in size. Solr is 4.10.2, ZK 3.4.5,
Tomcat 7.0.56, Oracle Java 1.7.0_72-b14.

10 of the 12 collections (the small ones) get filled by DIH full-import once
a day starting at 1am. The second biggest collection is updated using DIH
delta-import every 10 minutes, and the biggest one gets bulk JSON updates
with commits every 5 minutes.

On a regular basis, we have a leader information mismatch:
org.apache.solr.update.processor.DistributedUpdateProcessor; Request says it
is coming from leader, but we are the leader
or the opposite
org.apache.solr.update.processor.DistributedUpdateProcessor; ClusterState
says we are the leader, but locally we don't think so

One of these pops up once a day at around 8am, sending either some cores
into "recovery failed" state, or all cores of at least one cloud node into
state "gone".
This started out of the blue about 2 weeks ago, without any changes to
software, data, or client behaviour.

Most of the time, we get things going again by restarting solr on the
current leader node, forcing a new election - can this be triggered while
keeping solr (and the caches) up?
But sometimes this doesn't help: we had an incident last weekend where our
admins didn't restart in time, creating millions of entries in
/solr/overseer/queue, which made zk close the connection, and leader
re-election failed. I had to flush zk and re-upload the collection config
to get solr up again (just like in
https://gist.github.com/isoboroff/424fcdf63fa760c1d1a7).

We have a much bigger cloud (7 servers, ~50GiB data in 8 collections, 1500
requests/s) up and running, which has not had these problems since
upgrading to 4.10.2.


Any hints on where to look for a solution?

Kind regards
Thomas

--
Thomas Lamy
Cytainment AG & Co KG
Nordkanalstrasse 52
20097 Hamburg

Tel.: +49 (40) 23 706-747
Fax: +49 (40) 23 706-139
Sitz und Registergericht Hamburg
HRA 98121
HRB 86068
Ust-ID: DE213009476




--
Thomas Lamy
Cytainment AG & Co KG
Nordkanalstrasse 52
20097 Hamburg

Tel.: +49 (40) 23 706-747
Fax: +49 (40) 23 706-139

Sitz und Registergericht Hamburg
HRA 98121
HRB 86068
Ust-ID: DE213009476



Re: solrcloud nodes registering as 127.0.1.1

2015-01-12 Thread Matteo Grolla
Solved!
ubuntu has an entry like this in /etc/hosts

127.0.1.1   

to properly run solrcloud one must replace 127.0.1.1 with a real (ideally
permanent) IP address
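
For example, for the first ubuntu node in this thread the entry would become
something like (the hostname is illustrative, since the archive stripped the
original one):

192.168.1.10   solr1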



On 12/Jan/2015, at 12:47, Matteo Grolla wrote:

> Hi,
>   hope someone can help me troubleshoot this issue.
> I'm trying to setup a solrcloud cluster with
> 
> -zookeeper on 192.168.1.8 (osx mac)
> -solr1 on 192.168.1.10(virtualized ubuntu running on mac)
> -solr2 on 192.168.1.3 (ubuntu on another pc)
> 
> the problem is that both nodes register on zookeeper as 127.0.1.1 so they 
> appear as the same node
> here's a message from solr log
> 
> 5962 [zkCallback-2-thread-1] INFO  org.apache.solr.cloud.DistributedQueue  – 
> LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type 
> NodeChildrenChanged
> 5965 [OverseerStateUpdate-93130698829725696-127.0.1.1:8983_solr-n_00] 
> INFO  org.apache.solr.cloud.Overseer  – Update state numShards=2 message={
>  "operation":"state",
>  "core_node_name":"core_node1",
>  "numShards":"2",
>  "shard":"shard1",
>  "roles":null,
>  "state":"active",
>  "core":"collection1",
>  "collection":"collection1",
>  "node_name":"127.0.1.1:8983_solr",
>  "base_url":"http://127.0.1.1:8983/solr"}
> 
> 
> I'm able to run the cluster if I change jetty.port on one of the nodes, but
> I'd really like some help troubleshooting this issue.
> 
> Thanks



solrcloud nodes registering as 127.0.1.1

2015-01-12 Thread Matteo Grolla
Hi,
hope someone can help me troubleshoot this issue.
I'm trying to setup a solrcloud cluster with

-zookeeper on 192.168.1.8   (osx mac)
-solr1 on 192.168.1.10  (virtualized ubuntu running on mac)
-solr2 on 192.168.1.3   (ubuntu on another pc)

the problem is that both nodes register on zookeeper as 127.0.1.1 so they 
appear as the same node
here's a message from solr log

5962 [zkCallback-2-thread-1] INFO  org.apache.solr.cloud.DistributedQueue  – 
LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type 
NodeChildrenChanged
5965 [OverseerStateUpdate-93130698829725696-127.0.1.1:8983_solr-n_00] 
INFO  org.apache.solr.cloud.Overseer  – Update state numShards=2 message={
  "operation":"state",
  "core_node_name":"core_node1",
  "numShards":"2",
  "shard":"shard1",
  "roles":null,
  "state":"active",
  "core":"collection1",
  "collection":"collection1",
  "node_name":"127.0.1.1:8983_solr",
  "base_url":"http://127.0.1.1:8983/solr"}


I'm able to run the cluster if I change jetty.port on one of the nodes, but
I'd really like some help troubleshooting this issue.

Thanks

How to apply SOLR-6024 to Solr 4.8

2015-01-12 Thread Elran Dvir
Hi all,

I am trying to apply the SOLR-6024 patch to Solr 4.8.
I get some compilation errors with it (detailed in Jira:
https://issues.apache.org/jira/i#browse/SOLR-6024).
How can I adapt the patch so that it applies to 4.8?
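
A typical workflow for this, assuming an svn checkout of the 4.8 branch (the
patch was made against a newer codebase, so rejected hunks end up in .rej
files that have to be resolved by hand before rebuilding):

svn co https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_8
cd lucene_solr_4_8
patch -p0 --dry-run -i SOLR-6024.patch   # list the hunks that will fail
patch -p0 -i SOLR-6024.patch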

Thanks.
 




Re: Frequent deletions

2015-01-12 Thread ig01
Hi,

We gave 120G to the JVM, while we have 140G of memory on this machine.
We use the default merge policy ("TieredMergePolicy"), and there are 54
segments in our index.
We tried to perform an optimization with different values of maxSegments
(53 and fewer), but it didn't help.
How much memory do we need to optimize a 180G index?
Does every update delete the document and create a new one?
How would a commit with expungeDeletes=true affect performance?
Currently we do not have a performance issue.

Thanks in advance.
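
For reference, a deletes-expunging commit can be issued against the update
handler like this (core name is illustrative):

curl 'http://localhost:8983/solr/collection1/update?commit=true&expungeDeletes=true'

expungeDeletes only rewrites segments that actually contain deletes, so it
is usually cheaper than a full optimize, though those segments are still
merged and rewritten.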



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Frequent-deletions-tp4176689p4178875.html
Sent from the Solr - User mailing list archive at Nabble.com.