Re: SolrJ Socket Leak

2014-02-17 Thread Kiran Chitturi
Jared,

I faced a similar issue when using CloudSolrServer with Solr. As Shawn
pointed out, the 'TIME_WAIT' status happens when the connection is closed
by the HTTP client. The HTTP client closes a connection whenever it thinks
the connection is stale
(https://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html#d5e405).
Even the docs point out that stale connection checking cannot be made
completely reliable.

I see two ways to get around this:

1. Enable 'SO_REUSEADDR'
2. Disable stale connection checks.

Also, by default, when we create CSS it does not explicitly configure any
HTTP client parameters
(https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrServer.java#L124).
In this case, the default configuration parameters (max connections, max
connections per host) are used for the HTTP connection. You can explicitly
configure these params when creating CSS using HttpClientUtil:

ModifiableSolrParams params = new ModifiableSolrParams();
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 128);
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 32);
params.set(HttpClientUtil.PROP_FOLLOW_REDIRECTS, false);
params.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, 3);

final HttpClient client = HttpClientUtil.createClient(params);
LBHttpSolrServer lb = new LBHttpSolrServer(client);
CloudSolrServer server = new CloudSolrServer(zkConnect, lb);


Currently, I am using HTTP client 4.3.2 and building the client when
creating the CSS. I also enable the 'SO_REUSEADDR' option and I haven't seen
the 'TIME_WAIT' issue after this (maybe because of better handling of stale
connections in 4.3.2, or because the 'SO_REUSEADDR' param is enabled). My
current HTTP client code looks like this (works only with HTTP client 4.3.2):

HttpClientBuilder httpBuilder = HttpClientBuilder.create();

SocketConfig.Builder socketConfig = SocketConfig.custom();
socketConfig.setSoReuseAddress(true);
socketConfig.setSoTimeout(1);
httpBuilder.setDefaultSocketConfig(socketConfig.build());
httpBuilder.setMaxConnTotal(300);
httpBuilder.setMaxConnPerRoute(100);
httpBuilder.disableRedirectHandling();
httpBuilder.useSystemProperties();

// build the client and hand it to SolrJ ('parser' is the ResponseParser for the LB server)
CloseableHttpClient httpClient = httpBuilder.build();
LBHttpSolrServer lb = new LBHttpSolrServer(httpClient, parser);
CloudSolrServer server = new CloudSolrServer(zkConnect, lb);


There should be a way to configure socket reuse with HTTP client 4.2.3 too;
you can try different configurations. I am surprised you still have
'TIME_WAIT' connections even after 30 minutes, because a 'TIME_WAIT'
connection should be closed by the OS after about 2 minutes by default, I think.
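
If you are stuck on the HttpClient 4.2.x line, a rough, untested sketch of
how those two workarounds could be applied through the legacy
org.apache.http.params API might look like the following (treat the exact
calls as an assumption to verify against your HttpClient version):

HttpClient httpClient = HttpClientUtil.createClient(new ModifiableSolrParams());
// Workaround 1: allow the local port to be reused while sockets sit in TIME_WAIT
HttpConnectionParams.setSoReuseaddr(httpClient.getParams(), true);
// Workaround 2: disable the (unreliable) stale connection check
HttpConnectionParams.setStaleCheckingEnabled(httpClient.getParams(), false);

LBHttpSolrServer lb = new LBHttpSolrServer(httpClient);
CloudSolrServer server = new CloudSolrServer(zkConnect, lb);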


HTH,

-- 
Kiran Chitturi,


On 2/13/14 12:38 PM, Jared Rodriguez jrodrig...@kitedesk.com wrote:

I am using Solr/SolrJ 4.6.1 along with Apache HttpClient 4.3.2 as part
of a web application which connects to the Solr server via SolrJ
using CloudSolrServer().  The web application is wired up with Guice, and
there is a single instance of the CloudSolrServer class used by all inbound
requests.  All this is running on Amazon.

Basically, everything looks and runs fine for a while, but even with
moderate concurrency, SolrJ starts leaving sockets open.  We are handling
only about 250 connections to the web app per minute and each of these
issues 3 - 7 requests to Solr.  Over a 30 minute period of this type
of use, we end up with many thousands of lingering sockets.  I can see these
when running netstat:

tcp        0      0 ip-10-80-14-26.ec2.in:41098  ip-10-99-145-47.ec2.i:glrpc  TIME_WAIT

All to the same target host, which is my solr server. There are no other
pieces of infrastructure on that box, just solr.  Eventually, the server
just dies as no further sockets can be opened and the opened ones are not
reused.

The Solr server itself is unfazed and running like a champ: an average
time per request of 0.126, as seen in the query handler stats in the Solr
admin UI.

Apache HttpClient had a bunch of leakage in the 4.2.x line that was
cleaned up and refactored in 4.3.x, which is why I upgraded.  Currently,
SolrJ makes use of the old leaky 4.2 classes for establishing connections
and using a connection pool.

http://www.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.3.x.txt



-- 
Jared Rodriguez



Re: DIH

2014-02-17 Thread Mikhail Khludnev
On Sat, Feb 15, 2014 at 1:07 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/14/2014 10:45 PM, William Bell wrote:
  On virtual cores the DIH handler is really slow. On a 12 core box it only
  uses 1 core while indexing.
 
  Does anyone know how to do Java threading from a SQL query into Solr?
  Examples?
 
  I can use SolrJ to do it, or I might be able to modify DIH to enable
  threading.
 
  At some point in 3.x threading was enabled in DIH, but it was removed
 since
  people were having issues with it (we never did).

 If you know how to fix DIH so it can do multiple indexing threads
 safely, please open an issue and upload a patch.

Please! Don't do it. Never again!
https://issues.apache.org/jira/browse/SOLR-3011

As far as I understand, the general idea is to find the DIH successor:
https://issues.apache.org/jira/browse/SOLR-4799?focusedCommentId=13738424&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13738424



 I'm still using DIH for full rebuilds, but I'd actually like to replace
 it with a rebuild routine written in SolrJ.  I currently achieve decent
 speed by running DIH on all my shards at the same time.

 I do use SolrJ for once-a-minute index maintenance, but the code that
 I've written to pull data out of SQL and write it to Solr is not able to
 index millions of documents in a single thread as fast as DIH does.  I
 have been building a multithreaded design in my head, but I haven't had
 a chance to write real code and see whether it's actually a good design.
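
(A rough, untested sketch of the kind of multithreaded SolrJ feed being
described here -- a fixed pool of workers pushing document batches to the
shared SolrServer; the batch size, pool size, and the fetchBatch() helper
are made-up placeholders, not real code from this setup:

ExecutorService pool = Executors.newFixedThreadPool(8);
List<SolrInputDocument> batch;
// fetchBatch(n) is a hypothetical helper returning the next n documents
// from the database, or an empty list when the data is exhausted.
while (!(batch = fetchBatch(1000)).isEmpty()) {
  final List<SolrInputDocument> docs = batch;
  pool.submit(new Runnable() {
    @Override
    public void run() {
      try {
        server.add(docs);   // 'server' is the shared SolrServer instance
      } catch (Exception e) {
        // log and decide whether to retry this batch
      }
    }
  });
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.HOURS);
server.commit();

In practice the executor's work queue would need to be bounded so batches
don't pile up in memory while Solr is the bottleneck.)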

 For me, the bottleneck is definitely Solr, not the database.  I recently
 wrote a test program that uses my current SolrJ indexing method.  If I
 skip the server.add(docs) line, it can read all 91 million docs from
 the database and build SolrInputDocument objects for them in 2.5 hours
 or less, all with a single thread.  When I do a real rebuild with DIH,
 it takes a little more than 4.5 hours -- and that is inherently
 multithreaded, because it's doing all the shards simultaneously.  I have
 no idea how long it would take with a single-threaded SolrJ program.

 Thanks,
 Shawn




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Solr index filename doesn't match with Solr version

2014-02-17 Thread Nguyen Manh Tien
Thanks Shawn and Tri for your info and explanations.
Tien


On Mon, Feb 17, 2014 at 1:36 PM, Tri Cao tm...@me.com wrote:

 Lucene's main file formats actually don't change a lot in 4.x (or even 5.x),
 and the newer codecs just delegate to previous versions for most file
 types. The newer file types don't typically include Lucene's version in
 the file names.

 For example, the Lucene 4.6 codec basically delegates the stored fields and
 term vector file formats to 4.1, the doc format to 4.0, etc., and only
 implements the new segment info/field infos formats (the .si and .fnm files).


 https://github.com/apache/lucene-solr/blob/lucene_solr_4_6/lucene/core/src/java/org/apache/lucene/codecs/lucene46/Lucene46Codec.java#L50

 Hope this helps,
 Tri


 On Feb 16, 2014, at 08:52 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/16/2014 7:25 PM, Nguyen Manh Tien wrote:

 I upgraded recently from Solr 4.0 to Solr 4.6.

 I checked the Solr index folder and found these files:

 _aars_*Lucene41*_0.doc
 _aars_*Lucene41*_0.pos
 _aars_*Lucene41*_0.tim
 _aars_*Lucene41*_0.tip

 I don't know why they don't have *Lucene46* in the file name.


 This is an indication that this part of the index is using a file format
 introduced in Lucene 4.1.

 Here's what I have for one of my index segments on a Solr 4.6.1 server:

 _5s7_2h.del
 _5s7.fdt
 _5s7.fdx
 _5s7.fnm
 _5s7_Lucene41_0.doc
 _5s7_Lucene41_0.pos
 _5s7_Lucene41_0.tim
 _5s7_Lucene41_0.tip
 _5s7_Lucene45_0.dvd
 _5s7_Lucene45_0.dvm
 _5s7.nvd
 _5s7.nvm
 _5s7.si
 _5s7.tvd
 _5s7.tvx

 It shows the same pieces as your list, but I am also using docValues in
 my index, and those files indicate that they are using the format from
 Lucene 4.5. I'm not sure why there are not version numbers in *all* of
 the file extensions -- that happens in the Lucene layer, which is a bit
 of a mystery to me.

 Thanks,
 Shawn




Re: DIH

2014-02-17 Thread Alexandre Rafalovitch
There have been a couple of discussions about finding a DIH successor
(including on the HelioSearch list), but no real momentum as far as I can
tell.

I think somebody will have to really pitch in and do the same couple
of scenarios DIH does in several different frameworks (TodoMVC style).
That should get it going.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Mon, Feb 17, 2014 at 7:40 PM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
 On Sat, Feb 15, 2014 at 1:07 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/14/2014 10:45 PM, William Bell wrote:
  On virtual cores the DIH handler is really slow. On a 12 core box it only
  uses 1 core while indexing.
 
  Does anyone know how to do Java threading from a SQL query into Solr?
  Examples?
 
  I can use SolrJ to do it, or I might be able to modify DIH to enable
  threading.
 
  At some point in 3.x threading was enabled in DIH, but it was removed
 since
   people were having issues with it (we never did).

 If you know how to fix DIH so it can do multiple indexing threads
 safely, please open an issue and upload a patch.

 Please! Don't do it. Never again!
 https://issues.apache.org/jira/browse/SOLR-3011

 As far as I understand the general idea is to find the DIH successor
  https://issues.apache.org/jira/browse/SOLR-4799?focusedCommentId=13738424&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13738424



 I'm still using DIH for full rebuilds, but I'd actually like to replace
 it with a rebuild routine written in SolrJ.  I currently achieve decent
 speed by running DIH on all my shards at the same time.

 I do use SolrJ for once-a-minute index maintenance, but the code that
 I've written to pull data out of SQL and write it to Solr is not able to
 index millions of documents in a single thread as fast as DIH does.  I
 have been building a multithreaded design in my head, but I haven't had
 a chance to write real code and see whether it's actually a good design.

 For me, the bottleneck is definitely Solr, not the database.  I recently
 wrote a test program that uses my current SolrJ indexing method.  If I
 skip the server.add(docs) line, it can read all 91 million docs from
 the database and build SolrInputDocument objects for them in 2.5 hours
 or less, all with a single thread.  When I do a real rebuild with DIH,
 it takes a little more than 4.5 hours -- and that is inherently
 multithreaded, because it's doing all the shards simultaneously.  I have
 no idea how long it would take with a single-threaded SolrJ program.

 Thanks,
 Shawn




 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com


Re: DIH

2014-02-17 Thread Ahmet Arslan
Hi Mikhail,

Can you please elaborate on what you mean?
My understanding is that there is no multi-threading support in DIH, and that
for various reasons it won't get it. Am I correct?

Regarding Apache Flume, how can it be a DIH replacement? Can I index rich
documents on my disk using Flume? Can I fetch documents from
Wikipedia, Jira, Twitter, Dropbox, an RDBMS, RSS, or the file system with it?

Ahmet



On Monday, February 17, 2014 10:41 AM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:
On Sat, Feb 15, 2014 at 1:07 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/14/2014 10:45 PM, William Bell wrote:
  On virtual cores the DIH handler is really slow. On a 12 core box it only
  uses 1 core while indexing.
 
  Does anyone know how to do Java threading from a SQL query into Solr?
  Examples?
 
  I can use SolrJ to do it, or I might be able to modify DIH to enable
  threading.
 
  At some point in 3.x threading was enabled in DIH, but it was removed
 since
   people were having issues with it (we never did).

 If you know how to fix DIH so it can do multiple indexing threads
 safely, please open an issue and upload a patch.

Please! Don't do it. Never again!
https://issues.apache.org/jira/browse/SOLR-3011

As far as I understand the general idea is to find the DIH successor
https://issues.apache.org/jira/browse/SOLR-4799?focusedCommentId=13738424&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13738424




 I'm still using DIH for full rebuilds, but I'd actually like to replace
 it with a rebuild routine written in SolrJ.  I currently achieve decent
 speed by running DIH on all my shards at the same time.

 I do use SolrJ for once-a-minute index maintenance, but the code that
 I've written to pull data out of SQL and write it to Solr is not able to
 index millions of documents in a single thread as fast as DIH does.  I
 have been building a multithreaded design in my head, but I haven't had
 a chance to write real code and see whether it's actually a good design.

 For me, the bottleneck is definitely Solr, not the database.  I recently
 wrote a test program that uses my current SolrJ indexing method.  If I
 skip the server.add(docs) line, it can read all 91 million docs from
 the database and build SolrInputDocument objects for them in 2.5 hours
 or less, all with a single thread.  When I do a real rebuild with DIH,
 it takes a little more than 4.5 hours -- and that is inherently
 multithreaded, because it's doing all the shards simultaneously.  I have
 no idea how long it would take with a single-threaded SolrJ program.

 Thanks,
 Shawn




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com




Re: DIH

2014-02-17 Thread Alexandre Rafalovitch
I haven't tried Apache Flume but the manual seems to suggest 'yes' to
a large number of your checklist items:
http://flume.apache.org/FlumeUserGuide.html

When you say 'rich document' indexing, the keyword you are looking for
is (Apache) Tika, as that's what is actually doing the job under the
covers.

Whether it can replicate your specific requirements is a question
only you can answer for yourself, of course. When you do, maybe let us
know, so we can learn too. :-)

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Mon, Feb 17, 2014 at 8:11 PM, Ahmet Arslan iori...@yahoo.com wrote:
 Hi Mikhail,

 Can you please elaborate what do you mean?
 My understanding is that there is no multi-threading support in DIH. For some 
 reasons, it won't have. Am I correct?

 Regarding apache flume, how it can be dih replacement? Can I index rich 
 documents on my disk using flume? Can I fetch documents from 
 wikipedia,jira,twitter,dropbox,rdbms,rss,file system by using it?

 Ahmet



 On Monday, February 17, 2014 10:41 AM, Mikhail Khludnev 
 mkhlud...@griddynamics.com wrote:
 On Sat, Feb 15, 2014 at 1:07 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/14/2014 10:45 PM, William Bell wrote:
  On virtual cores the DIH handler is really slow. On a 12 core box it only
  uses 1 core while indexing.
 
  Does anyone know how to do Java threading from a SQL query into Solr?
  Examples?
 
  I can use SolrJ to do it, or I might be able to modify DIH to enable
  threading.
 
  At some point in 3.x threading was enabled in DIH, but it was removed
 since
   people were having issues with it (we never did).

 If you know how to fix DIH so it can do multiple indexing threads
 safely, please open an issue and upload a patch.

 Please! Don't do it. Never again!
 https://issues.apache.org/jira/browse/SOLR-3011

 As far as I understand the general idea is to find the DIH successor
  https://issues.apache.org/jira/browse/SOLR-4799?focusedCommentId=13738424&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13738424




 I'm still using DIH for full rebuilds, but I'd actually like to replace
 it with a rebuild routine written in SolrJ.  I currently achieve decent
 speed by running DIH on all my shards at the same time.

 I do use SolrJ for once-a-minute index maintenance, but the code that
 I've written to pull data out of SQL and write it to Solr is not able to
 index millions of documents in a single thread as fast as DIH does.  I
 have been building a multithreaded design in my head, but I haven't had
 a chance to write real code and see whether it's actually a good design.

 For me, the bottleneck is definitely Solr, not the database.  I recently
 wrote a test program that uses my current SolrJ indexing method.  If I
 skip the server.add(docs) line, it can read all 91 million docs from
 the database and build SolrInputDocument objects for them in 2.5 hours
 or less, all with a single thread.  When I do a real rebuild with DIH,
 it takes a little more than 4.5 hours -- and that is inherently
 multithreaded, because it's doing all the shards simultaneously.  I have
 no idea how long it would take with a single-threaded SolrJ program.

 Thanks,
 Shawn




 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com




Re: Solr Load Testing Issues

2014-02-17 Thread Annette Newton
Sorry, I didn't make myself clear.  I have 20 machines in the configuration;
each shard/replica is on its own machine.


On 14 February 2014 19:44, Shawn Heisey s...@elyograg.org wrote:

 On 2/14/2014 5:28 AM, Annette Newton wrote:
  Solr Version: 4.3.1
  Number Shards: 10
  Replicas: 1
  Heap size: 15GB
  Machine RAM: 30GB
  Zookeeper timeout: 45 seconds
 
  We are continuing the fight to keep our Solr setup functioning.  As a
  result of this we have made significant changes to our schema to reduce the
  amount of data we write.  I set up a new cluster to reindex our data;
  initially I ran the import with no replicas and achieved quite impressive
  results.  Our peak was 60,000 new documents per minute, with no shard losses
  and no outages due to garbage collection (which is an issue we see in
  production).  At the end of the load the index stood at 97,000,000 documents
  and 20GB per shard.  During the highest insertion rate I would say that
  querying suffered, but that is not of concern right now.

 Solr 4.3.1 has a number of problems when it comes to large clouds.
 Upgrading to 4.6.1 would be strongly advisable, but that's only
 something to try after looking into the rest of what I have to say.

 If I read what you've written correctly, you are running all this on one
 machine.  To put it bluntly, this isn't going to work well unless you
 put a LOT more memory into that machine.

 For good performance, Solr relies on the OS disk cache, because reading
 from the disk is VERY expensive in terms of time.  The OS will
 automatically use RAM that's not being used for other purposes for the
 disk cache, so that it can avoid reading off the disk as much as possible.

 http://wiki.apache.org/solr/SolrPerformanceProblems

 Below is a summary of what that Wiki page says, with your numbers as I
 understand them.  If I am misunderstanding your numbers, then this
 advice may need adjustment.  Note that when I see one replica I take
 that to mean replicationFactor=1, so there is only one copy of the
 index.  If you actually mean that you have *two* copies, then you have
 twice as much data as I've indicated below, and your requirements will
 be even larger:

 With ten shards that are each 20GB in size, your total index size is
 200GB.  With 15 GB of heap, your ideal memory size for that server would
 be 215GB -- the 15GB heap plus enough extra to fit the entire 200GB
 index into RAM.

 In reality you probably don't need that much, but it's likely that you
 would need at least half the index to fit into RAM at any one moment,
 which adds up to 115GB.  If you're prepared to deal with
 moderate-to-severe performance problems, you **MIGHT** be able to get
 away with only 25% of the index fitting into RAM, which still requires
 65GB of RAM, but with SolrCloud, such performance problems usually mean
 that the cloud won't be stable, so it's not advisable to even try it.

 One of the bits of advice on the wiki page is to split your index into
 shards and put it on more machines, which drops the memory requirements
 for each machine.  You're already using a multi-shard SolrCloud, so you
 probably just need more hardware.  If you had one 20GB shard on a
 machine with 30GB of RAM, you could probably use a heap size of 4-8GB
 per machine and have plenty of RAM left over to cache the index very
 well.  You could most likely add another 50% to the index size and still
 be OK.

 Thanks,
 Shawn




-- 

Annette Newton

Database Administrator

ServiceTick Ltd



T:+44(0)1603 618326



Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ

www.servicetick.com

*www.sessioncam.com*

-- 
*This message is confidential and is intended to be read solely by the 
addressee. The contents should not be disclosed to any other person or 
copies taken unless authorised to do so. If you are not the intended 
recipient, please notify the sender and permanently delete this message. As 
Internet communications are not secure ServiceTick accepts neither legal 
responsibility for the contents of this message nor responsibility for any 
change made to this message after it was forwarded by the original author.*


Solrcloud: no registered leader found and new searcher error

2014-02-17 Thread sweety
I have configured solrcloud as follows,
http://lucene.472066.n3.nabble.com/file/n4117724/Untitled.png 

Solr.xml:
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}"
         hostPort="${jetty.port:}" hostContext="solr">
    <core loadOnStartup="true" instanceDir="document\" transient="false"
          name="document"/>
    <core loadOnStartup="true" instanceDir="contract\" transient="false"
          name="contract"/>
  </cores>
</solr>

I have added all the required config for SolrCloud, referring to this:
http://wiki.apache.org/solr/SolrCloud#Required_Config

I am adding data to the core 'document'.
Now when I try to index using SolrNet (solr.Add(doc)), I get this error:
SEVERE: org.apache.solr.common.SolrException: *No registered leader was
found, collection:document* slice:shard2
at
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:481)

and also this error:
SEVERE: null:java.lang.RuntimeException: *SolrCoreState already closed*
at
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:84)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:520)

I guess it is because the leader is from the core 'contract' and I am trying to
index into the core 'document'?
Is there a way to change the leader, and how?
How can I change the state of the shards from 'gone' to 'active'?

Also, when I try to query with q=*:*, this is shown:
org.apache.solr.common.SolrException: *Error opening new searcher at*
org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415) at

I read that if the number of commits is exceeded then this searcher error
appears, but I did not issue a commit command, so how can the commits be
exceeded? It also requires some warming settings, so I added this to
solrconfig.xml, but I still get the same error:

<query>
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">solr</str>
        <str name="start">0</str>
        <str name="rows">10</str>
      </lst>
      <lst>
        <str name="q">rocks</str>
        <str name="start">0</str>
        <str name="rows">10</str>
      </lst>
    </arr>
  </listener>
  <maxWarmingSearchers>2</maxWarmingSearchers>
</query>

I have just started with SolrCloud, so please tell me if I am doing anything
wrong in the SolrCloud configuration.
Also, I did not find good material on SolrCloud on Windows 7 with Apache
Tomcat, so please suggest something for that too.
Thanks a lot.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrcloud-no-registered-leader-found-and-new-searcher-error-tp4117724.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud how to spread out to multiple nodes

2014-02-17 Thread soodyogesh
Thanks, I'm going to give this a try.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-how-to-spread-out-to-multiple-nodes-tp4116326p4117728.html
Sent from the Solr - User mailing list archive at Nabble.com.


Facet cache issue when deleting documents from the index

2014-02-17 Thread Marius Dumitru Florea
Hi guys,

I'm using Solr 4.6.1 (embedded) and for some reason the facet cache is
not invalidated when documents are deleted from the index. Sadly, for
me, I cannot reproduce this issue with an integration test like this:

--8--
SolrInstance server = getSolrInstance();

SolrInputDocument document = new SolrInputDocument();
document.setField("id", "foo");
document.setField("locale", "en");
server.add(document);

server.commit();

document = new SolrInputDocument();
document.setField("id", "bar");
document.setField("locale", "en");
server.add(document);

server.commit();

SolrQuery query = new SolrQuery("*:*");
query.set("facet", "on");
query.set("facet.field", "locale");
QueryResponse response = server.query(query);

Assert.assertEquals(2, response.getResults().size());
FacetField localeFacet = response.getFacetField("locale");
Assert.assertEquals(1, localeFacet.getValues().size());
Count en = localeFacet.getValues().get(0);
Assert.assertEquals("en", en.getName());
Assert.assertEquals(2, en.getCount());

server.delete("foo");
server.commit();

response = server.query(query);

Assert.assertEquals(1, response.getResults().size());
localeFacet = response.getFacetField("locale");
Assert.assertEquals(1, localeFacet.getValues().size());
en = localeFacet.getValues().get(0);
Assert.assertEquals("en", en.getName());
Assert.assertEquals(1, en.getCount());
--8--

Nevertheless, when I do the 'same' on my real environment, the count
for the locale facet remains 2 after one of the documents is deleted.
The search result count is fine, so that's why I think it's a facet
cache issue. Note that the facet count remains 2 even after I restart
the server, so the cache is persisted on the file system.

Strangely, the facet count is updated correctly if I modify the
document instead of deleting it (i.e. removing a keyword from the
content so that it isn't matched by the search query any more). So it
looks like only delete triggers the issue.

Now, an interesting fact is that if, on my real environment, I delete
one of the documents and then add a new one, the facet count becomes
3. So the last commit to the index, which inserts a new document,
doesn't trigger a re-computation of the facet cache. The previous
facet cache is simply incremented, so the error is perpetuated. At
this point I don't even know how to fix the facet cache without
deleting the Solr data folder so that the full index is rebuilt.

I'm still trying to figure out what is the difference between the
integration test and my real environment (as I used the same schema
and configuration). Do you know what might be wrong?

Thanks,
Marius


Solr Suggester not working in sharding (distributed search)

2014-02-17 Thread aniket potdar
I have two Solr servers (Solr 4.5.1) which are running as shards.

I have implemented the Solr suggester using the SpellCheckComponent for
auto-suggest.

When I execute the suggest URL on an individual core, the suggestions come
back properly:

http://localhost:8986/solr/core1/suggest?spellcheck.q=city%20of and
http://localhost:8987/solr/core1/suggest?spellcheck.q=city%20of

When I fire the URL with reference to the Solr wiki
(https://wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Support),
no result comes back and the exception below occurs.


URL:
http://localhost:8986/solr/core1/select?shards=localhost:8986/solr/core1,localhost:8987/solr/core1&spellcheck.q=city%20of&shards.qt=%2Fsuggest&qt=suggest

java.lang.NullPointerException at
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:843)
at
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:649)
at
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:628)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368) at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:619)

 
For reference, below are my schema.xml and solrconfig.xml entries:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <str name="field">sugg</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler"
                name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

<fieldType name="text_autocomplete" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="syn.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I have a unique field, 'id', which is stored="true" in schema.xml.

Can anyone please suggest the

Re: DIH

2014-02-17 Thread Mikhail Khludnev
On Mon, Feb 17, 2014 at 1:11 PM, Ahmet Arslan iori...@yahoo.com wrote:

 My understanding is that there is no multi-threading support in DIH. For
 some reasons, it won't have. Am I correct?


The 'threads' parameter seemed to work in 3.6 or so, but it was removed from
4.x because it caused a lot of instability.

Regarding apache flume, how it can be dih replacement? Can I index rich
 documents on my disk using flume? Can I fetch documents from
 wikipedia,jira,twitter,


I don't know Flume, and I'm not even ready to propose a DIH replacement
candidate.
I'm personally considering an old-school ETL tool, because I'm mostly
interested in joining RDBMS tables.


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Facet cache issue when deleting documents from the index

2014-02-17 Thread Ahmet Arslan
Hi Marius,

Facets are computed from indexed terms. Can you commit with the
expungeDeletes=true flag?
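
A minimal SolrJ sketch of such a commit (untested; 'server' here stands for
your SolrServer instance) might look like this:

UpdateRequest commitReq = new UpdateRequest();
// waitFlush=true, waitSearcher=true
commitReq.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
// ask Solr to merge away segments that contain deleted documents
commitReq.setParam("expungeDeletes", "true");
commitReq.process(server);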

Ahmet



On Monday, February 17, 2014 12:17 PM, Marius Dumitru Florea 
mariusdumitru.flo...@xwiki.com wrote:
Hi guys,

I'm using Solr 4.6.1 (embedded) and for some reason the facet cache is
not invalidated when documents are deleted from the index. Sadly, for
me, I cannot reproduce this issue with an integration test like this:

--8--
SolrInstance server = getSolrInstance();

SolrInputDocument document = new SolrInputDocument();
document.setField(id, foo);
document.setField(locale, en);
server.add(document);

server.commit();

document = new SolrInputDocument();
document.setField(id, bar);
document.setField(locale, en);
server.add(document);

server.commit();

SolrQuery query = new SolrQuery(*:*);
query.set(facet, on);
query.set(facet.field, locale);
QueryResponse response = server.query(query);

Assert.assertEquals(2, response.getResults().size());
FacetField localeFacet = response.getFacetField(locale);
Assert.assertEquals(1, localeFacet.getValues().size());
Count en = localeFacet.getValues().get(0);
Assert.assertEquals(en, en.getName());
Assert.assertEquals(2, en.getCount());

server.delete(foo);
server.commit();

response = server.query(query);

Assert.assertEquals(1, response.getResults().size());
localeFacet = response.getFacetField(locale);
Assert.assertEquals(1, localeFacet.getValues().size());
en = localeFacet.getValues().get(0);
Assert.assertEquals(en, en.getName());
Assert.assertEquals(1, en.getCount());
--8--

Nevertheless, when I do the 'same' on my real environment, the count
for the locale facet remains 2 after one of the documents is deleted.
The search result count is fine, so that's why I think it's a facet
cache issue. Note that the facet count remains 2 even after I restart
the server, so the cache is persisted on the file system.

Strangely, the facet count is updated correctly if I modify the
document instead of deleting it (i.e. removing a keyword from the
content so that it isn't matched by the search query any more). So it
looks like only delete triggers the issue.

Now, an interesting fact is that if, on my real environment, I delete
one of the documents and then add a new one, the facet count becomes
3. So the last commit to the index, which inserts a new document,
doesn't trigger a re-computation of the facet cache. The previous
facet cache is simply incremented, so the error is perpetuated. At
this point I don't even know how to fix the facet cache without
deleting the Solr data folder so that the full index is rebuild.

I'm still trying to figure out what is the difference between the
integration test and my real environment (as I used the same schema
and configuration). Do you know what might be wrong?

Thanks,
Marius



Solr Suggester not working in sharding (distributed search)

2014-02-17 Thread Aniket Potdar

I have two Solr servers (Solr 4.5.1) which are running as shards.

I have implemented the Solr suggester using the SpellCheckComponent for
auto-suggest.

When I execute the suggest URL on an individual core, the suggestions come
back properly:

mysolr.com:8986/solr/core1/suggest?spellcheck.q=city%20of and
mysolr.com:8987/solr/core1/suggest?spellcheck.q=city%20of

When I fire the URL with reference to the Solr wiki
(wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Support),
no result comes back and the exception below occurs.

URL:
mysolr.com:8986/solr/core1/select?shards=mysolr.com:8986/solr/core1,mysolr.com:8987/solr/core1&spellcheck.q=city%20of&shards.qt=%2Fsuggest&qt=suggest


java.lang.NullPointerException at 
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:843) 
at 
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:649) 
at 
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:628) 
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311) 
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) 
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703) 
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406) 
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) 
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) 
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) 
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) 
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) 
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) 
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) 
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) 
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) 
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) 
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) 
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) 
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) 
at org.eclipse.jetty.server.Server.handle(Server.java:368) at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) 
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) 
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) 
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) 
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at 
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) 
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) 
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) 
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) 
at java.lang.Thread.run(Thread.java:619)


For reference, below are my schema.xml and solrconfig.xml entries:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <str name="field">sugg</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler"
                name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

<fieldType name="text_autocomplete" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="syn.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Re: Facet cache issue when deleting documents from the index

2014-02-17 Thread Ahmet Arslan
Hi,

Also, I noticed that in your code snippet you have server.delete(foo), which
does not exist; deleteById and deleteByQuery are the methods defined in the
SolrServer implementation.



On Monday, February 17, 2014 1:42 PM, Ahmet Arslan iori...@yahoo.com wrote:
Hi Marius,

Facets are computed from indexed terms. Can you commit with expungeDeletes=true 
flag?

Ahmet




On Monday, February 17, 2014 12:17 PM, Marius Dumitru Florea 
mariusdumitru.flo...@xwiki.com wrote:
Hi guys,

I'm using Solr 4.6.1 (embedded) and for some reason the facet cache is
not invalidated when documents are deleted from the index. Sadly, for
me, I cannot reproduce this issue with an integration test like this:

--8--
SolrInstance server = getSolrInstance();

SolrInputDocument document = new SolrInputDocument();
document.setField(id, foo);
document.setField(locale, en);
server.add(document);

server.commit();

document = new SolrInputDocument();
document.setField(id, bar);
document.setField(locale, en);
server.add(document);

server.commit();

SolrQuery query = new SolrQuery(*:*);
query.set(facet, on);
query.set(facet.field, locale);
QueryResponse response = server.query(query);

Assert.assertEquals(2, response.getResults().size());
FacetField localeFacet = response.getFacetField(locale);
Assert.assertEquals(1, localeFacet.getValues().size());
Count en = localeFacet.getValues().get(0);
Assert.assertEquals(en, en.getName());
Assert.assertEquals(2, en.getCount());

server.delete(foo);
server.commit();

response = server.query(query);

Assert.assertEquals(1, response.getResults().size());
localeFacet = response.getFacetField(locale);
Assert.assertEquals(1, localeFacet.getValues().size());
en = localeFacet.getValues().get(0);
Assert.assertEquals(en, en.getName());
Assert.assertEquals(1, en.getCount());
--8--

Nevertheless, when I do the 'same' on my real environment, the count
for the locale facet remains 2 after one of the documents is deleted.
The search result count is fine, so that's why I think it's a facet
cache issue. Note that the facet count remains 2 even after I restart
the server, so the cache is persisted on the file system.

Strangely, the facet count is updated correctly if I modify the
document instead of deleting it (i.e. removing a keyword from the
content so that it isn't matched by the search query any more). So it
looks like only delete triggers the issue.

Now, an interesting fact is that if, on my real environment, I delete
one of the documents and then add a new one, the facet count becomes
3. So the last commit to the index, which inserts a new document,
doesn't trigger a re-computation of the facet cache. The previous
facet cache is simply incremented, so the error is perpetuated. At
this point I don't even know how to fix the facet cache without
deleting the Solr data folder so that the full index is rebuild.

I'm still trying to figure out what is the difference between the
integration test and my real environment (as I used the same schema
and configuration). Do you know what might be wrong?

Thanks,
Marius



Solr cloud hangs

2014-02-17 Thread Pawel Rog
Hi,
I have a quite annoying problem with SolrCloud. I have a cluster with 8
shards, each with 2 replicas (Solr 4.6.1).
After some time the cluster doesn't respond to any update requests, and
restarting the cluster nodes doesn't help.

There are a lot of such stack traces (waiting for very long time):


   - sun.misc.Unsafe.park(Native Method)
   - java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
   - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
   - org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
   - org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
   - org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
   - org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
   - java.lang.Thread.run(Thread.java:722)


Do you have any idea where I could look?

--
Pawel


Re: Facet cache issue when deleting documents from the index

2014-02-17 Thread Marius Dumitru Florea
On Mon, Feb 17, 2014 at 2:00 PM, Ahmet Arslan iori...@yahoo.com wrote:
 Hi,


 Also I noticed that in your code snippet you have server.delete(foo); which 
 does not exists. deleteById and deleteByQuery methods are defined in 
 SolrServer implementation.

Yes, sorry, I have a wrapper over the SolrInstance that doesn't do
much. In the case of delete it just forwards the call to deleteById.
I'll check the expungeDeletes=true flag and post back the results.

Thanks,
Marius




 On Monday, February 17, 2014 1:42 PM, Ahmet Arslan iori...@yahoo.com wrote:
 Hi Marius,

 Facets are computed from indexed terms. Can you commit with 
 expungeDeletes=true flag?

 Ahmet




 On Monday, February 17, 2014 12:17 PM, Marius Dumitru Florea 
 mariusdumitru.flo...@xwiki.com wrote:
 Hi guys,

 I'm using Solr 4.6.1 (embedded) and for some reason the facet cache is
 not invalidated when documents are deleted from the index. Sadly, for
 me, I cannot reproduce this issue with an integration test like this:

 --8--
 SolrInstance server = getSolrInstance();

 SolrInputDocument document = new SolrInputDocument();
 document.setField(id, foo);
 document.setField(locale, en);
 server.add(document);

 server.commit();

 document = new SolrInputDocument();
 document.setField(id, bar);
 document.setField(locale, en);
 server.add(document);

 server.commit();

 SolrQuery query = new SolrQuery(*:*);
 query.set(facet, on);
 query.set(facet.field, locale);
 QueryResponse response = server.query(query);

 Assert.assertEquals(2, response.getResults().size());
 FacetField localeFacet = response.getFacetField(locale);
 Assert.assertEquals(1, localeFacet.getValues().size());
 Count en = localeFacet.getValues().get(0);
 Assert.assertEquals(en, en.getName());
 Assert.assertEquals(2, en.getCount());

 server.delete(foo);
 server.commit();

 response = server.query(query);

 Assert.assertEquals(1, response.getResults().size());
 localeFacet = response.getFacetField(locale);
 Assert.assertEquals(1, localeFacet.getValues().size());
 en = localeFacet.getValues().get(0);
 Assert.assertEquals(en, en.getName());
 Assert.assertEquals(1, en.getCount());
 --8--

 Nevertheless, when I do the 'same' on my real environment, the count
 for the locale facet remains 2 after one of the documents is deleted.
 The search result count is fine, so that's why I think it's a facet
 cache issue. Note that the facet count remains 2 even after I restart
 the server, so the cache is persisted on the file system.

 Strangely, the facet count is updated correctly if I modify the
 document instead of deleting it (i.e. removing a keyword from the
 content so that it isn't matched by the search query any more). So it
 looks like only delete triggers the issue.

 Now, an interesting fact is that if, on my real environment, I delete
 one of the documents and then add a new one, the facet count becomes
 3. So the last commit to the index, which inserts a new document,
 doesn't trigger a re-computation of the facet cache. The previous
 facet cache is simply incremented, so the error is perpetuated. At
 this point I don't even know how to fix the facet cache without
 deleting the Solr data folder so that the full index is rebuild.

 I'm still trying to figure out what is the difference between the
 integration test and my real environment (as I used the same schema
 and configuration). Do you know what might be wrong?

 Thanks,
 Marius



Best way to copy data from SolrCloud to standalone Solr?

2014-02-17 Thread Daniel Bryant

Hi all,

I have a production SolrCloud server which has multiple sharded indexes, 
and I need to copy all of the indexes to a (non-cloud) Solr server 
within our QA environment.


Can I ask for advice on the best way to do this please?

I've searched the web and found solr2solr 
(https://github.com/dbashford/solr2solr), but the author states that 
this is best for small indexes, and ours are rather large at ~20Gb each. 
I've also looked at replication, but can't find a definitive reference on
how this should be done between SolrCloud and standalone Solr.


Any guidance is very much appreciated.

Best wishes,

Daniel



--
*Daniel Bryant  |  Software Development Consultant  |  www.tai-dev.co.uk*
daniel.bry...@tai-dev.co.uk  |  +44 (0) 7799406399  |  Twitter: @taidevcouk


Re: Solr cloud hangs

2014-02-17 Thread Mark Miller
Can you share the full stack trace dump?

- Mark

http://about.me/markrmiller

On Feb 17, 2014, at 7:07 AM, Pawel Rog pawelro...@gmail.com wrote:

 Hi,
 I have quite annoying problem with Solr cloud. I have a cluster with 8
 shards and with 2 replicas in each. (Solr 4.6.1)
 After some time cluster doesn't respond to any update requests. Restarting
 the cluster nodes doesn't help.
 
 There are a lot of such stack traces (waiting for very long time):
 
 
   - sun.misc.Unsafe.park(Native Method)
   - java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
   -
   
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
   -
   org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
   -
   
 org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
   -
   
 org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
   -
   
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
   - java.lang.Thread.run(Thread.java:722)
 
 
 Do you have any idea where can I look for?
 
 --
 Pawel



Re: Solrcloud: no registered leader found and new searcher error

2014-02-17 Thread Erick Erickson
I think commits are not really the issue here. It _looks_ like
at least one node in your document collection is failing to
start, in fact your shard 2. On the Solr admin screen, the
cloud section on the left should show you the states of all
your nodes, make sure they're all green.

My guess is that if you look at your Solr logs on the nodes that
aren't coming up, you'll have a better idea of what's happening.

You need to get all the nodes running first before worrying about
messages like you're showing.

Best,
Erick


On Mon, Feb 17, 2014 at 1:28 AM, sweety sweetyshind...@yahoo.com wrote:

 I have configured solrcloud as follows,
 http://lucene.472066.n3.nabble.com/file/n4117724/Untitled.png

 Solr.xml:
  <solr persistent="true" sharedLib="lib">
    <cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}"
           hostPort="${jetty.port:}" hostContext="solr">
      <core loadOnStartup="true" instanceDir="document\" transient="false"
            name="document"/>
      <core loadOnStartup="true" instanceDir="contract\" transient="false"
            name="contract"/>
    </cores>
  </solr>

 I  have added all the required config for solrcloud, referred this :
 http://wiki.apache.org/solr/SolrCloud#Required_Config

 I am adding data to core:document.
 Now when i try to index using solrnet, (solr.Add(doc)) , i get this error :
 SEVERE: org.apache.solr.common.SolrException: *No registered leader was
 found, collection:document* slice:shard2
 at

 org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:481)

 and this error also:
 SEVERE: null:java.lang.RuntimeException: *SolrCoreState already closed*
 at

 org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:84)
 at

 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:520)

 I guess, it is because the leader is from core:contract and i am trying to
 index in core:document?
 Is there a way to change the leader, and how ?
 How can i change the state of shards from gone to active?

 Also when i try to query : q=*:* , this is shown
 org.apache.solr.common.SolrException: *Error opening new searcher at*
 org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415) at

 I read that, if number of commits exceed then this searcher error comes,
 but
 i did not issue commit command,then how will the commit exceed. Also it
 requires some warming setting, so i added this to solrconfig.xml, but still
 i get the same error,

  <query>
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">solr</str>
          <str name="start">0</str>
          <str name="rows">10</str>
        </lst>
        <lst>
          <str name="q">rocks</str>
          <str name="start">0</str>
          <str name="rows">10</str>
        </lst>
      </arr>
    </listener>
    <maxWarmingSearchers>2</maxWarmingSearchers>
  </query>

 I have just started with solrcloud, please tell if I am doing anything
 wrong
 in solrcloud configurations.
 Also i did not good material for solrcloud in windows 7 with apache tomcat
 ,
 please suggest for that too.
 Thanks a lot.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solrcloud-no-registered-leader-found-and-new-searcher-error-tp4117724.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: block join and atomic updates

2014-02-17 Thread Mikhail Khludnev
Hello,

It sounds like you need to switch to query time join.
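
For reference, a rough, untested sketch of what a query-time join lookup
could look like from SolrJ -- the field names 'afId' and 'childField' are
made-up placeholders for the AF/AC_* idea below, not the actual schema:

// Find child docs matching a condition, then join from their foreign-key
// field ('afId') back to the parent documents whose 'id' matches.
SolrQuery q = new SolrQuery("{!join from=afId to=id}childField:value");
QueryResponse rsp = server.query(q);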
On 15.02.2014 at 21:57, m...@preselect-media.com wrote:

 Any suggestions?


 Zitat von m...@preselect-media.com:

  Yonik Seeley yo...@heliosearch.com:

 On Thu, Feb 13, 2014 at 8:25 AM,  m...@preselect-media.com wrote:

 Is there any workaround to perform atomic updates on blocks or do I
 have to
 re-index the parent document and all its children always again if I
 want to
 update a field?


 The latter, unfortunately.


 Is there any plan to change this behavior in near future?

  So, I'm thinking of alternatives without losing the benefit of block
 join.
 I try to explain an idea I just thought about:

 Let's say I have a parent document A with a number of fields I want to
 update regularly and a number of child documents AC_1 ... AC_n which are
 only indexed once and aren't going to change anymore.
 So, if I index A and AC_* in a block and I update A, the block is gone.
 But if I create an additional document AF which only contains something
  like a foreign key to A and index AF + AC_* as a block (not A + AC_*
 anymore), could I perform a {!parent ... } query on AF + AC_* and make an
 join from the results to get A?
  Does this make any sense and is it even possible? ;-)
 And if it's possible, how can I do it?

 Thanks,
 - Moritz







Re: Solrcloud: no registered leader found and new searcher error

2014-02-17 Thread sweety
How do I get them running?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrcloud-no-registered-leader-found-and-new-searcher-error-tp4117724p4117830.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr cloud hangs

2014-02-17 Thread Pawel Rog
Hi,
Here is the whole stack trace: https://gist.github.com/anonymous/9056783

--
Pawel

On Mon, Feb 17, 2014 at 4:53 PM, Mark Miller markrmil...@gmail.com wrote:

 Can you share the full stack trace dump?

 - Mark

 http://about.me/markrmiller

 On Feb 17, 2014, at 7:07 AM, Pawel Rog pawelro...@gmail.com wrote:

  Hi,
  I have quite annoying problem with Solr cloud. I have a cluster with 8
  shards and with 2 replicas in each. (Solr 4.6.1)
  After some time cluster doesn't respond to any update requests.
 Restarting
  the cluster nodes doesn't help.
 
  There are a lot of such stack traces (waiting for very long time):
 
 
- sun.misc.Unsafe.park(Native Method)
-
 java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
-
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
-
 
 org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
-
 
 org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
-
 
 org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
-
 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
- java.lang.Thread.run(Thread.java:722)
 
 
  Do you have any idea where I can look?
 
  --
  Pawel




Re: Best way to copy data from SolrCloud to standalone Solr?

2014-02-17 Thread Shawn Heisey
On 2/17/2014 8:32 AM, Daniel Bryant wrote:
 I have a production SolrCloud server which has multiple sharded indexes,
 and I need to copy all of the indexes to a (non-cloud) Solr server
 within our QA environment.
 
 Can I ask for advice on the best way to do this please?
 
 I've searched the web and found solr2solr
 (https://github.com/dbashford/solr2solr), but the author states that
 this is best for small indexes, and ours are rather large at ~20Gb each.
 I've also looked at replication, but can't find a definite reference on
 how this should be done between SolrCloud and Solr?
 
 Any guidance is very much appreciated.

If the master index isn't changing at the time of the copy, and you're
on a non-Windows platform, you should be able to copy the index
directory directly.  On a Windows platform, whether you can copy the
index while Solr is using it would depend on how Solr/Lucene opens the
files.  A typical Windows file open will prevent anything else from
opening them, and I do not know whether Lucene is smarter than that.

SolrCloud requires the replication handler to be enabled on all configs,
but during normal operation, it does not actually use replication.  This
is a confusing thing for some users.

I *think* you can configure the replication handler on slave cores with
a non-cloud config that points at the master cores, and it should
replicate the main Lucene index, but not the config files.  I have no
idea whether things will work right if you configure other master
options like replicateAfter and config files, and I also don't know if
those options might cause problems for SolrCloud itself.  Those options
shouldn't be necessary for just getting the data into a dev environment,
though.
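A slave-side replication handler along these lines (a sketch only; the
master URL and core name below are placeholders) in the QA core's
solrconfig.xml is the sort of configuration being described here:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- point at the matching shard's core on the production SolrCloud node -->
    <str name="masterUrl">http://prod-host:8983/solr/collection1_shard1_replica1</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>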

Thanks,
Shawn



Re: Solr cloud hangs

2014-02-17 Thread Pawel Rog
There are also many errors in solr log like that one:

org.apache.solr.update.StreamingSolrServers$1; error
org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for
connection from pool
at
org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:232)
at
org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:199)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:456)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:232)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)


--
Pawel


On Mon, Feb 17, 2014 at 8:01 PM, Pawel Rog pawelro...@gmail.com wrote:

 Hi,
 Here is the whole stack trace: https://gist.github.com/anonymous/9056783

 --
 Pawel


 On Mon, Feb 17, 2014 at 4:53 PM, Mark Miller markrmil...@gmail.comwrote:

 Can you share the full stack trace dump?

 - Mark

 http://about.me/markrmiller

 On Feb 17, 2014, at 7:07 AM, Pawel Rog pawelro...@gmail.com wrote:

  Hi,
  I have quite an annoying problem with SolrCloud. I have a cluster with 8
  shards and 2 replicas of each (Solr 4.6.1).
  After some time the cluster doesn't respond to any update requests.
  Restarting the cluster nodes doesn't help.
 
  There are a lot of such stack traces (waiting for very long time):
 
 
- sun.misc.Unsafe.park(Native Method)
-
 java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
-
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
-
 
 org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
-
 
 org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
-
 
 org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
-
 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
- java.lang.Thread.run(Thread.java:722)
 
 
  Do you have any idea where I can look?
 
  --
  Pawel





Re: Solrcloud: no registered leader found and new searcher error

2014-02-17 Thread Erick Erickson
Well, first determine whether they are running or not.

Then look at the Solr log for that node when you try to start it up.

Then post the results if you're still puzzled.

You've given us no information about what the error (if any) is, so
I'm speculating here.

You might want to review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick


On Mon, Feb 17, 2014 at 10:27 AM, sweety sweetyshind...@yahoo.com wrote:

 How do i get them running?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solrcloud-no-registered-leader-found-and-new-searcher-error-tp4117724p4117830.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Autosuggest - Strange issue with leading numbers in query

2014-02-17 Thread Developer
Hi Erik,

Thanks a lot for your reply.

I expect it to return zero suggestions since the suggested keyword doesn't
actually start with numbers.

Expected results:
Searching for ga - returns galaxy
Searching for gal - returns galaxy
Searching for 12321312321312ga - should not return any suggestion since
no such keyword (or combination) exists in the index.

Thanks




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Autosuggest-Strange-issue-with-leading-numbers-in-query-tp4116751p4117846.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Could not connect or ping a core after import a big data into it...

2014-02-17 Thread Eric_Peng
Sir, after some experimenting I found that if there are more than
(roughly) 1000 documents in the core, the problem shows up.

Then when I make a query in a command window it shows:

Exception in thread main
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error
opening new searcher. exceeded limit of maxWarmingSearchers=2, try again
later.
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at ExampleSolrJClient.handler(ExampleSolrJClient.java:107)
at ExampleSolrJClient.main(ExampleSolrJClient.java:53)




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416p4117848.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ Socket Leak

2014-02-17 Thread Jared Rodriguez
Kiran  Shawn,

Thank you both for the info, and you are both absolutely correct.  The issue
was not that sockets were leaked, but that TIME_WAIT behavior is a killer.  I
ended up fixing the problem by changing the http.maxConnections system
property, which Apache HttpClient uses internally to set up the
PoolingClientConnectionManager.  Previously, this had no value and was
defaulting to 5.  That meant that any time there were more than 50
(maxConnections * maxPerRoute) concurrent connections to the Solr server,
non-reusable connections were being opened and closed and thus left sitting
in that idle state: too many sockets.

The fix was simply tuning the pool and setting http.maxConnections to a
higher value representing the number of concurrent users that I expect.
Problem fixed, and a modest speed improvement simply from higher socket
reuse.
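For anyone else hitting this, the change amounts to something like the
following (a sketch; the value 200 and the zkHost variable are illustrative,
and it assumes the HttpClient is built so that JVM-wide system properties
are honored, e.g. via useSystemProperties()):

// Set before any HttpClient / CloudSolrServer is created,
// or pass -Dhttp.maxConnections=200 on the command line instead.
System.setProperty("http.maxConnections", "200");

CloudSolrServer server = new CloudSolrServer(zkHost);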

Thank you both for the help!

Jared




On Mon, Feb 17, 2014 at 3:03 AM, Kiran Chitturi 
kiran.chitt...@lucidworks.com wrote:

 Jared,

 I faced a similar issue when using CloudSolrServer with Solr. As Shawn
 pointed out the 'TIME_WAIT' status happens when the connection is closed
 by the http client. HTTP client closes connection whenever it thinks the
 connection is stale
 (
 https://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html
 #d5e405). Even the docs point out the stale connection checking cannot be
 all reliable.

 I see two ways to get around this:

 1. Enable 'SO_REUSEADDR'
 2. Disable stale connection checks.

 Also by default, when we create CSS it does not explicitly configure any
 http client parameters
 (
 https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/a
 pache/solr/client/solrj/impl/CloudSolrServer.java#L124). In this case, the
 default configuration parameters (max connections, max connections per
 host) are used for a http connection. You can explicitly configure these
 params when creating CSS using HttpClientUtil:

 ModifiableSolrParams params = new ModifiableSolrParams();
 params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 128);
 params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 32);
 params.set(HttpClientUtil.PROP_FOLLOW_REDIRECTS, false);
 params.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, 3);
 httpClient = HttpClientUtil.createClient(params);

 final HttpClient client = HttpClientUtil.createClient(params);
 LBHttpSolrServer lb = new LBHttpSolrServer(client);
 CloudSolrServer server = new CloudSolrServer(zkConnect, lb);


 Currently, I am using http client 4.3.2 and building the client when
 creating the CSS. I also use 'SO_REUSEADDR' option and I haven't seen the
 'TIME_WAIT'  after this (may be because of better handling of stale
 connections in 4.3.2 or because of 'SO_REUSEADDR' param enabled). My
 current http client code looks like this: (works only with http client
 4.3.2)

 HttpClientBuilder httpBuilder = HttpClientBuilder.create();

 Builder socketConfig =  SocketConfig.custom();
 socketConfig.setSoReuseAddress(true);
 socketConfig.setSoTimeout(1);
 httpBuilder.setDefaultSocketConfig(socketConfig.build());
 httpBuilder.setMaxConnTotal(300);
 httpBuilder.setMaxConnPerRoute(100);

 httpBuilder.disableRedirectHandling();
 httpBuilder.useSystemProperties();
 LBHttpSolrServer lb = new LBHttpSolrServer(httpClient, parser)
 CloudSolrServer server = new CloudSolrServer(zkConnect, lb);


 There should be a way to configure socket reuse with 4.2.3 too. You can
 try different configurations. I am surprised you have 'TIME_WAIT'
 connections even after 30 minutes because 'TIME_WAIT' connection should be
 closed by default in 2 mins by O.S I think.


 HTH,

 --
 Kiran Chitturi,


 On 2/13/14 12:38 PM, Jared Rodriguez jrodrig...@kitedesk.com wrote:

 I am using solr/solrj 4.6.1 along with the apache httpclient 4.3.2 as part
 of a web application which connects to the solr server via solrj
 using CloudSolrServer();  The web application is wired up with Guice, and
 there is a single instance of the CloudSolrServer class used by all
 inbound
 requests.  All this is running on Amazon.
 
 Basically, everything looks and runs fine for a while, but even with
 moderate concurrency, solrj starts leaving sockets open.  We are handling
 only about 250 connections to the web app per minute and each of these
 issues from 3 - 7 requests to solr.  Over a 30 minute period of this type
 of use, we end up with many 1000s of lingering sockets.  I can see these
 when running netstats
 
 tcp0  0 ip-10-80-14-26.ec2.in:41098
 ip-10-99-145-47.ec2.i:glrpc
 TIME_WAIT
 
 All to the same target host, which is my solr server. There are no other
 pieces of infrastructure on that box, just solr.  Eventually, the server
 just dies as no further sockets can be opened and the opened ones are not
 reused.
 
 The solr server itself is unphased and 

Re: Could not connect or ping a core after import a big data into it...

2014-02-17 Thread Eric_Peng
I found that in this strange situation I could import, update, or delete
data (using DIH or SolrJ),
but queries would wait forever.

So I deleted all the documents (or just reduced the document count), then
restarted the server, and the problem disappeared.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416p4117852.html
Sent from the Solr - User mailing list archive at Nabble.com.


Is it possible to load new elevate.xml on the fly?

2014-02-17 Thread Developer
Hi,

I am trying to figure out a way to switch between multiple elevate.xml
files on the fly using query parameters.

We have a scenario where we need to elevate documents based on
authentication (same core) without creating a new search handler.
*
For authenticated customers
*
elevate documents based on elevate1.xml

*For non-authenticated customers*

elevate documents based on elevate2.xml

I am not sure if there is a way to implement this using any other method. 

Any help in this regard is appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-possible-to-load-new-elevate-xml-on-the-fly-tp4117856.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Could not connect or ping a core after import a big data into it...

2014-02-17 Thread Eric_Peng
I solved it; my mistake.
I was using the Solr 4.6.1 jars, but in my solrconfig.xml I used
luceneMatchVersion 4.5.
I had just copied it from my last project and didn't check it.
Really a stupid mistake on my part.
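For anyone hitting the same thing: the element in solrconfig.xml just needs
to match the Lucene/Solr version actually on the classpath, e.g. (the value
here is only an example for 4.6):

<luceneMatchVersion>4.6</luceneMatchVersion>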



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416p4117859.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Best way to copy data from SolrCloud to standalone Solr?

2014-02-17 Thread Michael Della Bitta
I do know for certain that the backup command on a cloud core still works.
We have a script like this running on a cron to snapshot indexes:

curl -s '
http://localhost:8080/solr/#{core}/replication?command=backup&numberToKeep=4&location=/tmp
'

(not really using /tmp for this, parameters changed to protect the guilty)

The admin handler for replication doesn't seem to be there, but the actual
API seems to work normally.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

The Science of Influence Marketing

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinionshttps://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Mon, Feb 17, 2014 at 2:02 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/17/2014 8:32 AM, Daniel Bryant wrote:
  I have a production SolrCloud server which has multiple sharded indexes,
  and I need to copy all of the indexes to a (non-cloud) Solr server
  within our QA environment.
 
  Can I ask for advice on the best way to do this please?
 
  I've searched the web and found solr2solr
  (https://github.com/dbashford/solr2solr), but the author states that
  this is best for small indexes, and ours are rather large at ~20Gb each.
  I've also looked at replication, but can't find a definite reference on
  how this should be done between SolrCloud and Solr?
 
  Any guidance is very much appreciated.

 If the master index isn't changing at the time of the copy, and you're
 on a non-Windows platform, you should be able to copy the index
 directory directly.  On a Windows platform, whether you can copy the
 index while Solr is using it would depend on how Solr/Lucene opens the
 files.  A typical Windows file open will prevent anything else from
 opening them, and I do not know whether Lucene is smarter than that.

 SolrCloud requires the replication handler to be enabled on all configs,
 but during normal operation, it does not actually use replication.  This
 is a confusing thing for some users.

 I *think* you can configure the replication handler on slave cores with
  a non-cloud config that points at the master cores, and it should
 replicate the main Lucene index, but not the config files.  I have no
 idea whether things will work right if you configure other master
 options like replicateAfter and config files, and I also don't know if
 those options might cause problems for SolrCloud itself.  Those options
 shouldn't be necessary for just getting the data into a dev environment,
 though.

 Thanks,
 Shawn




Boost Query Example

2014-02-17 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)

Hi, can someone help me with a boost & sort query example?

http://localhost:8983/solr/ProductCollection/select?q=*%3A*&wt=json&indent=true&fq=SKU:223-CL10V3^100 OR SKU:223-CL1^90

There is no difference in the result order with this query; let me know if I am missing
something. Also, I would like the exact match for SKU:223-CL10V3^100 to be ordered first.

Thanks

Ravi


Re: Facet cache issue when deleting documents from the index

2014-02-17 Thread Marius Dumitru Florea
I tried to set the expungeDeletes flag but it didn't fix the problem.
The SolrServer doesn't expose a way to set this flag so I had to use:

new UpdateRequest().setAction(UpdateRequest.ACTION.COMMIT, true, true,
1, true).process(solrServer);

Any other hints?

Note that I managed to run my test in my real environment at runtime
and it passed, so it seems the behaviour depends on the size of the
documents that are committed (added to or deleted from the index).

Thanks,
Marius

On Mon, Feb 17, 2014 at 2:32 PM, Marius Dumitru Florea
mariusdumitru.flo...@xwiki.com wrote:
 On Mon, Feb 17, 2014 at 2:00 PM, Ahmet Arslan iori...@yahoo.com wrote:
 Hi,


 Also I noticed that in your code snippet you have server.delete("foo"),
 which does not exist. deleteById and deleteByQuery methods are defined in
 the SolrServer implementation.

 Yes, sorry, I have a wrapper over the SolrInstance that doesn't do
 much. In the case of delete it just forwards the call to deleteById.
 I'll check the expungeDeletes=true flag and post back the results.

 Thanks,
 Marius




 On Monday, February 17, 2014 1:42 PM, Ahmet Arslan iori...@yahoo.com wrote:
 Hi Marius,

 Facets are computed from indexed terms. Can you commit with 
 expungeDeletes=true flag?

 Ahmet




 On Monday, February 17, 2014 12:17 PM, Marius Dumitru Florea 
 mariusdumitru.flo...@xwiki.com wrote:
 Hi guys,

 I'm using Solr 4.6.1 (embedded) and for some reason the facet cache is
 not invalidated when documents are deleted from the index. Sadly, for
 me, I cannot reproduce this issue with an integration test like this:

 --8<--
 SolrInstance server = getSolrInstance();

 SolrInputDocument document = new SolrInputDocument();
 document.setField("id", "foo");
 document.setField("locale", "en");
 server.add(document);

 server.commit();

 document = new SolrInputDocument();
 document.setField("id", "bar");
 document.setField("locale", "en");
 server.add(document);

 server.commit();

 SolrQuery query = new SolrQuery("*:*");
 query.set("facet", "on");
 query.set("facet.field", "locale");
 QueryResponse response = server.query(query);

 Assert.assertEquals(2, response.getResults().size());
 FacetField localeFacet = response.getFacetField("locale");
 Assert.assertEquals(1, localeFacet.getValues().size());
 Count en = localeFacet.getValues().get(0);
 Assert.assertEquals("en", en.getName());
 Assert.assertEquals(2, en.getCount());

 server.delete("foo");
 server.commit();

 response = server.query(query);

 Assert.assertEquals(1, response.getResults().size());
 localeFacet = response.getFacetField("locale");
 Assert.assertEquals(1, localeFacet.getValues().size());
 en = localeFacet.getValues().get(0);
 Assert.assertEquals("en", en.getName());
 Assert.assertEquals(1, en.getCount());
 --8<--

 Nevertheless, when I do the 'same' on my real environment, the count
 for the locale facet remains 2 after one of the documents is deleted.
 The search result count is fine, so that's why I think it's a facet
 cache issue. Note that the facet count remains 2 even after I restart
 the server, so the cache is persisted on the file system.

 Strangely, the facet count is updated correctly if I modify the
 document instead of deleting it (i.e. removing a keyword from the
 content so that it isn't matched by the search query any more). So it
 looks like only delete triggers the issue.

 Now, an interesting fact is that if, on my real environment, I delete
 one of the documents and then add a new one, the facet count becomes
 3. So the last commit to the index, which inserts a new document,
 doesn't trigger a re-computation of the facet cache. The previous
 facet cache is simply incremented, so the error is perpetuated. At
 this point I don't even know how to fix the facet cache without
 deleting the Solr data folder so that the full index is rebuilt.

 I'm still trying to figure out what is the difference between the
 integration test and my real environment (as I used the same schema
 and configuration). Do you know what might be wrong?

 Thanks,
 Marius



Re: Boost Query Example

2014-02-17 Thread Michael Della Bitta
Hi,

Filter queries don't affect score, so boosting won't have an effect there.
If you want those query terms to get boosted, move them into the q
parameter.

http://wiki.apache.org/solr/CommonQueryParameters#fq
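For example (a sketch; it assumes the same SKU field and leaves the rest of
the request unchanged), the boosted clauses would move out of fq and into q:

http://localhost:8983/solr/ProductCollection/select?q=SKU:223-CL10V3^100 OR SKU:223-CL1^90&wt=json&indent=true

You can keep a separate fq (without boosts) purely for filtering if you
still need to restrict the result set.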

Hope that helps!

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

The Science of Influence Marketing

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinionshttps://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Mon, Feb 17, 2014 at 3:49 PM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:


 Hi can some one help me on the Boost  Sort query example.

  http://localhost:8983/solr/ProductCollection/select?q=*%3A*&wt=json&indent=true&fq=SKU:223-CL10V3^100
 OR SKU:223-CL1^90

 There is not different in the query Order, Let me know if I am missing
 something. Also I like to Order with the exact match for SKU:223-CL10V3^100

 Thanks

 Ravi



DIH and Tika

2014-02-17 Thread Teague James
Is there a way to specify the document types that Tika parses? In my DIH I
index the content of a SQL database which has a field that points to the SQL
record's binary file (which could be Word, PDF, JPG, MOV, etc.). Tika then
uses the document URL to index that document's content. However there are a
lot of document types that Tika cannot parse. I'd like to limit Tika to just
parsing Word and PDF documents so that I don't have to wait for Tika to
determine the document type and whether or not it can parse it. I suspect
that the number of exceptions being thrown over documents that Tika cannot
read is increasing my indexing time significantly. Any guidance is
appreciated.
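For context, the kind of entity definition I mean looks roughly like this
(a sketch only; entity, field, and data source names are illustrative):

<dataSource name="binUrl" type="BinURLDataSource"/>
<entity name="sqlDoc" query="SELECT id, title, fileUrl FROM documents">
  <entity name="tika" processor="TikaEntityProcessor"
          url="${sqlDoc.fileUrl}" dataSource="binUrl"
          format="text" onError="skip">
    <field column="text" name="content"/>
  </entity>
</entity>

The onError="skip" keeps unparsable files from aborting the import, but it
still pays the cost of handing every file to Tika first.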

-Teague



Escape \\n from getting highlighted - highlighter component

2014-02-17 Thread Developer
Hi,

When searching for text like 'talk n text', the highlighter component also
adds the <em> tags to special characters like \n. Is there a way to
avoid highlighting the special characters?

\\r\\n Family Messaging

is getting replaced with

\\r\\<em>n</em> Family Messaging



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Escape-n-from-getting-highlighted-highlighter-component-tp4117895.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Autosuggest - Strange issue with leading numbers in query

2014-02-17 Thread Erick Erickson
Ah, OK, I thought you were indexing things like 123412335ga, but not so.

Afraid I'm fresh out of ideas. Although I might try using TermsComponent
to examine the index and see if, somehow, there _are_ terms with leading
numbers in the output.
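Something along these lines (the field name is an assumption; point it at
whatever field backs your suggester) would show whether any indexed terms
really do start with digits:

http://localhost:8983/solr/collection1/terms?terms.fl=suggest_field&terms.regex=[0-9].*&terms.limit=100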

It's also possible that numbers are stripped when building the FST that
is used, but I don't know one way or the other.

Best,
Erick


On Mon, Feb 17, 2014 at 11:30 AM, Developer bbar...@gmail.com wrote:

 Hi Erik,

 Thanks a lot for your reply.

 I expect it to return zero suggestions since the suggested keyword doesn't
 actually start with numbers.

 Expected results:
 Searching for ga - returns galaxy
 Searching for gal - returns galaxy
 Searching for 12321312321312ga - should not return any suggestion since
 no such keyword (or combination) exists in the index.

 Thanks




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-Autosuggest-Strange-issue-with-leading-numbers-in-query-tp4116751p4117846.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Could not connect or ping a core after import a big data into it...

2014-02-17 Thread Erick Erickson
Glad it's resolved, thanks for letting us know, it
removes some uncertainty.

Erick


On Mon, Feb 17, 2014 at 12:23 PM, Eric_Peng sagittariuse...@gmail.comwrote:

  I solved it; my mistake.
  I was using the Solr 4.6.1 jars, but in my solrconfig.xml I used
  luceneMatchVersion 4.5.
  I had just copied it from my last project and didn't check it.
  Really a stupid mistake on my part.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416p4117859.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR suggester component - Get suggestion dump

2014-02-17 Thread bbi123
I started using terms component to view the terms and the counts...

terms?terms.fl=autocomplete_phrase&terms.regex=a.*&terms.limit=1000



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-suggester-component-Get-suggestion-dump-tp4110026p4117913.html
Sent from the Solr - User mailing list archive at Nabble.com.


Preventing multiple on-deck searchers without causing failed commits

2014-02-17 Thread Colin Bartolome
We're using Solr version 4.2.1, in case new functionality has helped with 
this issue.


We have our Solr servers doing automatic soft commits with maxTime=1000. 
We also have a scheduled job that triggers a hard commit every fifteen 
minutes. When one of these hard commits happens while a soft commit is 
already in progress, we get that ubiquitous warning:


PERFORMANCE WARNING: Overlapping onDeckSearchers=2

Recently, we had an occasion to have a second scheduled job also issue a 
hard commit every now and then. Since our maxWarmingSearchers value was 
set to the default, 2, we occasionally had a hard commit trigger when two 
other searchers were already warming up, which led to this:


org.apache.solr.client.solrj.SolrServerException: No live SolrServers 
available to handle this request


as the servers started responding with a 503 HTTP response.

It seems like automatic soft commits wait until the hard commits are out 
of the way before they proceed. Is there a way to do the same for hard 
commits? Since we're passing waitSearcher=true in the update request that 
triggers the hard commits, I would expect the request to block until the 
server had enough headroom to service the commit. I did not expect that 
we'd start getting 503 responses.


Is there a way to pull this off, either via some extra request parameters 
or via some server-side configuration?


Slow 95th-percentile

2014-02-17 Thread Allan Carroll
Hi all,

I'm having trouble getting my Solr setup to deliver consistent performance. Average 
select latency is great, but the 95th percentile is dismal (10x the average). It's probably 
something slightly misconfigured. I’ve seen it have nice, low variance 
latencies for a few hours here and there, but can’t figure out what’s different 
during those times.


* I’m running 4.1.0 using SolrCloud. 3 replicas of 1 shard on 3 EC2 boxes 
(8proc, 30GB RAM, SSDs). Load peaks around 30 selects per second and about 150 
updates per second. 

* The index has about 11GB of data in 14M docs, the other 10MB of data in 3K 
docs. Stays around 30 segments.

* Soft commits after 10 seconds, hard commits after 120 seconds. Though, 
turning off the update traffic doesn’t seem to have any effect on the select 
latencies.

* I think GC latency is low. Running 3GB heaps with 1G new size. GC time is 
around 3ms per second.
 

Here’s a typical select query:

fl=*,sortScore:textScore&sort=textScore desc&start=0&q=text:((soccer OR MLS
OR "premier league" OR FIFA OR "world cup") OR (sorority OR fraternity OR
"greek life" OR dorm OR campus))&wt=json&fq=startTime:[139265640 TO
139271754]&fq={!frange l=2 u=3}timeflag(startTime)&fq={!frange
l=139265640 u=139269594
cache=false}timefix(startTime,-2160)&fq=privacy:OPEN&defType=edismax&rows=131


Anyone have any suggestions on where to look next? Or, if you know someone in 
the bay area that would consult for an hour or two and help me track it down, 
that’d be great too.

Thanks!

-Allan

Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-17 Thread Shawn Heisey
On 2/17/2014 6:06 PM, Colin Bartolome wrote:
 We're using Solr version 4.2.1, in case new functionality has helped
 with this issue.
 
 We have our Solr servers doing automatic soft commits with maxTime=1000.
 We also have a scheduled job that triggers a hard commit every fifteen
 minutes. When one of these hard commits happens while a soft commit is
 already in progress, we get that ubiquitous warning:
 
 PERFORMANCE WARNING: Overlapping onDeckSearchers=2
 
 Recently, we had an occasion to have a second scheduled job also issue a
 hard commit every now and then. Since our maxWarmingSearchers value was
 set to the default, 2, we occasionally had a hard commit trigger when
 two other searchers were already warming up, which led to this:
 
 org.apache.solr.client.solrj.SolrServerException: No live SolrServers
 available to handle this request
 
 as the servers started responding with a 503 HTTP response.
 
 It seems like automatic soft commits wait until the hard commits are out
 of the way before they proceed. Is there a way to do the same for hard
 commits? Since we're passing waitSearcher=true in the update request
 that triggers the hard commits, I would expect the request to block
 until the server had enough headroom to service the commit. I did not
 expect that we'd start getting 503 responses.

Remember this mantra: Hard commits are about durability, soft commits
are about visibility.  You might already know this, but it is the key to
figuring out how to handle commits, whether they are user-triggered or
done automatically by the server.

With Solr 4.x, it's best to *always* configure autoCommit with
openSearcher=false.  This does a hard commit but does not open a new
searcher.  The result: Data is flushed to disk and the current
transaction log is closed.  New documents will not be searchable after
this kind of commit.  For maxTime and maxDocs, pick values that won't
result in huge transaction logs, which increase Solr startup time.

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

For document visibility, you can rely on autoSoftCommit, and you
indicated that you already have it configured. Decide how long you can
wait for new content that has just been indexed.  Do you *really* need
new data to be searchable within one second?  If so, you're good.  If
not, increase the maxTime value here.  Be sure to make the value at
least a little bit longer than the amount of time it takes for a soft
commit to finish, including cache warmup time.
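Putting that together, the relevant solrconfig.xml section would look
something like this (the times are only examples; tune them to your own
durability and visibility needs):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard commit: flush + close tlog -->
    <openSearcher>false</openSearcher> <!-- never open a searcher here -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>5000</maxTime>            <!-- new documents visible within ~5s -->
  </autoSoftCommit>
</updateHandler>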

Thanks,
Shawn



Re: Slow 95th-percentile

2014-02-17 Thread Shawn Heisey
On 2/17/2014 6:12 PM, Allan Carroll wrote:
 I'm having trouble getting my Solr setup to get consistent performance. 
 Average select latency is great, but 95% is dismal (10x average). It's 
 probably something slightly misconfigured. I’ve seen it have nice, low 
 variance latencies for a few hours here and there, but can’t figure out 
 what’s different during those times.
 
 
 * I’m running 4.1.0 using SolrCloud. 3 replicas of 1 shard on 3 EC2 boxes 
 (8proc, 30GB RAM, SSDs). Load peaks around 30 selects per second and about 
 150 updates per second. 
 
 * The index has about 11GB of data in 14M docs, the other 10MB of data in 3K 
 docs. Stays around 30 segments.
 
 * Soft commits after 10 seconds, hard commits after 120 seconds. Though, 
 turning off the update traffic doesn’t seem to have any effect on the select 
 latencies.
 
 * I think GC latency is low. Running 3GB heaps with 1G new size. GC time is 
 around 3ms per second.
  
 
 Here’s a typical select query:
 
 fl=*,sortScore:textScore&sort=textScore desc&start=0&q=text:((soccer OR
 MLS OR "premier league" OR FIFA OR "world cup") OR (sorority OR
 fraternity OR "greek life" OR dorm OR
 campus))&wt=json&fq=startTime:[139265640 TO 139271754]&fq={!frange
 l=2 u=3}timeflag(startTime)&fq={!frange l=139265640 u=139269594
 cache=false}timefix(startTime,-2160)&fq=privacy:OPEN&defType=edismax&rows=131

The first thing to say is that it's fairly normal for the 95th and 99th
percentile values to be quite a lot higher than the median and average
values.  I don't have actual values so I don't know if it's bad or not.

You're good on the most important performance-related resource, which is
memory for the OS disk cache.  The only thing that stands out as a
possible problem from what I know so far is garbage collection.  It
might be a case of full garbage collections happening too frequently, or
it might be a case of garbage collection pauses taking too long.  It
might even be a combination of both.

To fix frequent full collections, increase the heap size.  To fix the
other problem, use the CMS collector and tune it.
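As a starting point only (these flags are illustrative, not a prescription
for your particular setup), CMS tuning usually begins with something like:

-Xms3g -Xmx3g -XX:NewSize=1g -XX:MaxNewSize=1g
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+CMSParallelRemarkEnabled

and then gets adjusted based on what the GC logs show.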

Two bits of information will help with recommendations: Your java
startup options, and your solrconfig.xml.

You're using an option in your query that I've never seen before.  I
don't know if frange is slow or not.

One last thing that might cause problems is super-frequent commits.

I could also be completely wrong!

Thanks,
Shawn



Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-17 Thread Colin Bartolome

On 02/17/2014 05:38 PM, Shawn Heisey wrote:

On 2/17/2014 6:06 PM, Colin Bartolome wrote:

We're using Solr version 4.2.1, in case new functionality has helped
with this issue.

We have our Solr servers doing automatic soft commits with maxTime=1000.
We also have a scheduled job that triggers a hard commit every fifteen
minutes. When one of these hard commits happens while a soft commit is
already in progress, we get that ubiquitous warning:

PERFORMANCE WARNING: Overlapping onDeckSearchers=2

Recently, we had an occasion to have a second scheduled job also issue a
hard commit every now and then. Since our maxWarmingSearchers value was
set to the default, 2, we occasionally had a hard commit trigger when
two other searchers were already warming up, which led to this:

org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request

as the servers started responding with a 503 HTTP response.

It seems like automatic soft commits wait until the hard commits are out
of the way before they proceed. Is there a way to do the same for hard
commits? Since we're passing waitSearcher=true in the update request
that triggers the hard commits, I would expect the request to block
until the server had enough headroom to service the commit. I did not
expect that we'd start getting 503 responses.


Remember this mantra: Hard commits are about durability, soft commits
are about visibility.  You might already know this, but it is the key to
figuring out how to handle commits, whether they are user-triggered or
done automatically by the server.

With Solr 4.x, it's best to *always* configure autoCommit with
openSearcher=false.  This does a hard commit but does not open a new
searcher.  The result: Data is flushed to disk and the current
transaction log is closed.  New documents will not be searchable after
this kind of commit.  For maxTime and maxDocs, pick values that won't
result in huge transaction logs, which increase Solr startup time.

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

For document visibility, you can rely on autoSoftCommit, and you
indicated that you already have it configured. Decide how long you can
wait for new content that has just been indexed.  Do you *really* need
new data to be searchable within one second?  If so, you're good.  If
not, increase the maxTime value here.  Be sure to make the value at
least a little bit longer than the amount of time it takes for a soft
commit to finish, including cache warmup time.

Thanks,
Shawn



Increasing the maxTime value doesn't actually solve the problem, though; 
it just makes it a little less likely. Really, the soft commits aren't 
the problem here, as far as we can tell. It's that a request that 
triggers a hard commit simply fails when the server is already at 
maxWarmingSearchers. I would expect the request to queue up and wait 
until the server could handle it.


Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-17 Thread Shawn Heisey
On 2/17/2014 7:06 PM, Colin Bartolome wrote:
 Increasing the maxTime value doesn't actually solve the problem, though;
 it just makes it a little less likely. Really, the soft commits aren't
 the problem here, as far as we can tell. It's that a request that
 triggers a hard commit simply fails when the server is already at
 maxWarmingSearchers. I would expect the request to queue up and wait
 until the server could handle it.

I think I put too much information in my reply.  Apologies.  Here's the
most important information to deal with first:

Don't send hard commits at all.  Configure autoCommit in your server
config, with the all-important openSearcher parameter set to false.
That will take care of all your hard commit needs, but those commits
will never open a new searcher, so they cannot cause an overlap with the
soft commits that DO open a new searcher.

Thanks,
Shawn



Re: Limit amount of search result

2014-02-17 Thread rachun
Hi Samee,

Thank you very much for your suggestion.
I got it working now ;)

Chun.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Limit-amount-of-search-result-tp4117062p4117952.html
Sent from the Solr - User mailing list archive at Nabble.com.