Re: SolR InvalidTokenOffsetsException with Highlighter and Synonyms

2013-04-16 Thread juancesarvillalba
 Hi,

I am using the standard highlighter.
http://wiki.apache.org/solr/HighlightingParameters

Cheers



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolR-InvalidTokenOffsetsException-with-Highlighter-and-Synonyms-tp4053988p4056240.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Usage of CloudSolrServer?

2013-04-16 Thread Furkan KAMACI
Hi Shawn;

I am sorry, but what kind of load balancing is that? I mean, does it check
whether some leaders are using more CPU or RAM, etc.? I think a problem may
occur in this kind of scenario: if some leaders receive more documents
than other leaders (I don't know how it is decided which shard a
document will go to), won't there be a bottleneck on that leader?
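On the shard-assignment question: SolrCloud's default compositeId router hashes each document's uniqueKey into a 32-bit range that is split evenly across the shards, so document volume balances statistically rather than by leader CPU/RAM. A toy sketch of that idea (CRC32 stands in for Solr's actual MurmurHash3 here, so the assignments will not match a real cluster):

```python
import zlib

def route(doc_id: str, num_shards: int) -> int:
    """Map a document id onto one of num_shards equal slices of the 32-bit hash range."""
    h = zlib.crc32(doc_id.encode("utf-8"))      # 32-bit unsigned hash of the id
    slice_size = (1 << 32) // num_shards        # width of each shard's hash slice
    return min(h // slice_size, num_shards - 1)

# With a decent hash, sequential ids spread roughly evenly over the shards.
counts = [0, 0, 0]
for i in range(9000):
    counts[route(f"doc-{i}", 3)] += 1
print(counts)  # roughly even, around 3000 per shard
```

The point is that no shard is singled out by the router; skew would only come from the id distribution itself.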


2013/4/15 Shawn Heisey s...@elyograg.org

 On 4/15/2013 8:05 AM, Furkan KAMACI wrote:

 My system is as follows: I crawl data with Nutch and send them into
 SolrCloud. Users will search at Solr.

 What is CloudSolrServer? Should I use it for load balancing, or is it
 something different?


 It appears that the Solr integration in Nutch currently does not use
 CloudSolrServer.  There is an issue to add it.  The mutual dependency on
 HttpClient is holding it up - Nutch uses HttpClient 3, SolrJ 4.x uses
 HttpClient 4.

 https://issues.apache.org/jira/browse/NUTCH-1377

 Until that is fixed, a load balancer would be required for full redundancy
 for updates with SolrCloud.  You don't have to use a load balancer for it
 to work, but if the Solr server that Nutch is using goes down, then
 indexing will stop unless you reconfigure Nutch or bring the Solr server
 back up.

 Thanks,
 Shawn




Re: Empty Solr 4.2.1 can not create Collection

2013-04-16 Thread A.Eibner

Hi,
sorry for pushing, but I just replayed the steps with Solr 4.0, where
everything works fine.
Then I switched to Solr 4.2.1 and replayed the exact same steps, and the
collection won't start and no leader is elected.


Any clues ?
Should I try it on the developer mailing list? Maybe it's a bug.

Kind Regards
Alexander

Am 2013-04-10 22:27, schrieb A.Eibner:

Hi,

here the clusterstate.json (from zookeeper) after creating the core:

{"storage":{
  "shards":{"shard1":{
    "range":"8000-7fff",
    "state":"active",
    "replicas":{"app02:9985_solr_storage-core":{
      "shard":"shard1",
      "state":"down",
      "core":"storage-core",
      "collection":"storage",
      "node_name":"app02:9985_solr",
      "base_url":"http://app02:9985/solr"}}}},
  "router":"compositeId"}}
cZxid = 0x10024
ctime = Wed Apr 10 22:18:13 CEST 2013
mZxid = 0x1003d
mtime = Wed Apr 10 22:21:26 CEST 2013
pZxid = 0x10024
cversion = 0
dataVersion = 2
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 467
numChildren = 0

But looking in the log files I found the following error (this also
occurs with the Collections API):

SEVERE: org.apache.solr.common.SolrException: Error CREATEing SolrCore
'storage_shard1_replica1':
 at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:483)

 at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:140)

 at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

 at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:591)

 at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192)

 at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)

 at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)

 at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)

 at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)

 at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)

 at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)

 at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)

 at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)

 at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
 at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:999)

 at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:565)

 at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:307)

 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

 at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.cloud.ZooKeeperException:
 at
org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:931)
 at
org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
 at
org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
 at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:479)

 ... 19 more
Caused by: java.lang.NullPointerException
 at
org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:190)

 at
org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:156)

 at
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:100)

 at
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:266)
 at
org.apache.solr.cloud.ZkController.joinElection(ZkController.java:935)
 at
org.apache.solr.cloud.ZkController.register(ZkController.java:761)
 at
org.apache.solr.cloud.ZkController.register(ZkController.java:727)
 at
org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
 ... 22 more

Kind regards
Alexander

Am 2013-04-10 19:12, schrieb Joel Bernstein:

Can you post your clusterstate.json?

After you spin up the initial core, it will automatically become
leader for
that shard.


On Wed, Apr 10, 2013 at 3:43 AM, A.Eibner a_eib...@yahoo.de wrote:


Hi Joel,

I followed your steps; the cores and the collection get created, but
there is no leader elected, so I cannot query the collection...
Am I missing something?

Kind Regards
Alexander

Am 2013-04-09 10:21, schrieb A.Eibner:

  Hi,

thanks for your fast answer.

You don't use the Collections API - may I ask you why?
Therefore you have to set up everything ...

Is cache useful for my scenario?

2013-04-16 Thread samabhiK
Hi,

I am new to Solr and wish to use version 4.2.x for my app in production. I
want to show hundreds of thousands of markers on a map, with contents coming
from Solr. As the user moves around the map and pans, the browser will fetch
data/markers using a BBOX filter (based on the map's viewport boundary).

There will be a lot of data indexed in Solr. My question is: does caching
help in my case? Since the filter queries will vary for almost all users
(because the viewport latitude/longitude varies), in what ways can I use
caching to increase performance? Should I completely turn off caching?

If you can make suggestions from your experience, it would be really nice.

Thanks
Sam
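On the caching question above: Solr lets you mark an individual filter as non-cached via local params, which suits near-unique per-viewport filters well; the filterCache then isn't churned by filters that will never be reused, while the other caches (queryResultCache, documentCache) stay available for what does repeat. A sketch (the field name and box coordinates are placeholders, not from the original message):

```
fq={!cache=false cost=100}store:[45.1,-93.9 TO 45.3,-93.5]
```

`cache=false` and `cost` are the standard filter-query local params for this; a high cost asks Solr to apply the filter after cheaper ones.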



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-cache-useful-for-my-scenario-tp4056250.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Usage of CloudSolrServer?

2013-04-16 Thread Upayavira
If you are accessing Solr from Java code, you will likely use the SolrJ
client to do so. If your users are hitting Solr directly, you should
think about whether this is wise - as well as providing them with direct
search access, you are also providing them with the ability to delete
your entire index with a single command.

SolrJ isn't really a load balancer as such. When SolrJ is used to make a
request against a collection, it will ask Zookeeper for the names of the
shards that make up that collection, and for the hosts/cores that make
up the set of replicas for those shards.

It will then choose one of those hosts/cores for each shard, and send a
request to them as a distributed search request.
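That per-shard replica choice can be sketched roughly as follows. The cluster-state map below is invented for illustration; the real SolrJ client (CloudSolrServer with LBHttpSolrServer underneath) reads live topology from ZooKeeper rather than a hard-coded dict:

```python
import random

# Hypothetical collection topology: shard name -> list of replica core URLs.
cluster_state = {
    "shard1": ["http://app01:8983/solr/ts", "http://app02:8983/solr/ts"],
    "shard2": ["http://app03:8983/solr/ts", "http://app04:8983/solr/ts"],
}

def pick_targets(state):
    """Pick one live replica per shard to receive the distributed request."""
    return {shard: random.choice(replicas) for shard, replicas in state.items()}

print(pick_targets(cluster_state))  # one replica URL per shard
```

Because the state comes from ZooKeeper, a newly registered node simply appears in the replica lists and starts receiving a share of requests.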

This has the advantage over traditional load balancing that if you bring
up a new node, that node will register itself with ZooKeeper, and thus
your SolrJ client(s) will know about it, without any intervention.

Upayavira

On Tue, Apr 16, 2013, at 08:36 AM, Furkan KAMACI wrote:
 Hi Shawn;
 
 I am sorry, but what kind of load balancing is that? I mean, does it check
 whether some leaders are using more CPU or RAM, etc.? I think a problem
 may occur in this kind of scenario: if some leaders receive more documents
 than other leaders (I don't know how it is decided which shard a document
 will go to), won't there be a bottleneck on that leader?
 
 
 2013/4/15 Shawn Heisey s...@elyograg.org
 
  On 4/15/2013 8:05 AM, Furkan KAMACI wrote:
 
  My system is as follows: I crawl data with Nutch and send them into
  SolrCloud. Users will search at Solr.
 
  What is CloudSolrServer? Should I use it for load balancing, or is it
  something different?
 
 
  It appears that the Solr integration in Nutch currently does not use
  CloudSolrServer.  There is an issue to add it.  The mutual dependency on
  HttpClient is holding it up - Nutch uses HttpClient 3, SolrJ 4.x uses
  HttpClient 4.
 
  https://issues.apache.org/jira/browse/NUTCH-1377
 
  Until that is fixed, a load balancer would be required for full redundancy
  for updates with SolrCloud.  You don't have to use a load balancer for it
  to work, but if the Solr server that Nutch is using goes down, then
  indexing will stop unless you reconfigure Nutch or bring the Solr server
  back up.
 
  Thanks,
  Shawn
 
 


first time with new keyword, solr take to much time to give the result

2013-04-16 Thread Montu v Boda
Hi,

when we search with any new keyword for the first time, Solr 4.2.1 takes too
much time to give the result.

we have 506 documents indexed in Solr, and the index size is 400GB.

now when we search for the keyword "test", it takes 1 min to give the
response for 10,000 rows.

we fire the query from the Java application using the SolrJ client.

this behavior is the same with Solr 1.4, 3.5 and 4.2.1.

all 400GB of data is indexed in one folder, Solr Home\data\index.

after firing the query, when we open the resource manager, it shows
that most of the cost is disk I/O.

any help would be helpful to us.

Thanks & Regards
Montu v Boda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/first-time-with-new-keyword-solr-take-to-much-time-to-give-the-result-tp4056254.html
Sent from the Solr - User mailing list archive at Nabble.com.


SEVERE: shard update error StdNode on SolrCloud 4.2.1

2013-04-16 Thread Steve Woodcock
Hi

We have a simple SolrCloud setup (4.2.1) running with a single shard and
two nodes, and it's working fine except whenever we send an update request,
the leader logs this error:

SEVERE: shard update error StdNode:
http://10.20.10.42:8080/solr/ts/:org.apache.solr.common.SolrException:
Server at http://10.20.10.42:8080/solr/ts returned non ok status:500,
message:Internal Server Error
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
...

Which triggers a lot of to-ing and fro-ing between the leader and the
replica, starting with this response on the replica to the above:

INFO: [ts] webapp=/solr path=/update params={distrib.from=
http://10.20.10.29:8080/solr/ts/update.distrib=FROMLEADERwt=javabinversion=2}
{} 0 12
15-Apr-2013 16:38:23 org.apache.solr.common.SolrException log
SEVERE: java.lang.UnsupportedOperationException
at
org.apache.lucene.queries.function.FunctionValues.longVal(FunctionValues.java:46)
at
org.apache.solr.update.VersionInfo.getVersionFromIndex(VersionInfo.java:201)
at org.apache.solr.update.UpdateLog.lookupVersion(UpdateLog.java:714)
at org.apache.solr.update.VersionInfo.lookupVersion(VersionInfo.java:184)
...

At this point, the leader tells the replica to recover:

INFO: try and ask http://10.20.10.42:8080/solr to recover

Which it does:

15-Apr-2013 16:38:23 org.apache.solr.handler.admin.CoreAdminHandler
handleRequestRecoveryAction

but the attempt to use PeerSync fails:

INFO: Attempting to PeerSync from http://10.20.10.29:8080/solr/ts/ core=ts
- recoveringAfterStartup=false
15-Apr-2013 16:38:26 org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=ts url=http://10.20.10.42:8080/solr START replicas=[
http://10.20.10.29:8080/solr/ts/] nUpdates=100
15-Apr-2013 16:38:26 org.apache.solr.update.PeerSync handleVersions
INFO: PeerSync: core=ts url=http://10.20.10.42:8080/solr  Received 100
versions from 10.20.10.29:8080/solr/ts/
15-Apr-2013 16:38:26 org.apache.solr.update.PeerSync handleVersions
INFO: PeerSync: core=ts url=http://10.20.10.42:8080/solr  Our versions are
too old. ourHighThreshold=1432379781917179904
otherLowThreshold=1432382177294680064
15-Apr-2013 16:38:26 org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=ts url=http://10.20.10.42:8080/solr DONE. sync failed
15-Apr-2013 16:38:26 org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: PeerSync Recovery was not successful - trying replication. core=ts

Replication then proceeds correctly, and the node is brought up to date.
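The "Our versions are too old" decision in the log can be read as a simple window-overlap check. This is a rough sketch of what the log implies; the actual logic in org.apache.solr.update.PeerSync is more involved:

```python
# PeerSync keeps a window of the ~100 most recent update versions per core.
# If the replica's newest tracked version (ourHighThreshold) is older than
# the oldest version the leader can hand over (otherLowThreshold), the two
# windows don't overlap, so PeerSync gives up and the node falls back to
# full index replication instead.
def peer_sync_feasible(our_high_threshold: int, other_low_threshold: int) -> bool:
    return our_high_threshold >= other_low_threshold

# Thresholds from the log excerpt above: no overlap, hence "sync failed".
print(peer_sync_feasible(1432379781917179904, 1432382177294680064))  # False
```

That the replica falls this far behind on every update is what suggests a configuration problem rather than normal catch-up behaviour.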

I'm guessing it's not supposed to work like this, but I'm having trouble
finding anyone else with this problem, which makes me suspect we configured
it wrong somewhere along the line, but I've checked it all per the
documentation and I'm starting to run out of ideas.

Any suggestions for where to look next would be most appreciated!

Regards, Steve Woodcock


Re: SolR InvalidTokenOffsetsException with Highlighter and Synonyms

2013-04-16 Thread Dmitry Kan
It could be a bug in the highlighter. But before claiming that, I would still
play around with different options, like hl.fragsize and hl.highlightMultiTerm.

Also, have you considered storing synonyms in the index?


On Tue, Apr 16, 2013 at 9:42 AM, juancesarvillalba 
juancesarvilla...@gmail.com wrote:

  Hi,

 I am using the standard highlighter.
 http://wiki.apache.org/solr/HighlightingParameters

 Cheers






Re: first time with new keyword, solr take to much time to give the result

2013-04-16 Thread Dmitry Kan
Hi,

Things to google ;)

1. warmup queries
2. solr cache

How much RAM does your index take now?

Dmitry
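For reference, searcher warming is configured in solrconfig.xml. A minimal sketch (the query strings below are placeholders to replace with representative production queries):

```xml
<!-- Run a few representative queries whenever a new searcher opens
     (newSearcher) and at startup (firstSearcher), so the first real
     user queries hit warm OS and Solr caches. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">test</str><str name="rows">10</str></lst>
    <lst><str name="q">*:*</str><str name="sort">id asc</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">test</str><str name="rows">10</str></lst>
  </arr>
</listener>
```

Warming trades slower commits for faster first queries, which is usually the right trade on a large, mostly-read index like this one.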


On Tue, Apr 16, 2013 at 1:22 PM, Montu v Boda montu.b...@highqsolutions.com
 wrote:

 Hi,

 when we search with any new keyword for the first time, Solr 4.2.1 takes
 too much time to give the result.

 we have 506 documents indexed in Solr, and the index size is 400GB.

 now when we search for the keyword "test", it takes 1 min to give the
 response for 10,000 rows.

 we fire the query from the Java application using the SolrJ client.

 this behavior is the same with Solr 1.4, 3.5 and 4.2.1.

 all 400GB of data is indexed in one folder, Solr Home\data\index.

 after firing the query, when we open the resource manager, it shows
 that most of the cost is disk I/O.

 any help would be helpful to us.

 Thanks  Regards
 Montu v Boda






Re: first time with new keyword, solr take to much time to give the result

2013-04-16 Thread Montu v Boda
Hi

currently, my Solr is deployed in tomcat1 and we have given 4GB of memory
to that Tomcat.

Thanks  Regards
Montu v Boda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/first-time-with-new-keyword-solr-take-to-much-time-to-give-the-result-tp4056254p4056261.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Storing Solr Index on NFS

2013-04-16 Thread Furkan KAMACI
Hi Walter;

You said: "It is not safe to share Solr index files between two Solr
servers." Why do you think that?


2013/4/16 Tim Vaillancourt t...@elementspace.com

 If centralization of storage is your goal by choosing NFS, iSCSI works
 reasonably well with SOLR indexes, although good local-storage will always
 be the overall winner.

 I noticed a near 5% degradation in overall search performance (casual
 testing, nothing scientific) when moving a 40-50GB index to iSCSI (10GbE
 network) from a 4x7200rpm RAID 10 local SATA disk setup.

 Tim


 On 15/04/13 09:59 AM, Walter Underwood wrote:

 Solr 4.2 does have field compression which makes smaller indexes. That
 will reduce the amount of network traffic. That probably does not help
 much, because I think the latency of NFS is what causes problems.

 wunder

 On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote:

  Hello Walter,

 Thanks for the response. That has been my experience in the past as well.
 But I was wondering if there are new things in Solr 4 and NFS 4.1 that
 make storing indexes on an NFS mount feasible.

 Thanks,
 Saqib


 On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wun...@wunderwood.org wrote:

  On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:

  Greetings,

 Are there any issues with storing Solr Indexes on a NFS share? Also any
 recommendations for using NFS for Solr indexes?

 I recommend that you do not put Solr indexes on NFS.

 It can be very slow, I measured indexing as 100X slower on NFS a few
 years
 ago.

 It is not safe to share Solr index files between two Solr servers, so
 there is no benefit to NFS.

 wunder
 --
 Walter Underwood
 wun...@wunderwood.org




  --
 Walter Underwood
 wun...@wunderwood.org







Re: Usage of CloudSolrServer?

2013-04-16 Thread Furkan KAMACI
Thanks for your detailed explanation. However, you said: "It will then
choose one of those hosts/cores for each shard, and send a request to them
as a distributed search request." Is there any document that explains
distributed search? What are the criteria for it?


2013/4/16 Upayavira u...@odoko.co.uk

 If you are accessing Solr from Java code, you will likely use the SolrJ
 client to do so. If your users are hitting Solr directly, you should
 think about whether this is wise - as well as providing them with direct
 search access, you are also providing them with the ability to delete
 your entire index with a single command.

 SolrJ isn't really a load balancer as such. When SolrJ is used to make a
 request against a collection, it will ask Zookeeper for the names of the
 shards that make up that collection, and for the hosts/cores that make
 up the set of replicas for those shards.

 It will then choose one of those hosts/cores for each shard, and send a
 request to them as a distributed search request.

 This has the advantage over traditional load balancing that if you bring
 up a new node, that node will register itself with ZooKeeper, and thus
 your SolrJ client(s) will know about it, without any intervention.

 Upayavira

 On Tue, Apr 16, 2013, at 08:36 AM, Furkan KAMACI wrote:
  Hi Shawn;
 
  I am sorry, but what kind of load balancing is that? I mean, does it check
  whether some leaders are using more CPU or RAM, etc.? I think a problem
  may occur in this kind of scenario: if some leaders receive more documents
  than other leaders (I don't know how it is decided which shard a document
  will go to), won't there be a bottleneck on that leader?
 
 
  2013/4/15 Shawn Heisey s...@elyograg.org
 
   On 4/15/2013 8:05 AM, Furkan KAMACI wrote:
  
   My system is as follows: I crawl data with Nutch and send them into
   SolrCloud. Users will search at Solr.
  
   What is CloudSolrServer? Should I use it for load balancing, or is it
   something different?
  
  
   It appears that the Solr integration in Nutch currently does not use
   CloudSolrServer.  There is an issue to add it.  The mutual dependency
 on
   HttpClient is holding it up - Nutch uses HttpClient 3, SolrJ 4.x uses
   HttpClient 4.
  
   https://issues.apache.org/jira/browse/NUTCH-1377
  
   Until that is fixed, a load balancer would be required for full
 redundancy
   for updates with SolrCloud.  You don't have to use a load balancer for
 it
   to work, but if the Solr server that Nutch is using goes down, then
   indexing will stop unless you reconfigure Nutch or bring the Solr
 server
   back up.
  
   Thanks,
   Shawn
  
  



Re: Storing Solr Index on NFS

2013-04-16 Thread Yago Riveiro
Furkan, see this post:

http://grokbase.com/t/lucene/solr-user/117t1eswyk/multiple-solr-servers-and-a-shared-index-again
 

Regards

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, April 16, 2013 at 12:15 PM, Furkan KAMACI wrote:

 Hi Walter;
 
 You said: "It is not safe to share Solr index files between two Solr
 servers." Why do you think that?
 
 
 2013/4/16 Tim Vaillancourt t...@elementspace.com 
 (mailto:t...@elementspace.com)
 
  If centralization of storage is your goal by choosing NFS, iSCSI works
  reasonably well with SOLR indexes, although good local-storage will always
  be the overall winner.
  
  I noticed a near 5% degradation in overall search performance (casual
  testing, nothing scientific) when moving a 40-50GB index to iSCSI (10GbE
  network) from a 4x7200rpm RAID 10 local SATA disk setup.
  
  Tim
  
  
  On 15/04/13 09:59 AM, Walter Underwood wrote:
  
   Solr 4.2 does have field compression which makes smaller indexes. That
   will reduce the amount of network traffic. That probably does not help
   much, because I think the latency of NFS is what causes problems.
   
   wunder
   
   On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote:
   
   Hello Walter,

Thanks for the response. That has been my experience in the past as 
well.
But I was wondering if there are new things in Solr 4 and NFS 4.1 that
make storing indexes on an NFS mount feasible.

Thanks,
Saqib


On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wun...@wunderwood.org wrote:

On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:
 
 Greetings,
  
  Are there any issues with storing Solr Indexes on a NFS share? Also 
  any
  recommendations for using NFS for Solr indexes?
  
 
 I recommend that you do not put Solr indexes on NFS.
 
 It can be very slow, I measured indexing as 100X slower on NFS a few
 years
 ago.
 
 It is not safe to share Solr index files between two Solr servers, so
 there is no benefit to NFS.
 
 wunder
 --
 Walter Underwood
 wun...@wunderwood.org (mailto:wun...@wunderwood.org)
 
 
 
 
 --
   Walter Underwood
   wun...@wunderwood.org (mailto:wun...@wunderwood.org)
   
  
  
 
 
 




Solr 4.2.1 sorting by distance to polygon centre.

2013-04-16 Thread Guido Medina

Hi,

I have everything in place: my polygons are indexing properly, and I played
a bit with LSP, which helped me a lot. I now have JTS 1.13 inside
solr.war. Here is my challenge:

I have a big polygon (A) which contains smaller polygons (B and C); B and
C intersect. If I search for a coordinate inside all three,
I would like to sort by the distance to the centre of the polygons that
match the criteria.

As an example, let's say dot B is at the centre of B, dot C is at the
centre of C, and dot A is at the intersection of B and C, which happens to
be the centre of A. So for dot A the result should be polygon A first, and
so on. I could compute the distances from the result myself, but since Solr
is doing the heavy lifting already, why not just include the sort in the
query.


Here is my field type definition:

<!-- Spatial field type -->
<fieldType name="location_rpt"
    class="solr.SpatialRecursivePrefixTreeFieldType"
    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    units="degrees"/>


Field definition:

<!-- JTS spatial polygon field -->
<field name="geopolygon" type="location_rpt" indexed="true"
    stored="false" required="false" multiValued="true"/>



I'm using the Solr admin UI first to shape my query, and then moving to
our web app, which uses SolrJ. Here is the XML form of my result, which
includes the query I'm making; it scores all distances as 1.0 (not
what I want):

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">9</int>
    <lst name="params">
      <str name="fl">id,score</str>
      <str name="sort">score asc</str>
      <str name="indent">true</str>
      <str name="q">*:*</str>
      <str name="_">136620720</str>
      <str name="wt">xml</str>
      <str name="fq">{!score=distance}geopolygon:"Intersects(-6.271906 53.379284)"</str>
    </lst>
  </lst>
  <result name="response" numFound="3" start="0" maxScore="1.0">
    <doc>
      <str name="id">uid13972</str>
      <float name="score">1.0</float></doc>
    <doc>
      <str name="id">uid13979</str>
      <float name="score">1.0</float></doc>
    <doc>
      <str name="id">uid13974</str>
      <float name="score">1.0</float></doc>
  </result>
</response>


Thanks for all responses,

Guido.
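One detail worth checking in the query above (a hypothesis from general Solr behaviour, not a confirmed diagnosis): filter queries (fq) never contribute to document scores, so a {!score=distance} clause inside an fq cannot influence ranking, and with q=*:* every matching document scores a constant 1.0, which matches the output shown. Moving the spatial clause into the main query would look like:

```
q={!score=distance}geopolygon:"Intersects(-6.271906 53.379284)"&sort=score asc&fl=id,score
```

Whether the RPT field type honours score=distance in 4.2.1 specifically is worth verifying against the spatial documentation, since spatial scoring support changed across the 4.x releases.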


Re: first time with new keyword, solr take to much time to give the result

2013-04-16 Thread Toke Eskildsen
On Tue, 2013-04-16 at 12:22 +0200, Montu v Boda wrote:
 we have 506 documents indexed in Solr, and the index size is 400GB.
 
 now when we search for the keyword "test", it takes 1 min to give the
 response for 10,000 rows.

At this point, you had searched for other keywords before measuring
the keyword "test", right? The first search on a newly opened index is
notoriously slow.

 after firing the query, when we open the resource manager, it shows
 that most of the cost is disk I/O.

Both searching and value retrieval (for the 10K rows) require a lot of
random access in Lucene/Solr and, I guess, in just about every other
comparable search engine.

I will bet a cake that your underlying storage is spinning disks. When
you perform a search for a keyword that has not been used before or not
in a while, the disk cache has little data for that search so there will
be a lot of random access to the underlying storage. Spinning disks are
really bad at this.

 any help would be helpfull to us

Short answer: Use a SSD.

Longer answer: You need to either lower the number of seeks or make them
faster (or both). You lower the number of seeks by (in your case) adding
copious amounts of RAM and a lot of warming of your searchers. You make
the seeks faster by switching storage type.

RAIDing spinning drives does not help much, as its benefits are
higher bulk transfer rates and/or more concurrent requests, whereas you
need lower latency. You could buy faster spinning drives, but with
current SSD prices I would really advise that you choose that road
instead.
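A back-of-envelope check of the point above, with typical assumed latency figures rather than measured ones:

```python
# Why a cold query on a 400GB index can take around a minute: the time is
# dominated by random reads. The figures below are rough, typical values.
random_reads = 10_000          # e.g. one per retrieved row, plus term lookups
hdd_seek_s   = 0.008           # ~8 ms per random read on a 7200rpm disk
ssd_seek_s   = 0.0001          # ~0.1 ms per random read on a SATA SSD

print(f"HDD: ~{random_reads * hdd_seek_s:.0f} s")   # ~80 s, close to the ~1 min observed
print(f"SSD: ~{random_reads * ssd_seek_s:.0f} s")   # ~1 s
```

The same arithmetic shows why warming helps: once those reads are served from the page cache, the per-read cost drops to microseconds.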

Regards,
Toke Eskildsen, State and University Library, Denmark



Re: first time with new keyword, solr take to much time to give the result

2013-04-16 Thread Dmitry Kan
In the admin page you can monitor the cache parameters, like evictions. If
your cache evicts too much, you can increase its capacity. Note: this will
affect RAM consumption, so you would need to change the Tomcat config
too.


On Tue, Apr 16, 2013 at 2:08 PM, Montu v Boda montu.b...@highqsolutions.com
 wrote:

 Hi

 currently, my Solr is deployed in tomcat1 and we have given 4GB of memory
 to that Tomcat.

 Thanks  Regards
 Montu v Boda






Re: Some Questions About Using Solr as Cloud

2013-04-16 Thread Erick Erickson
Yes. Every node is really self-contained. When you send a doc to a
cluster where each shard has a replica, the raw doc is sent to
each node of that shard and indexed independently.

About old docs, it's the same as Solr 3.6. Data associated with
docs stays around in the index until it's merged away.

You cannot transfer just the indexed form of a document from one
core to another, you have to re-index the doc.

Best
Erick

On Mon, Apr 15, 2013 at 7:46 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 Hi Jack;

 I see that SolrCloud makes everything automated. When I use SolrCloud, is it
 true that there may be more than one computer responsible for indexing at
 any time?

 2013/4/15 Jack Krupansky j...@basetechnology.com

 There are no masters or slaves in SolrCloud - it's fully distributed. Some
 cluster nodes will be leaders (of the shard on that node) at a given
 point in time, but different nodes may be leaders at different points in
 time as they become elected.

 In a distributed cluster you would never want to store documents only on
 one node. Sure, you can do that by setting the replication factor to 1, but
 that defeats half the purpose for SolrCloud.

 Index transfer is automatic - SolrCloud supports fully distributed update.

 You might be getting confused with the old Master-Slave-Replication
 model that Solr had (and still has) which is distinct from SolrCloud.

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI
 Sent: Sunday, April 14, 2013 7:45 PM
 To: solr-user@lucene.apache.org
 Subject: Some Questions About Using Solr as Cloud


 I read wiki and reading SolrGuide of Lucidworks. However I want to clear
 something in my mind. Here are my questions:

 1) Does SolrCloud allow a multi-master design (is there any document that I
 can read about it)?
 2) Let's assume that I use multiple cores i.e. core A and core B. Let's
 assume that there is a document just indexed at core B. If I send a search
 request to core A can I get result?
 3) When I use multi master design (if exists) can I transfer one master's
 index data into another (with its slaves or not)?
 4) When I use multi core design can I transfer one index data into another
 core or anywhere else?

 By the way thanks for the quick responses and kindness at mail list.



Re: Some Questions About Using Solr as Cloud

2013-04-16 Thread Furkan KAMACI
Hi Erick;

Thanks for the explanation. You said: "You cannot transfer just the
indexed form of a document from one core to another, you have to re-index
the doc." Why do you think that?

2013/4/16 Erick Erickson erickerick...@gmail.com

 Yes. Every node is really self-contained. When you send a doc to a
 cluster where each shard has a replica, the raw doc is sent to
 each node of that shard and indexed independently.

 About old docs, it's the same as Solr 3.6. Data associated with
 docs stays around in the index until it's merged away.

 You cannot transfer just the indexed form of a document from one
 core to another, you have to re-index the doc.

 Best
 Erick

 On Mon, Apr 15, 2013 at 7:46 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Jack;
 
  I see that SolrCloud makes everything automated. When I use SolrCloud is
 it
  true that: there may be more than one computer responsible for indexing
 at
  any time?
 
  2013/4/15 Jack Krupansky j...@basetechnology.com
 
  There are no masters or slaves in SolrCloud - it's fully distributed.
 Some
  cluster nodes will be leaders (of the shard on that node) at a given
  point in time, but different nodes may be leaders at different points in
  time as they become elected.
 
  In a distributed cluster you would never want to store documents only on
  one node. Sure, you can do that by setting the replication factor to 1,
 but
  that defeats half the purpose for SolrCloud.
 
  Index transfer is automatic - SolrCloud supports fully distributed
 update.
 
  You might be getting confused with the old Master-Slave-Replication
  model that Solr had (and still has) which is distinct from SolrCloud.
 
  -- Jack Krupansky
 
  -Original Message- From: Furkan KAMACI
  Sent: Sunday, April 14, 2013 7:45 PM
  To: solr-user@lucene.apache.org
  Subject: Some Questions About Using Solr as Cloud
 
 
  I read wiki and reading SolrGuide of Lucidworks. However I want to clear
  something in my mind. Here are my questions:
 
  1) Does SolrCloud lets a multi master design (is there any document
 that I
  can read about it)?
  2) Let's assume that I use multiple cores i.e. core A and core B. Let's
  assume that there is a document just indexed at core B. If I send a
 search
  request to core A can I get result?
  3) When I use multi master design (if exists) can I transfer one
 master's
  index data into another (with its slaves or not)?
  4) When I use multi core design can I transfer one index data into
 another
  core or anywhere else?
 
  By the way thanks for the quick responses and kindness at mail list.
 



Re: SolrException parsing error

2013-04-16 Thread Marc des Garets
Did you find anything? I have the same problem but it's on update 
requests only.


The error comes from the solrj client indeed. It is solrj logging this 
error. There is nothing in solr itself and it does the update correctly. 
It's fairly small simple documents being updated.


On 04/15/2013 07:49 PM, Shawn Heisey wrote:

On 4/15/2013 9:47 AM, Luis Lebolo wrote:

Hi All,

I'm using Solr 4.1 and am receiving an 
org.apache.solr.common.SolrException
parsing error with root cause java.io.EOFException (see below for 
stack
trace). The query I'm performing is long/complex and I wonder if its 
size

is causing the issue?

I am querying via POST through SolrJ. The query (fq) itself is ~20,000
characters long in the form of:

fq=(mutation_prot_mt_1_1:2374 + OR + mutation_prot_mt_2_1:2374 + OR +
mutation_prot_mt_3_1:2374 + ...) + OR + (mutation_prot_mt_1_2:2374 + 
OR +
mutation_prot_mt_2_2:2374 + OR + mutation_prot_mt_3_2:2374+...) + OR 
+ ...


In short, I am querying for an ID throughout multiple dynamically 
created

fields (mutation_prot_mt_#_#).

Any thoughts on how to further debug?

Thanks in advance,
Luis

--

SEVERE: Servlet.service() for servlet [X] in context with path [/x] 
threw

exception [Request processing failed; nested exception is
org.apache.solr.common.SolrException: parsing error] with root cause
java.io.EOFException
at
org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:193) 

at 
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:107)

  at
org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41) 


at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:387) 


  at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) 


at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90) 


  at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)


I am guessing that this log is coming from your SolrJ client, but that is 
not completely clear, so is it SolrJ or Solr that is logging this error?  
If it's SolrJ, do you see anything in the Solr log, and vice versa?


This looks to me like a network problem, where something is dropping 
the connection before transfer is complete.  It could be an unusual 
server-side config, OS problems, timeout settings in the SolrJ code, 
NIC drivers/firmware, bad cables, bad network hardware, etc.


Thanks,
Shawn





Re: first time with new keyword, solr take to much time to give the result

2013-04-16 Thread Montu v Boda
Hi

Thanks for info.

we did the same thing but there was no effect the first time.

what should we do for a first-time query with a new keyword?

how can we make the query faster the first time a new keyword is used?

say for example, if I try to search for the text keyword "test" for the
first time, it takes too much time to execute.

the second time, the same keyword works faster...

Thanks  Regards
Montu v Boda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/first-time-with-new-keyword-solr-take-to-much-time-to-give-the-result-tp4056254p4056276.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query Parser OR AND and NOT

2013-04-16 Thread Erick Erickson
The query language is NOT pure boolean. Hoss wrote this up:
http://searchhub.org/2011/12/28/why-not-and-or-and-not/

Best
Erick

On Mon, Apr 15, 2013 at 12:54 PM, Roman Chyla roman.ch...@gmail.com wrote:
 Oh, sorry, I have assumed lucene query parser. I think SOLR qp must be
 different then, because for me it works as expected (our qp parser is
 identical with lucene in the way it treats modifiers +/- and operators
 AND/OR/NOT -- NOT must be joining two clauses: a NOT b, the first cannot be
 negative, as Chris points out; the modifier however can be first - but it
 cannot be alone, there must be at least one positive clause). Otherwise,
 -field:x it is changed into field:x

 http://labs.adsabs.harvard.edu/adsabs/search/?q=%28*+-abstract%3Ablack%29+AND+abstract%3Ahole*db_key=ASTRONOMYsort_type=DATE
 http://labs.adsabs.harvard.edu/adsabs/search/?q=%28-abstract%3Ablack%29+AND+abstract%3Ahole*db_key=ASTRONOMYsort_type=DATE

 roman


 On Mon, Apr 15, 2013 at 12:25 PM, Peter Schütt newsgro...@pstt.de wrote:

 Hallo,


 Roman Chyla roman.ch...@gmail.com wrote in
 news:caen8dywjrl+e3b0hpc9ntlmjtrkasrqlvkzhkqxopmlhhfn...@mail.gmail.com:

  should be: -city:H* OR zip:30*
 
 -city:H* OR zip:30*   numFound:2520

 gives the same wrong result.


 Another Idea?

 Ciao
   Peter Schütt





how to display groups along with matching terms in solr auto-suggestion?

2013-04-16 Thread sharmila thapa
Hi,



I have used the Terms component for auto-suggestion, but it only lists the
terms from the index that match terms.prefix. Along with these term
suggestions, I have to display the product groups that match the input
prefix. Is this possible with Solr auto-suggest? Could somebody please help
me with this issue?


SolrCloud Leader Response Mechanism

2013-04-16 Thread Furkan KAMACI
When a leader responds to a query, does it say: if I have the data that is
being searched for, I should build the response with it myself; otherwise I
should look for it elsewhere, which may take long?
or
does it say: I only index the data; I will tell the other nodes to build
up the query response?


Function Query performance in combination with filters

2013-04-16 Thread Rogalon
Hi,
I am using pretty complex function queries to completely customize (not only
boost) the score of my result documents that are retrieved from an index of
approx 10e7 documents. To get to an acceptable level of performance I
combine my query with filters in the following way (very short example):

q=_val_:sum(termfreq(fieldname,'word'),termfreq(fieldname2,'word2'))&fq=fieldname:'word'&fq=fieldname2:'word2'

Although I always have (because of the filter) approx 50.000 docs in the
result set, the search times vary (depending on the actual query) between
100ms and 6000ms. 

My understanding was that the scoring function is only applied to the result
set from the filters. But based on what I am seeing it seems that a lot more
documents are actually put through the _val_ function.

Is there a way to fully compute the score of only the documents in the
result set? 

Thanks, Nico 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Function-Query-performance-in-combination-with-filters-tp4056283.html
Sent from the Solr - User mailing list archive at Nabble.com.
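The filtered function query Nico describes can be sketched as a programmatically built request. A minimal Python illustration (field names and words are the placeholders from the example; quoting the function body after `_val_:` is one common form accepted by the Solr query parser):

```python
from urllib.parse import urlencode

def build_params(field_words):
    # Build the q/fq parameter pairs for a filtered function query of the
    # shape described above (field names and words are placeholders).
    termfreqs = ",".join("termfreq(%s,'%s')" % (f, w) for f, w in field_words)
    params = [("q", '_val_:"sum(%s)"' % termfreqs)]
    # One fq per field so scoring only needs to run over the filtered set.
    params += [("fq", "%s:%s" % (f, w)) for f, w in field_words]
    return urlencode(params)

qs = build_params([("fieldname", "word"), ("fieldname2", "word2")])
```

urlencode takes care of percent-escaping the colons, quotes, and parentheses that would otherwise be mangled in the URL.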


Re: Usage of CloudSolrServer?

2013-04-16 Thread Upayavira
I cannot say that I have researched it, but I have always taken it to be
random.

Upayavira

On Tue, Apr 16, 2013, at 12:23 PM, Furkan KAMACI wrote:
 Thanks for your detailed explanation. However you said:
 
 It will then choose one of those hosts/cores for each shard, and send a
 request to them as a distributed search request. Is there any document
 that explains of distributed search? What is the criteria for it?
 
 
 2013/4/16 Upayavira u...@odoko.co.uk
 
  If you are accessing Solr from Java code, you will likely use the SolrJ
  client to do so. If your users are hitting Solr directly, you should
  think about whether this is wise - as well as providing them with direct
  search access, you are also providing them with the ability to delete
  your entire index with a single command.
 
  SolrJ isn't really a load balancer as such. When SolrJ is used to make a
  request against a collection, it will ask Zookeeper for the names of the
  shards that make up that collection, and for the hosts/cores that make
  up the set of replicas for those shards.
 
  It will then choose one of those hosts/cores for each shard, and send a
  request to them as a distributed search request.
 
  This has the advantage over traditional load balancing that if you bring
  up a new node, that node will register itself with ZooKeeper, and thus
  your SolrJ client(s) will know about it, without any intervention.
 
  Upayavira
 
  On Tue, Apr 16, 2013, at 08:36 AM, Furkan KAMACI wrote:
   Hi Shawn;
  
   I am sorry but what kind of Load Balancing is that? I mean does it check
   whether some leaders are using much CPU or RAM etc.? I think a problem
   may
   occur at such kind of scenario: if some of leaders getting more documents
   than other leaders (I don't know how it is decided that into which shard
   a
   document will go) than there will be a bottleneck on that leader?
  
  
   2013/4/15 Shawn Heisey s...@elyograg.org
  
On 4/15/2013 8:05 AM, Furkan KAMACI wrote:
   
My system is as follows: I crawl data with Nutch and send them into
SolrCloud. Users will search at Solr.
   
What is that CloudSolrServer, should I use it for load balancing or
  is it
something else different?
   
   
It appears that the Solr integration in Nutch currently does not use
CloudSolrServer.  There is an issue to add it.  The mutual dependency
  on
HttpClient is holding it up - Nutch uses HttpClient 3, SolrJ 4.x uses
HttpClient 4.
   
https://issues.apache.org/**jira/browse/NUTCH-1377
  https://issues.apache.org/jira/browse/NUTCH-1377
   
Until that is fixed, a load balancer would be required for full
  redundancy
for updates with SolrCloud.  You don't have to use a load balancer for
  it
to work, but if the Solr server that Nutch is using goes down, then
indexing will stop unless you reconfigure Nutch or bring the Solr
  server
back up.
   
Thanks,
Shawn
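The selection behaviour Upayavira describes (one replica picked per shard, effectively at random) can be modelled with a toy sketch. Python is used purely for illustration; the real logic lives inside SolrJ's CloudSolrServer and is not reproduced here:

```python
import random

def pick_one_replica_per_shard(cluster_state, rng=random):
    # Toy model of the behaviour described above: for each shard, choose
    # one of its live replicas at random to receive the distributed request.
    return {shard: rng.choice(replicas)
            for shard, replicas in cluster_state.items()}

# Hypothetical cluster state, as a client might read it from ZooKeeper.
state = {
    "shard1": ["host1:8983/solr", "host2:8983/solr"],
    "shard2": ["host3:8983/solr", "host4:8983/solr"],
}
targets = pick_one_replica_per_shard(state)
```

Because the client re-reads the cluster state from ZooKeeper, a newly registered node shows up in `state` automatically, which is the advantage over a static load balancer mentioned above.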
   
   
 


terms starting with multilingual character don't list on solr auto-suggestion list

2013-04-16 Thread sharmila thapa
Hi,

I have used /terms for the Solr auto-suggestion list. It works fine for
English words, but I have a problem with multi-language index words; I have
tested with Russian. If the Russian characters are in the middle of the
typed word, it gets displayed in the suggestion list: if I type 'кар', it
lists карабином. But if the Russian character is the first/initial
character, as in Фляга, and I start typing Фля, it does not list the word
starting with this prefix (here Фляга). Could somebody please help me with
this issue?
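One cause worth ruling out (an assumption, not confirmed by this thread) is that the terms.prefix value is not reaching Solr as UTF-8 percent-encoded bytes. A quick sketch of building a correctly encoded request (the URL and field name are hypothetical):

```python
from urllib.parse import quote

# A Cyrillic prefix must be sent as percent-encoded UTF-8 bytes;
# a raw or wrongly encoded value can silently match nothing.
prefix = "Фля"
encoded = quote(prefix)  # percent-encodes the UTF-8 bytes of the string
url = ("http://localhost:8983/solr/terms"
       "?terms.fl=name&terms.prefix=" + encoded)
```

If the encoding is correct on the wire, the next place to look would be the field's analyzer configuration, as Jack suggests below.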


Re: SolrException parsing error

2013-04-16 Thread Luis Lebolo
Turns out I spoke too soon. I was *not* sending the query via POST.
Changing the method to POST solved the issue for me (maybe I was hitting a
GET limit somewhere?).

-Luis
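A GET limit is plausible: many servlet containers cap the request line and headers at roughly 8 KB by default (the exact figure varies by container and configuration; the 8192 below is an assumption, e.g. Tomcat's default maxHttpHeaderSize). A rough back-of-the-envelope sketch:

```python
# Rebuild a filter of the shape described in the original message
# (mutation_prot_mt_<i>_<j>:2374 clauses OR'ed together) and compare its
# length against a typical container header limit.
filters = " OR ".join(
    "mutation_prot_mt_%d_%d:2374" % (i, j)
    for i in range(1, 41) for j in range(1, 21)  # 800 hypothetical clauses
)
request_line_len = len("GET /solr/select?fq=") + len(filters)
TYPICAL_HEADER_LIMIT = 8192  # assumed default; container-specific
exceeds = request_line_len > TYPICAL_HEADER_LIMIT
```

With POST the filter travels in the request body, which is governed by the much larger POST body limit instead.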


On Tue, Apr 16, 2013 at 7:38 AM, Marc des Garets m...@ttux.net wrote:

 Did you find anything? I have the same problem but it's on update requests
 only.

 The error comes from the solrj client indeed. It is solrj logging this
 error. There is nothing in solr itself and it does the update correctly.
 It's fairly small simple documents being updated.


 On 04/15/2013 07:49 PM, Shawn Heisey wrote:

 On 4/15/2013 9:47 AM, Luis Lebolo wrote:

 Hi All,

 I'm using Solr 4.1 and am receiving an org.apache.solr.common.**
 SolrException
 parsing error with root cause java.io.EOFException (see below for stack
 trace). The query I'm performing is long/complex and I wonder if its size
 is causing the issue?

 I am querying via POST through SolrJ. The query (fq) itself is ~20,000
 characters long in the form of:

 fq=(mutation_prot_mt_1_1:2374 + OR + mutation_prot_mt_2_1:2374 + OR +
 mutation_prot_mt_3_1:2374 + ...) + OR + (mutation_prot_mt_1_2:2374 + OR +
 mutation_prot_mt_2_2:2374 + OR + mutation_prot_mt_3_2:2374+...) + OR +
 ...

 In short, I am querying for an ID throughout multiple dynamically created
 fields (mutation_prot_mt_#_#).

 Any thoughts on how to further debug?

 Thanks in advance,
 Luis

 --**

 SEVERE: Servlet.service() for servlet [X] in context with path [/x] threw
 exception [Request processing failed; nested exception is
 org.apache.solr.common.**SolrException: parsing error] with root cause
 java.io.EOFException
 at
 org.apache.solr.common.util.**FastInputStream.readByte(**FastInputStream.java:193)

 at org.apache.solr.common.util.**JavaBinCodec.unmarshal(**
 JavaBinCodec.java:107)
   at
 org.apache.solr.client.solrj.**impl.BinaryResponseParser.**
 processResponse(**BinaryResponseParser.java:41)
 at
 org.apache.solr.client.solrj.**impl.HttpSolrServer.request(**HttpSolrServer.java:387)

   at
 org.apache.solr.client.solrj.**impl.HttpSolrServer.request(**HttpSolrServer.java:181)

 at
 org.apache.solr.client.solrj.**request.QueryRequest.process(**QueryRequest.java:90)

   at org.apache.solr.client.solrj.**SolrServer.query(SolrServer.**
 java:301)


 I am guessing that this log is coming from your SolrJ client, but That is
 not completely clear, so is it SolrJ or Solr that is logging this error?
  If it's SolrJ, do you see anything in the Solr log, and vice versa?

 This looks to me like a network problem, where something is dropping the
 connection before transfer is complete.  It could be an unusual server-side
 config, OS problems, timeout settings in the SolrJ code, NIC
 drivers/firmware, bad cables, bad network hardware, etc.

 Thanks,
 Shawn





Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-16 Thread Umesh Prasad
Hi,
We are migrating from Solr 3.6 to Solr 4.2, and Solr 4.2 is throwing an
exception on restart. What is more, it takes a very long time (more than
one hour) to get up and running.


The exception after restart:
=
Apr 16, 2013 4:47:31 PM org.apache.solr.update.UpdateLog$RecentUpdates
update
WARNING: Unexpected log entry or corrupt log.  Entry=11
java.lang.ClassCastException: java.lang.Long cannot be cast to
java.util.List
at
org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:929)
at
org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:863)
at
org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1014)
at org.apache.solr.update.UpdateLog.init(UpdateLog.java:253)
at
org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:82)
at
org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:137)
at
org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:123)
at
org.apache.solr.update.DirectUpdateHandler2.init(DirectUpdateHandler2.java:95)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:525)
at
org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:596)
at org.apache.solr.core.SolrCore.init(SolrCore.java:806)
at org.apache.solr.core.SolrCore.init(SolrCore.java:618)
at
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
at
org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Apr 16, 2013 4:47:31 PM org.apache.solr.update.UpdateLog$RecentUpdates
update
WARNING: Unexpected log entry or corrupt log.  Entry=8120?785879438123
java.lang.ClassCastException: java.lang.String cannot be cast to
java.util.List
at
org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:929)
at
org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:863)
at
org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1014)
at org.apache.solr.update.UpdateLog.init(UpdateLog.java:253)
at
org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:82)
at
org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:137)
at
org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:123)
at
org.apache.solr.update.DirectUpdateHandler2.init(DirectUpdateHandler2.java:95)

=

And once restarted, I start getting replication errors:


Apr 16, 2013 5:20:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
SEVERE: Master at: http://localhost:25280/solr/accessories is not
available. Index fetch failed. Exception:
org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at: http://localhost:25280/solr/accessories
Apr 16, 2013 5:20:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
SEVERE: Master at: http://localhost:25280/solr/newQueries is not available.
Index fetch failed. Exception:
org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at: http://localhost:25280/solr/newQueries
Apr 16, 2013 5:21:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
SEVERE: Master at: http://localhost:25280/solr/phcare is not available.
Index fetch failed. Exception:
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at: http://localhost:25280/solr/phcare
Apr 16, 2013 5:21:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
SEVERE: Master at: http://localhost:25280/solr/audioplayersCore is not
available. Index fetch failed. Exception:
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at:
http://localhost:25280/solr/audioplayersCore
Apr 16, 2013 5:21:24 PM 

Re: first time with new keyword, solr take to much time to give the result

2013-04-16 Thread Duncan Irvine
Are you actually trying to return 10,000 records, or is that the number of
hits, and you're only retrieving the top 10?

Cheers,
  Duncan.


On 16 April 2013 12:39, Montu v Boda montu.b...@highqsolutions.com wrote:

 Hi

 Thanks for info.

 we did the same thing but no effect for first time.

 what to do for first time query with new keyword?

 how we can make the query faster for first time with new keyword?

 say for ex if i try to search the text key word test first time then it
 will take to much time to execute.

 for second time the same keyword works faster...

 Thanks  Regards
 Montu v Boda



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/first-time-with-new-keyword-solr-take-to-much-time-to-give-the-result-tp4056254p4056276.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Don't let your mind wander -- it's too little to be let out alone.


Re: Function Query performance in combination with filters

2013-04-16 Thread Yonik Seeley
On Tue, Apr 16, 2013 at 7:51 AM, Rogalon nico.beche...@me.com wrote:
 Hi,
 I am using pretty complex function queries to completely customize (not only
 boost) the score of my result documents that are retrieved from an index of
 approx 10e7 documents. To get to an acceptable level of performance I
 combine my query with filters in the following way (very short example):

 q=_val_:sum(termfreq(fieldname,`word`),termfreq(fieldname2,`word2`))fq=fieldname:`word`fq=fieldname2:`word2`

 Although I always have (because of the filter) approx 50.000 docs in the
 result set, the search times vary (depending on the actual query) between
 100ms and 6000ms.

 My understanding was that the scoring function is only applied to the result
 set from the filters.

That should be the case.

 But based on what I am seeing it seems that a lot more
 documents are actually put through the _val_ function.

How did you verify this?

-Yonik
http://lucidworks.com


Re: terms starting with multilingual character don't list on solr auto-suggestion list

2013-04-16 Thread Jack Krupansky
Can you share your auto-complete/suggestor configuration parameters? 
Including the search component.


It sounds as if there is a field type with an analyzer that is mapping 
characters.


-- Jack Krupansky

-Original Message- 
From: sharmila thapa

Sent: Tuesday, April 16, 2013 7:54 AM
To: solr-user@lucene.apache.org
Subject: terms starting with multilingual character don't list on solr 
auto-suggestion list


Hi,

I have used /terms for the Solr auto-suggestion list. It works fine for
English words, but I have a problem with multi-language index words; I have
tested with Russian. If the Russian characters are in the middle of the
typed word, it gets displayed in the suggestion list: if I type 'кар', it
lists карабином. But if the Russian character is the first/initial
character, as in Фляга, and I start typing Фля, it does not list the word
starting with this prefix (here Фляга). Could somebody please help me with
this issue?



Re: SolR InvalidTokenOffsetsException with Highlighter and Synonyms

2013-04-16 Thread juancesarvillalba


Hi,

At the moment, I am not considering storing synonyms in the index,
although it is something that I will have to do at some point.

It is strange that something as common as multi-word synonyms has a bug
with highlighting, but I couldn't find any solution.

Thanks for your help.

 

 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolR-InvalidTokenOffsetsException-with-Highlighter-and-Synonyms-tp4053988p4056305.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: first time with new keyword, solr take to much time to give the result

2013-04-16 Thread Montu v Boda
hi

we are trying to return 10,000 rows.

it is necessary to return 10,000 rows because, from those 10,000, we pick
only the top 100 records based on the user's permissions, and the
permissions are stored in the database, not in Solr.

and if we try to return only 100 rows, then it is possible that, out of
those 100 rows, the user does not have permission for any document, and the
user will get a blank search result.

Thanks  Regards
Montu v Boda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/first-time-with-new-keyword-solr-take-to-much-time-to-give-the-result-tp4056254p4056306.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: first time with new keyword, solr take to much time to give the result

2013-04-16 Thread Ahmet Arslan
Hi Montu,

Regarding permissions, you may find this solution more elegant:

http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/

http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect.html


--- On Tue, 4/16/13, Montu v Boda montu.b...@highqsolutions.com wrote:

 From: Montu v Boda montu.b...@highqsolutions.com
 Subject: Re: first time with new keyword, solr take to much time to give the 
 result
 To: solr-user@lucene.apache.org
 Date: Tuesday, April 16, 2013, 4:13 PM
 hi
 
 we are trying to return 10,000 rows
 
 it is necessary to return 1 rows because from that
 1, we are pick
 only top 100 record based on the user permission and
 permission is stored in
 database not on solr.
 
 and if we try to return 100 rows then it may possible that
 from the 100
 rows, user does not have permission of any document. user
 will get blank
 search result.
 
 Thanks  Regards
 Montu v Boda
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/first-time-with-new-keyword-solr-take-to-much-time-to-give-the-result-tp4056254p4056306.html
 Sent from the Solr - User mailing list archive at
 Nabble.com.
 


Re: first time with new keyword, solr take to much time to give the result

2013-04-16 Thread Raymond Wiker
On Tue, Apr 16, 2013 at 3:13 PM, Montu v Boda montu.b...@highqsolutions.com
 wrote:

 hi

 we are trying to return 10,000 rows

 it is necessary to return 1 rows because from that 1, we are pick
 only top 100 record based on the user permission and permission is stored
 in
 database not on solr.

 and if we try to return 100 rows then it may possible that from the 100
 rows, user does not have permission of any document. user will get blank
 search result.


You may have some other options:

1) Add the access rights to SOLR, and have a front-end that takes a user id
and expands it into a set of access rights (groups, mainly) for the user.
This is then added as a filter to the queries.

2) Run the query with a smaller number of hits requested, and use the
start parameter to fetch more hits (if necessary).

Also, you may want to restrict the fields returned by your query, to the
minimal set required.
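Options 1 and 2 above, plus the restricted field list, might look like this as request parameters (a sketch only; the 'acl' field name and group values are hypothetical and assume access groups have been indexed with each document):

```python
from urllib.parse import urlencode

def paged_query(q, user_groups, page, page_size=100):
    # Option 1: an access-rights filter built from the user's groups
    # (the 'acl' field is hypothetical).
    # Option 2: start/rows paging instead of one huge result set.
    return urlencode([
        ("q", q),
        ("fq", "acl:(%s)" % " OR ".join(user_groups)),
        ("start", page * page_size),
        ("rows", page_size),
        ("fl", "id"),  # restrict returned fields to the minimal set
    ])

params = paged_query("test", ["groupA", "groupB"], page=2)
```

Each page then only transfers `rows` documents, and the ACL filter means every returned hit is already permitted, so no post-filtering against the database is needed.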


Re: Function Query performance in combination with filters

2013-04-16 Thread Rogalon


Am 16. April 2013 um 14:46 schrieb Yonik Seeley-4 [via Lucene] 
ml-node+s472066n4056299...@n3.nabble.com:

 On Tue, Apr 16, 2013 at 7:51 AM, Rogalon [hidden email] wrote:

  Hi,
  I am using pretty complex function queries to completely customize (not only
  boost) the score of my result documents that are retrieved from an index of
  approx 10e7 documents. To get to an acceptable level of performance I
  combine my query with filters in the following way (very short example):
 
  q=_val_:sum(termfreq(fieldname,`word`),termfreq(fieldname2,`word2`))fq=fieldname:`word`fq=fieldname2:`word2`
 
  Although I always have (because of the filter) approx 50.000 docs in the
  result set, the search times vary (depending on the actual query) between
  100ms and 6000ms.
 
  My understanding was that the scoring function is only applied to the result
  set from the filters.

 That should be the case.

  But based on what I am seeing it seems that a lot more
  documents are actually put through the _val_ function.

 How did you verify this? 
Thanks for taking a look at my problem.

For now - I verified just by taking a look at the query times and doing some 
simple experiments.

If I am not using the function query at all (q=*:*&fq=...), the approx. 50.000 
results from the filters are always returned within 200-300ms. This is pretty 
stable. If I have a (test) index of only 50.000 documents (instead of the 10e7 
index) and I pass every document through the _val_ query (without any 
filters), this takes about 150ms, which in my case would be ok.

Applying no filters to the function query on the 10e7 index leads to search 
times at about 6000ms which is too much.

But if I use the filters as stated above I get returned 50.000 documents but 
the query times suddenly start to vary between 100ms and 6000ms. Some of my 
filters might actually be on stop words which appear in every other document in 
the index but that seems to really hurt performance only if the function query 
is used.

 Greetings, Nico 


 -Yonik
 http://lucidworks.com





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Function-Query-performance-in-combination-with-filters-tp4056283p4056312.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: first time with new keyword, solr take to much time to give the result

2013-04-16 Thread Jack Krupansky

Why not just add a filter query for user permissions?

-- Jack Krupansky

-Original Message- 
From: Montu v Boda

Sent: Tuesday, April 16, 2013 9:13 AM
To: solr-user@lucene.apache.org
Subject: Re: first time with new keyword, solr take to much time to give the 
result


hi

we are trying to return 10,000 rows

it is necessary to return 1 rows because from that 1, we are pick
only top 100 record based on the user permission and permission is stored in
database not on solr.

and if we try to return 100 rows then it may possible that from the 100
rows, user does not have permission of any document. user will get blank
search result.

Thanks  Regards
Montu v Boda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/first-time-with-new-keyword-solr-take-to-much-time-to-give-the-result-tp4056254p4056306.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: SolrException parsing error

2013-04-16 Thread Marc Des Garets
Problem solved for me as well. The client is running in tomcat and the
connector had compression=true. I removed it and now it seems to work
fine.

On 04/16/2013 02:28 PM, Luis Lebolo wrote:
 Turns out I spoke too soon. I was *not* sending the query via POST.
 Changing the method to POST solved the issue for me (maybe I was hitting a
 GET limit somewhere?).

 -Luis


 On Tue, Apr 16, 2013 at 7:38 AM, Marc des Garets m...@ttux.net wrote:

 Did you find anything? I have the same problem but it's on update requests
 only.

 The error comes from the solrj client indeed. It is solrj logging this
 error. There is nothing in solr itself and it does the update correctly.
 It's fairly small simple documents being updated.


 On 04/15/2013 07:49 PM, Shawn Heisey wrote:

 On 4/15/2013 9:47 AM, Luis Lebolo wrote:

 Hi All,

 I'm using Solr 4.1 and am receiving an org.apache.solr.common.**
 SolrException
 parsing error with root cause java.io.EOFException (see below for stack
 trace). The query I'm performing is long/complex and I wonder if its size
 is causing the issue?

 I am querying via POST through SolrJ. The query (fq) itself is ~20,000
 characters long in the form of:

 fq=(mutation_prot_mt_1_1:2374 + OR + mutation_prot_mt_2_1:2374 + OR +
 mutation_prot_mt_3_1:2374 + ...) + OR + (mutation_prot_mt_1_2:2374 + OR +
 mutation_prot_mt_2_2:2374 + OR + mutation_prot_mt_3_2:2374+...) + OR +
 ...

 In short, I am querying for an ID throughout multiple dynamically created
 fields (mutation_prot_mt_#_#).

 Any thoughts on how to further debug?

 Thanks in advance,
 Luis

 --**

 SEVERE: Servlet.service() for servlet [X] in context with path [/x] threw
 exception [Request processing failed; nested exception is
 org.apache.solr.common.**SolrException: parsing error] with root cause
 java.io.EOFException
 at
 org.apache.solr.common.util.**FastInputStream.readByte(**FastInputStream.java:193)

 at org.apache.solr.common.util.**JavaBinCodec.unmarshal(**
 JavaBinCodec.java:107)
   at
 org.apache.solr.client.solrj.**impl.BinaryResponseParser.**
 processResponse(**BinaryResponseParser.java:41)
 at
 org.apache.solr.client.solrj.**impl.HttpSolrServer.request(**HttpSolrServer.java:387)

   at
 org.apache.solr.client.solrj.**impl.HttpSolrServer.request(**HttpSolrServer.java:181)

 at
 org.apache.solr.client.solrj.**request.QueryRequest.process(**QueryRequest.java:90)

   at org.apache.solr.client.solrj.**SolrServer.query(SolrServer.**
 java:301)

 I am guessing that this log is coming from your SolrJ client, but That is
 not completely clear, so is it SolrJ or Solr that is logging this error?
  If it's SolrJ, do you see anything in the Solr log, and vice versa?

 This looks to me like a network problem, where something is dropping the
 connection before transfer is complete.  It could be an unusual server-side
 config, OS problems, timeout settings in the SolrJ code, NIC
 drivers/firmware, bad cables, bad network hardware, etc.

 Thanks,
 Shawn





Same Shards at Different Machines

2013-04-16 Thread Furkan KAMACI
Is it possible to use the same shards on different machines in SolrCloud?


Re: first time with new keyword, solr take to much time to give the result

2013-04-16 Thread Montu v Boda
Hi

The problem is that permissions are frequently updated in our system, so we
have to update the index in the same manner; otherwise it will give wrong
results.

In that case I think the cache will be affected and performance may be
reduced.


Thanks & Regards
Montu v Boda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/first-time-with-new-keyword-solr-take-to-much-time-to-give-the-result-tp4056254p4056321.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 4.2.1 sorting by distance to polygon centre.

2013-04-16 Thread Smiley, David W.
Guido,

The field type solr.SpatialRecursivePrefixTreeFieldType can only
participate in distance reporting for indexed points, not other shapes.
In fact, I recommend not attempting to get the distance if the field isn't
purely indexed points, as it may get confused if it sees some small
shapes.  For your use-case, you should index an additional
solr.SpatialRecursivePrefixTreeFieldType field just for the points.  You
could do this external to Solr, or you could write a Solr
UpdateRequestProcessor that parses the shape in order to then call
getCenter(), and put those points in the other field.

~ David

On 4/16/13 7:23 AM, Guido Medina guido.med...@temetra.com wrote:

Hi,

I got everything in place, my polygons are indexing properly, I played a
bit with LSP which helped me a lot, now, I have JTS 1.13 inside
solr.war; here is my challenge:

I have big polygon (A) which contains smaller polygons (B and C), B and
C have some intersection, so if I search for a coordinate inside the 3,
I would like to sort by the distance to the centre of the polygons that
match the criteria.

As example, let's say dot B is on the centre of B, dot C is at the
centre of C and dot A is at the intersection of B and C which happens to
be the centre of A, so for dot A should be polygon A first and so on. I
could compute with the distances using the result but since Solr is
doing a heavy load already, why not just include the sort in it.

Here is my field type definition:

     <!-- Spatial field type -->
     <fieldType name="location_rpt"
        class="solr.SpatialRecursivePrefixTreeFieldType"
        spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
        units="degrees"/>


Field definition:

     <!-- JTS spatial polygon field -->
     <field name="geopolygon" type="location_rpt" indexed="true"
        stored="false" required="false" multiValued="true"/>


I'm using the Solr admin UI first to shape my query and then moving to
our web app which uses solrj, here is the XML form of my result which
includes the query I'm making, which scores all distances to 1.0 (Not
what I want):

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">9</int>
    <lst name="params">
      <str name="fl">id,score</str>
      <str name="sort">score asc</str>
      <str name="indent">true</str>
      <str name="q">*:*</str>
      <str name="_">136620720</str>
      <str name="wt">xml</str>
      <str name="fq">{!score=distance}geopolygon:"Intersects(-6.271906 53.379284)"</str>
    </lst>
  </lst>
  <result name="response" numFound="3" start="0" maxScore="1.0">
    <doc>
      <str name="id">uid13972</str>
      <float name="score">1.0</float>
    </doc>
    <doc>
      <str name="id">uid13979</str>
      <float name="score">1.0</float>
    </doc>
    <doc>
      <str name="id">uid13974</str>
      <float name="score">1.0</float>
    </doc>
  </result>
</response>


Thanks for all responses,

Guido.
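David's suggestion is to derive a point from each indexed shape and put it in a separate point field. For a planar polygon, the centre can be computed with the shoelace centroid formula; the sketch below is a planar approximation with illustrative names (Spatial4j's getCenter() handles the geodesic case properly):

```java
public class PolygonCentroid {
    // Centroid of a simple planar polygon via the shoelace formula.
    // Planar approximation only; for geodesic shapes use JTS/Spatial4j getCenter().
    static double[] centroid(double[] xs, double[] ys) {
        double a = 0, cx = 0, cy = 0;
        int n = xs.length;
        for (int i = 0; i < n; i++) {
            int j = (i + 1) % n;                     // next vertex, wrapping around
            double cross = xs[i] * ys[j] - xs[j] * ys[i];
            a += cross;
            cx += (xs[i] + xs[j]) * cross;
            cy += (ys[i] + ys[j]) * cross;
        }
        a *= 0.5;                                    // signed area
        return new double[] { cx / (6 * a), cy / (6 * a) };
    }

    public static void main(String[] args) {
        // Unit square: centroid is (0.5, 0.5)
        double[] c = centroid(new double[] {0, 1, 1, 0}, new double[] {0, 0, 1, 1});
        System.out.println(c[0] + "," + c[1]); // prints 0.5,0.5
    }
}
```

An UpdateRequestProcessor (or client-side code) could run this per document and write the resulting point into the extra location_rpt field before indexing.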



Re: using maven to deploy solr on tomcat

2013-04-16 Thread Adeel Qureshi
the problem is I need to deploy it on servers where I don't know what the
absolute path will be. Basically my goal is to load Solr with a different
set of configuration files based on the environment it's in. Is there a
better way to do this?


On Mon, Apr 15, 2013 at 11:29 PM, Shawn Heisey s...@elyograg.org wrote:

 On 4/15/2013 2:33 PM, Adeel Qureshi wrote:
  <Environment name="solr/home" override="true" type="java.lang.String"
   value="src/main/resources/solr-dev"/>
 
  but this leads to absolute path of
 
  INFO: Using JNDI solr.home: src/main/resources/solr-dev
  INFO: looking for solr.xml:
  C:\springsource\sts-2.8.1.RELEASE\src\main\resources\solr-dev\solr.xml

 If you use a relative path for the solr home as you have done, it will
 be relative to the current working directory.  The CWD can vary
 depending on how tomcat gets started.  In your case, the CWD seems to be
 C:\springsource\sts-2.8.1.RELEASE.  If you change the CWD in the
 tomcat startup, you will probably have to set the TOMCAT_HOME
 environment variable for tomcat to start correctly, so I don't recommend
 doing that.

 It is usually best to choose an absolute path for the solr home.  Solr
 will find solr.xml there, which it will use to find the rest of your
 config(s).  All paths in solr.xml and other solr config files can be
 relative.

 What you are seeing as an absolute path is likely the current working
 directory plus your solr home setting.

 Thanks,
 Shawn




Re: Solr 4.2.1 sorting by distance to polygon centre.

2013-04-16 Thread Guido Medina

David,

I have been following your Stack Overflow posts and I understand what you 
say. We decided to change the criteria and index an extra field (close 
to your suggestion), so the sorting will now happen by polygon area 
descending (which induced another problem: calculating a polygon's area 
on a sphere). I finally got to the point of testing; also, given what 
you are saying, it is not a good idea to overload the query beyond the 
bare use of points (Intersects) inside polygons to get the list that 
matches the specific criteria.


To summarize: calculate the area of the polygon (again, for curved 
polygons this is not so obvious), do the standard Solr search, and sort 
by that extra field; I guess the Solr overhead will be minimal in that 
case.


The real use case is in the utility industry: users have areas where 
they get meter reads, and readings are scheduled and assigned to the 
users whose area contains the meter's GPS location. Some users might 
cover big areas, and smaller areas for other users may sit inside such 
big areas, so we changed "distance to centre" to "area covered by", 
which seemed simpler and easier.


Thanks your response,

Guido.

On 16/04/13 15:06, Smiley, David W. wrote:

Guido,

The field type solr.SpatialRecursivePrefixTreeFieldType can only
participate in distance reporting for indexed points, not other shapes.
In fact, I recommend not attempting to get the distance if the field isn't
purely indexed points, as it may get confused if it seems some small
shapes.  For your use-case, you should index an additional
solr.SpatialRecursivePrefixTreeFieldType field just for the points.  You
could do this external to Solr, or you could write a Solr
UpdateRequestProcessor that parses the shape in order to then call
getCenter(), and put those points in the other field.

~ David

On 4/16/13 7:23 AM, Guido Medina guido.med...@temetra.com wrote:


Hi,

I got everything in place, my polygons are indexing properly, I played a
bit with LSP which helped me a lot, now, I have JTS 1.13 inside
solr.war; here is my challenge:

I have big polygon (A) which contains smaller polygons (B and C), B and
C have some intersection, so if I search for a coordinate inside the 3,
I would like to sort by the distance to the centre of the polygons that
match the criteria.

As example, let's say dot B is on the centre of B, dot C is at the
centre of C and dot A is at the intersection of B and C which happens to
be the centre of A, so for dot A should be polygon A first and so on. I
could compute with the distances using the result but since Solr is
doing a heavy load already, why not just include the sort in it.

Here is my field type definition:

     <!-- Spatial field type -->
     <fieldType name="location_rpt"
        class="solr.SpatialRecursivePrefixTreeFieldType"
        spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
        units="degrees"/>


Field definition:

     <!-- JTS spatial polygon field -->
     <field name="geopolygon" type="location_rpt" indexed="true"
        stored="false" required="false" multiValued="true"/>


I'm using the Solr admin UI first to shape my query and then moving to
our web app which uses solrj, here is the XML form of my result which
includes the query I'm making, which scores all distances to 1.0 (Not
what I want):

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">9</int>
    <lst name="params">
      <str name="fl">id,score</str>
      <str name="sort">score asc</str>
      <str name="indent">true</str>
      <str name="q">*:*</str>
      <str name="_">136620720</str>
      <str name="wt">xml</str>
      <str name="fq">{!score=distance}geopolygon:"Intersects(-6.271906 53.379284)"</str>
    </lst>
  </lst>
  <result name="response" numFound="3" start="0" maxScore="1.0">
    <doc>
      <str name="id">uid13972</str>
      <float name="score">1.0</float>
    </doc>
    <doc>
      <str name="id">uid13979</str>
      <float name="score">1.0</float>
    </doc>
    <doc>
      <str name="id">uid13974</str>
      <float name="score">1.0</float>
    </doc>
  </result>
</response>


Thanks for all responses,

Guido.
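Guido's plan of sorting by a precomputed area can be prototyped with the planar shoelace formula. This is a simplified sketch with illustrative names; for geodesic accuracy, Spatial4j's JtsGeometry.getArea() estimate mentioned above is the better tool:

```java
public class PolygonArea {
    // Planar polygon area via the shoelace formula (absolute value).
    // A flat-earth approximation; fine for small regions, wrong for
    // large geodesic polygons, which Spatial4j/JTS handle properly.
    static double area(double[] xs, double[] ys) {
        double sum = 0;
        for (int i = 0; i < xs.length; i++) {
            int j = (i + 1) % xs.length;             // next vertex, wrapping
            sum += xs[i] * ys[j] - xs[j] * ys[i];
        }
        return Math.abs(sum) / 2.0;
    }

    public static void main(String[] args) {
        // 2x3 rectangle: area is 6.0
        System.out.println(area(new double[] {0, 2, 2, 0}, new double[] {0, 0, 3, 3})); // prints 6.0
    }
}
```

The computed value would be stored in an extra numeric field at index time, so the Solr-side sort is just a cheap numeric sort.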




Re: Dynamic data model design questions

2013-04-16 Thread Marko Asplund
Shawn Heisey wrote:

 Solr does have some *very* limited capability for doing joins between
indexes, but generally speaking, you need to flatten the data.

thanks!

So, using a dynamic schema I'd flatten the following JSON object graph

{
  'id':'xyz123',
  'obj1': {
    'child1': {
      'prop1': ['val1', 'val2', 'val3'],
      'prop2': 123
    },
    'prop3': 'val4'
  },
  'obj2': {
    'child2': {
      'prop3': true
    }
  }
}

to a Solr document something like this?

{
'id':'xyz123',
'obj1/child1/prop1_ss': ['val1', 'val2', 'val3'],
'obj1/child1/prop2_i': 123,
'obj1/prop3_s': 'val4',
'obj2/child2/prop3_b': true
}

I'm using Java, so I'd probably push docs for indexing to Solr and do the
searches using SolrJ, right?


 Solr's ability to change your data after receiving it is fairly limited.
The schema has some ability in this regard for indexed values,  but the
stored data is 100% verbatim as Solr receives it. If you will be using the
dataimport handler, it does have some transform  capability before sending
to Solr. Most of the time, the rule of thumb is that changing the data on
the Solr side will require
 contrib/custom plugins, so it may be easier to do it before Solr receives
it.

The data import handler is a Solr server side feature and not a client side?
Does Solr or SolrJ have any support for doing transformations on the client
side?
Doing the above transformation should be fairly straightforward, so it
could also be done by code on the client side.

marko


JavaScript transform switch statement during Data Import

2013-04-16 Thread paulblyth
Hello - I'm trying to add a switch statement into a JavaScript function that
we use during an import; it's to replace an if else block that is becoming
increasingly large.

Bizarrely, the switch block is ignored entirely and doesn't have any
effect whatsoever.

Our version info:
Solr Specification Version: 3.4.0.2011.09.09.09.06.17
Solr Implementation Version: 3.4.0 1167142 - mike - 2011-09-09 09:06:17
Lucene Specification Version: 3.4.0
Lucene Implementation Version: 3.4.0 1167142 - mike - 2011-09-09 09:02:09


I've tried searching, but can't find anything to suggest this is a known
bug. Has anyone come across this before?

Paul



--
View this message in context: 
http://lucene.472066.n3.nabble.com/JavaScript-transform-switch-statement-during-Data-Import-tp4056340.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: first time with new keyword, solr take to much time to give the result

2013-04-16 Thread Otis Gospodnetic
Hi,

Have you considered ManifoldCF?

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html




On Tue, Apr 16, 2013 at 10:02 AM, Montu v Boda
montu.b...@highqsolutions.com wrote:
 Hi

 problem is that the permission is frequently update in our system so that we
 have to update the index in the same manner other wise it will give wrong
 result.

 in that case i think the cache will get effect and the performance may be
 reduced.


 Thanks  Regards
 Montu v Boda



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/first-time-with-new-keyword-solr-take-to-much-time-to-give-the-result-tp4056254p4056322.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.2.1 sorting by distance to polygon centre.

2013-04-16 Thread Guido Medina

David,

  I just peeked at it on GitHub; the method will estimate well enough for 
our purpose, but it depends on JTS, which we included only in our Solr 
server. We don't want LGPL (v3) libraries in our main project, which is 
kind of a show stopper. I understand it is needed for Spatial4j, Lucene 
and Solr in general, so we have no issue keeping it in the Solr server, 
but we can't put it in our main web project for licensing reasons. I 
know JTS is a great set of functions needed for spatial projects; it's a 
shame I can't use it directly, e.g. I had to develop some convex hull 
code by myself.


Guido.

On 16/04/13 16:14, Smiley, David W. wrote:


On 4/16/13 10:57 AM, Guido Medina guido.med...@temetra.com wrote:


David,

I have been following your stackoverflow posts, I understand what you
say, we decided to change the criteria and index an extra field (close
to your suggestion), so the sorting will happen now by polygon area desc
(Which induced another problem, calculation of polygon area on a
sphere), finally I got to the point of testing, also due to what you are
saying, is not a good idea to overload more than just the bare use of
points (Intersects) inside polygon to get the the list that matches
specific criteria.

Glad you've been following what I've been up to and hopefully haven't
gotten too confused :-).  I welcome all feedback.  BTW I'll be doing a 75
minute spatial deep dive session at the Lucene/Solr Revolution
conference in San Diego May 1st  2nd.  Eventually the slides will be
posted and hopefully the audio track.


To resume, calculate the area of the polygon, again, for curved
polygons is not so obvious, do the standard solr search and sort by that
extra field, I guess solr overhead will be minimal in that case.

FYI Spatial4j will do a decent job estimating it by calculating the
geospatial area of the bounding box of a polygon and using the filled %
ratio of the polygon's 2D area to its bbox.  This logic is in Spatial4j's
JtsGeometry.getArea().


So are you storing the area and sorting by it then?  (overhead is
extremely minimal, this would just be an integer sort)


The real use case is for utility industry, let's say users have areas
where they get meter reads, readings are scheduled and assigned to the
users that contains such meter GPS location, some users might cover big
areas and possible to have smaller areas for other users inside such big
areas, so we changed the distance to center for area covered by, seemed
simpler and easier.

You might want to consider doing both -- sort by a function query that
combines both factors in some clever way.

~ David





Re: using maven to deploy solr on tomcat

2013-04-16 Thread Shawn Heisey

On 4/16/2013 8:47 AM, Adeel Qureshi wrote:

the problem is i need to deploy it on servers where i dont know what the
absolute path will be .. basically my goal is to load solr with a different
set of configuration files based on the environment its in. Is there a a
better different way to do this


If you have zero control over the target machine, then you might have to 
live with your solr home being dictated by the location of the servlet 
container - tomcat in this case.  If you change the tomcat startup 
script to use a different CWD and set TOMCAT_HOME so tomcat works, that 
might be the solution - but I don't know what effect that might have on 
spring or other applications.


I see two real options other than changing the startup script:

1) Go with an absolute path like C:\main\resources\solr-dev or perhaps 
/main/resources/solr-dev if you also don't know the OS platform.  Tell 
the server owners that you will require a specific directory location 
for the Solr data.


2) Utilize .. which gives you something like 
../../main/resources/solr-dev for your solr home.


Thanks,
Shawn
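
For option 1, the JNDI entry from earlier in this thread becomes robust once the path is absolute, since it no longer depends on Tomcat's working directory. A context fragment might look like this (the file locations and paths are illustrative assumptions; adjust per host):

```xml
<!-- conf/Catalina/localhost/solr.xml on the Tomcat host (hypothetical paths) -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <!-- Absolute solr home, independent of the CWD Tomcat was started from -->
  <Environment name="solr/home" type="java.lang.String"
               value="/main/resources/solr-dev" override="true"/>
</Context>
```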



Re: updateLog in Solr 4.2

2013-04-16 Thread Chris Hostetter
: 
: If i disable update log in solr 4.2 then i get the following exception
: SEVERE: :java.lang.NullPointerException
: at
: 
org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:190)

Hmmm.. if you don't have updateLog and you run in SolrCloud mode, solr 
should have given you a clean, clear error that updateLog is required in 
cloud mode.

can you please open a Bug in Jira and attach your config files so we can 
try to figure out why this isn't happening?


-Hoss


Re: Dynamic data model design questions

2013-04-16 Thread Jack Krupansky

'obj1/child1/prop1_ss'

Try to stick to names that follow Java naming conventions: letter or 
underscore followed by letters, digits, and underscores. There are places in 
Solr which have limited rules for names because they support additional 
syntax.


In this case, replace your slashes with underscores.

In general, Solr is much more friendly towards static data models. Yes, you 
can use dynamic fields, but use them in moderation. The more heavily you 
lean on them, the more likely that you will eventually become unhappy with 
Solr.


How many fields are we talking about here?

The trick with Solr is not to brute-force flatten your data model (as you 
appear to be doing), but to REDESIGN your data model so that it is more 
amenable to a flat data model, and takes advantage of Solr's features. You 
can use multiple collections for different types of data. And you can 
simulate joins across tables by doing a sequence of queries (although it 
would be nice to have a SolrJ client-side method to do that in one API 
call.)


-- Jack Krupansky

-Original Message- 
From: Marko Asplund

Sent: Tuesday, April 16, 2013 11:17 AM
To: solr-user
Subject: Re: Dynamic data model design questions

Shawn Heisey wrote:


Solr does have some *very* limited capability for doing joins between

indexes, but generally speaking, you need to flatten the data.

thanks!

So, using a dynamic schema I'd flatten the following JSON object graph

{
 'id':'xyz123',
 'obj1': {
   'child1': {
 'prop1': ['val1', 'val2', 'val3']
 'prop2': 123
}
'prop3': 'val4'
 },
 'obj2': {
   'child2': {
 'prop3': true
   }
 }
}

to a Solr document something like this?

{
'id':'xyz123',
'obj1/child1/prop1_ss': ['val1', 'val2', 'val3'],
'obj1/child1/prop2_i': 123,
'obj1/prop3_s': 'val4',
'obj2/child2/prop3_b': true
}

I'm using Java, so I'd probably push docs for indexing to Solr and do the
searches using SolrJ, right?



Solr's ability to change your data after receiving it is fairly limited.

The schema has some ability in this regard for indexed values,  but the
stored data is 100% verbatim as Solr receives it. If you will be using the
dataimport handler, it does have some transform  capability before sending
to Solr. Most of the time, the rule of thumb is that changing the data on
the Solr side will require

contrib/custom plugins, so it may be easier to do it before Solr receives

it.

The data import handler is a Solr server side feature and not a client side?
Does Solr or SolrJ have any support for doing transformations on the client
side?
Doing the above transformation should be fairly straight forward, so it
could be also done by code on the client side.

marko 



Re: Dynamic data model design questions

2013-04-16 Thread Shawn Heisey

On 4/16/2013 9:17 AM, Marko Asplund wrote:

Shawn Heisey wrote:
So, using a dynamic schema I'd flatten the following JSON object graph

{
   'id':'xyz123',
   'obj1': {
 'child1': {
   'prop1': ['val1', 'val2', 'val3']
   'prop2': 123
  }
  'prop3': 'val4'
   },
   'obj2': {
 'child2': {
   'prop3': true
 }
   }
}

to a Solr document something like this?

{
'id':'xyz123',
'obj1/child1/prop1_ss': ['val1', 'val2', 'val3'],
'obj1/child1/prop2_i': 123,
'obj1/prop3_s': 'val4',
'obj2/child2/prop3_b': true
}


How you flatten the data is up to you. You have to examine the data and 
how you want to use it in order to keep the number of fields to a 
manageable level but retain the flexibility you need.  Side note: I 
would not use anything in a field name other than ASCII alphanumeric and 
underscore characters.  Using special characters (like a slash) has been 
known to cause problems with some Solr features.  Because Solr uses 
HTTP, there are also potential URL escaping issues.


Within a single index, Solr uses a flat model, like a single database 
table with no relational capability.  With two indexes, there is the 
limited join feature, but I am not familiar with how it works.



I'm using Java, so I'd probably push docs for indexing to Solr and do the
searches using SolrJ, right?


That would be the most sensible approach.  The SolrJ API is much more 
advanced than the APIs for other languages.  This is because it is 
actually part of the Solr codebase and used by Solr internally.



The data import handler is a Solr server side feature and not a client side?
Does Solr or SolrJ have any support for doing transformations on the client
side?
Doing the above transformation should be fairly straight forward, so it
could be also done by code on the client side.


With SolrJ, you can do anything, because you write the code.  You can do 
whatever you like to the data, then send it to Solr.


The dataimport handler is indeed a server side feature.  It is a contrib 
module included in the Solr distribution, you have to add a jar to Solr 
to activate it.


Thanks,
Shawn
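
The client-side flattening discussed above can happen before a SolrInputDocument is built. A minimal sketch (the `flatten` helper and its naming scheme are illustrative; underscores replace slashes per the field-naming advice in this thread):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class Flatten {
    // Recursively flatten a nested map into Solr-friendly field names,
    // joining path segments with underscores (slashes can break some
    // Solr features and cause URL-escaping issues).
    @SuppressWarnings("unchecked")
    static Map<String, Object> flatten(String prefix, Map<String, Object> node) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "_" + e.getKey();
            if (e.getValue() instanceof Map) {
                out.putAll(flatten(key, (Map<String, Object>) e.getValue()));
            } else {
                out.put(key, e.getValue());   // leaf: keep value (incl. lists) as-is
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> child1 = new LinkedHashMap<>();
        child1.put("prop1_ss", Arrays.asList("val1", "val2", "val3"));
        child1.put("prop2_i", 123);
        Map<String, Object> obj1 = new LinkedHashMap<>();
        obj1.put("child1", child1);
        obj1.put("prop3_s", "val4");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "xyz123");
        doc.put("obj1", obj1);
        // Yields keys: id, obj1_child1_prop1_ss, obj1_child1_prop2_i, obj1_prop3_s
        System.out.println(flatten("", doc).keySet());
    }
}
```

Each flattened entry then maps directly to a SolrInputDocument field (the `_ss`/`_i`/`_s` suffixes drive the dynamic-field types).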



Re: Solr restart is taking more than 1 hour

2013-04-16 Thread gpssolr2020
Thanks for detailed explanation.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-restart-is-taking-more-than-1-hour-tp4054165p4056355.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.2.x replication events on slaves

2013-04-16 Thread Chris Hostetter

: In Solr 3.x, I was relying on a postCommit call to a listener in the update
: handler to perform data update to caches, this data was used to perform
: 'realtime' filtering on the documents.

I can't find it at the moment, but IIRC this was a side effect of how 
snapshots are now loaded on slaves -- there is no longer an explicit 
commit to read in the new index.

For your usecase however, i think what would make more sense (and probably 
would have always made more sense) is to implement this using the 
newSearcher hook, which allows you to block usage of the newSearcher 
until you have finished your hook logic.

Alternatively, you can implement CacheRegenerator which was specifically 
designed for warming caches on newSearcher events, and gives you access to 
the current cache keys so you can see what items where in the old 
caches to warm.


-Hoss
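
A newSearcher hook is registered in solrconfig.xml. The built-in QuerySenderListener shows the shape (the queries below are placeholders); a custom SolrEventListener class can be plugged in the same way to rebuild caches before the new searcher is put into use:

```xml
<!-- solrconfig.xml: runs before the new searcher is registered (example queries) -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">some warming query</str>
      <str name="fq">type:doc</str>
    </lst>
  </arr>
</listener>
```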


Re: CloudSolrServer vs ConcurrentUpdateSolrServer for indexing

2013-04-16 Thread J Mohamed Zahoor
It sure increased the performance.
Thanks for the input.

./zahoor

On 14-Apr-2013, at 10:13 PM, J Mohamed Zahoor zah...@indix.com wrote:

 Thanks..
 Will try multithreading with CloudSolrServer.
 
 ./zahoor
 
 On 13-Apr-2013, at 9:11 PM, Mark Miller markrmil...@gmail.com wrote:
 
 
 On Apr 13, 2013, at 11:07 AM, J Mohamed Zahoor zah...@indix.com wrote:
 
 Hi
 
 This question has come up many times in the list with lots of variations 
 (which confuses me a lot).
 
 Iam using Solr 4.1. one collection , 6 shards, 6 machines.
 I am using CloudSolrServer  inside each mapper to index my documents…. 
 While it is working fine , iam trying to improve the indexing performance.
 
 
 Question is:  
 
 1) is CloudSolrServer multiThreaded?
 
 No. The proper fast way to use it is to start many threads that all add docs 
 to the same CloudSolrServer instance. In other words, currently, you must do 
 the multi threading yourself. CloudSolrServer is thread safe.
 
 
 2) Will using ConcurrentUpdateSolr server increase indexing performance?
 
 Yes, but at the cost of having to specify a server to talk to - if it goes 
 down, so does your indexing. It's also not very great at reporting errors. 
 Finally, using multiple threads and CloudSolrServer, you can approach the 
 performance of ConcurrentUpdateSolr server.
 
 - Mark
 
 
 ./Zahoor
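 
 Mark's advice above (many threads sharing one thread-safe CloudSolrServer instance) can be sketched with a plain executor. The DocSink stub below is a hypothetical stand-in for CloudSolrServer.add(), so the skeleton stays self-contained; in production you would share a single real CloudSolrServer across the threads:
 
 ```java
 import java.util.ArrayList;
 import java.util.List;
 import java.util.concurrent.ConcurrentLinkedQueue;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicInteger;
 
 public class ParallelIndexing {
     // Stand-in for CloudSolrServer.add(doc). CloudSolrServer is thread-safe,
     // so one shared instance can back this from many threads.
     interface DocSink { void add(String doc); }
 
     static int indexAll(List<String> docs, DocSink sink, int threads) throws InterruptedException {
         ExecutorService pool = Executors.newFixedThreadPool(threads);
         AtomicInteger sent = new AtomicInteger();
         for (String doc : docs) {
             pool.submit(() -> { sink.add(doc); sent.incrementAndGet(); });
         }
         pool.shutdown();                          // no new tasks; drain the queue
         pool.awaitTermination(1, TimeUnit.MINUTES);
         return sent.get();
     }
 
     public static void main(String[] args) throws Exception {
         List<String> docs = new ArrayList<>();
         for (int i = 0; i < 100; i++) docs.add("doc-" + i);
         ConcurrentLinkedQueue<String> indexed = new ConcurrentLinkedQueue<>();
         System.out.println(indexAll(docs, indexed::add, 4)); // prints 100
     }
 }
 ```
 
 With this pattern the concurrency lives in the client, which is how multi-threaded use of CloudSolrServer approaches ConcurrentUpdateSolrServer's throughput without losing error reporting.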
 
 



Re: Troubles with solr replication

2013-04-16 Thread Chris Hostetter

: Also when I checked the solr log.
: 
:  [org.apache.solr.handler.SnapPuller] Master at:
:  http://192.168.2.204:8080/solr/replication is not available. Index fetch
:  failed. Exception: Connection refused
: 
: 
: BTW, I was able to fetch the replication file with wget directly.

Are you certain that the network setup for your master & slave machines 
allows them to talk to each other?  You said you could fetch the files from 
the master via wget, but I'm guessing you were running this from your 
local machine -- are you certain that when logged in to 192.168.2.174 you 
can reach port 8080 of 192.168.2.204?


-Hoss


zkState changes too often

2013-04-16 Thread J Mohamed Zahoor
Hi

I am using SolrCloud (4.1) with 6 nodes.
When I index documents from the mapper, as the load increases I see these 
messages in my mapper logs, which look like they are slowing down my 
indexing speed.


2013-04-16 06:04:18,013 INFO org.apache.solr.common.cloud.ZkStateReader: 
Updating live nodes... (5)
2013-04-16 06:04:18,186 INFO org.apache.solr.common.cloud.ZkStateReader: 
Updating live nodes... (6)
2013-04-16 06:04:18,186 INFO org.apache.solr.common.cloud.ZkStateReader: 
Updating live nodes... (6)
2013-04-16 06:04:19,485 INFO org.apache.solr.common.cloud.ZkStateReader: A 
cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged 
path:/clusterstate.json, has occurred - updating... (live nodes size: 6)
2013-04-16 06:04:19,487 INFO org.apache.solr.common.cloud.ZkStateReader: A 
cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged 
path:/clusterstate.json, has occurred - updating... (live nodes size: 6)
2013-04-16 06:08:30,006 INFO org.apache.solr.common.cloud.ZkStateReader: A 
cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/live_nodes, has occurred - updating... (live nodes size: 6)
2013-04-16 06:08:30,010 INFO org.apache.solr.common.cloud.ZkStateReader: 
Updating live nodes... (5)
2013-04-16 06:08:30,010 INFO org.apache.solr.common.cloud.ZkStateReader: A 
cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/live_nodes, has occurred - updating... (live nodes size: 5)
2013-04-16 06:08:30,019 INFO org.apache.solr.common.cloud.ZkStateReader: 
Updating live nodes... (5)
2013-04-16 06:08:35,443 INFO org.apache.solr.common.cloud.ZkStateReader: A 
cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/live_nodes, has occurred - updating... (live nodes size: 5)
2013-04-16 06:08:35,446 INFO org.apache.solr.common.cloud.ZkStateReader: 
Updating live nodes... (6)
2013-04-16 06:08:35,446 INFO org.apache.solr.common.cloud.ZkStateReader: A 
cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/live_nodes, has occurred - updating... (live nodes size: 6)
2013-04-16 06:08:35,459 INFO org.apache.solr.common.cloud.ZkStateReader: 
Updating live nodes... (6)
2013-04-16 06:08:48,929 INFO org.apache.solr.common.cloud.ZkStateReader: A 
cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged 
path:/clusterstate.json, has occurred - updating... (live nodes size: 6)
2013-04-16 06:08:48,931 INFO org.apache.solr.common.cloud.ZkStateReader: A 
cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged 
path:/clusterstate.json, has occurred - updating... (live nodes size: 6)
2013-04-16 06:09:12,005 INFO org.apache.solr.common.cloud.ZkStateReader: A 
cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/live_nodes, has occurred - updating... (live nodes size: 6)
2013-04-16 06:09:12,010 INFO org.apache.solr.common.cloud.ZkStateReader: 
Updating live nodes... (5)
2013-04-16 06:09:12,011 INFO org.apache.solr.common.cloud.ZkStateReader: A 
cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/live_nodes, has occurred - updating... (live nodes size: 5)
2013-04-16 06:09:12,014 INFO org.apache.solr.common.cloud.ZkStateReader: 
Updating live nodes... (5)
2013-04-16 06:09:15,438 INFO org.apache.solr.common.cloud.ZkStateReader: A 
cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/live_nodes, has occurred - updating... (live nodes size: 5)
2013-04-16 06:09:15,441 INFO org.apache.solr.common.cloud.ZkStateReader: 
Updating live nodes... (6)
2013-04-16 06:09:15,441 INFO org.apache.solr.common.cloud.ZkStateReader: A 
cluster state change: WatchedEvent stat

I tried increasing the ZooKeeper timeout from 15 to 20 seconds, but I still 
see this message.
Is there anything I might try to avoid this?

./Zahoor




Document Missing from Shard in Solr Cloud

2013-04-16 Thread Cool Techi
Hi,

We noticed a strange behavior in our Solr Cloud setup; we are using Solr 4.2 
with a 1:3 replication setting. We noticed that some of the documents were 
showing up in search sometimes and not at others, the reason being the document 
was not present in all the shards.

We have restarted ZooKeeper and also the entire cloud, but these documents are 
not being replicated to all the shards for some reason, and hence search 
results are inconsistent.

Regards,
Ayush
  

Re: Document Missing from Shard in Solr Cloud

2013-04-16 Thread Timothy Potter
If you are using the default doc router for indexing in SolrCloud, then a
document only exists in a single shard but can be replicated in that shard
to any number of replicas.

Can you clarify your question as it sounds like you're saying that the
document is not replicated across all the replicas for a specific shard? If
so, that's definitely a problem ...


On Tue, Apr 16, 2013 at 11:22 AM, Cool Techi cooltec...@outlook.com wrote:

 Hi,

 We noticed a strange behavior in our solr cloud setup, we are using
 solr4.2  with 1:3 replication setting. We noticed that some of the
 documents were showing up in search sometimes and not at other, the reason
 being the document was not present in all the shards.

 We have restarted zookeeper and also entire cloud, but these documents are
 not being replicated in all the shards for some reason and hence
 inconsistent search results.

 Regards,
 Ayush



Re: zkState changes too often

2013-04-16 Thread Mark Miller
Are you using the concurrent low-pause garbage collector, or perhaps G1? 

Are you able to use something like visualvm to pinpoint what the bottleneck 
might be?

Otherwise, keep raising the timeout. This means Solr and Zk are not able to 
talk for that much time - either something needs to be tuned or the time 
allowed raised.

- Mark
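If raising the timeout helps, note that in Solr 4.x the ZooKeeper client timeout is typically set on the cores element in solr.xml, roughly like this (the 30-second value is just an example, not a recommendation):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores" zkClientTimeout="30000">
    <!-- core definitions ... -->
  </cores>
</solr>
```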

On Apr 16, 2013, at 12:49 PM, J Mohamed Zahoor zah...@indix.com wrote:

 Hi
 
 I am using SolrCloud (4.1) with 6 nodes.
 When I index documents from the mapper, as the load increases I see 
 these messages in my mapper logs, 
 which look like they are slowing down my indexing speed.
 
 
 2013-04-16 06:04:18,013 INFO org.apache.solr.common.cloud.ZkStateReader: 
 Updating live nodes... (5)
 2013-04-16 06:04:18,186 INFO org.apache.solr.common.cloud.ZkStateReader: 
 Updating live nodes... (6)
 2013-04-16 06:04:18,186 INFO org.apache.solr.common.cloud.ZkStateReader: 
 Updating live nodes... (6)
 2013-04-16 06:04:19,485 INFO org.apache.solr.common.cloud.ZkStateReader: A 
 cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged 
 path:/clusterstate.json, has occurred - updating... (live nodes size: 6)
 2013-04-16 06:04:19,487 INFO org.apache.solr.common.cloud.ZkStateReader: A 
 cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged 
 path:/clusterstate.json, has occurred - updating... (live nodes size: 6)
 2013-04-16 06:08:30,006 INFO org.apache.solr.common.cloud.ZkStateReader: A 
 cluster state change: WatchedEvent state:SyncConnected 
 type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live 
 nodes size: 6)
 2013-04-16 06:08:30,010 INFO org.apache.solr.common.cloud.ZkStateReader: 
 Updating live nodes... (5)
 2013-04-16 06:08:30,010 INFO org.apache.solr.common.cloud.ZkStateReader: A 
 cluster state change: WatchedEvent state:SyncConnected 
 type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live 
 nodes size: 5)
 2013-04-16 06:08:30,019 INFO org.apache.solr.common.cloud.ZkStateReader: 
 Updating live nodes... (5)
 2013-04-16 06:08:35,443 INFO org.apache.solr.common.cloud.ZkStateReader: A 
 cluster state change: WatchedEvent state:SyncConnected 
 type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live 
 nodes size: 5)
 2013-04-16 06:08:35,446 INFO org.apache.solr.common.cloud.ZkStateReader: 
 Updating live nodes... (6)
 2013-04-16 06:08:35,446 INFO org.apache.solr.common.cloud.ZkStateReader: A 
 cluster state change: WatchedEvent state:SyncConnected 
 type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live 
 nodes size: 6)
 2013-04-16 06:08:35,459 INFO org.apache.solr.common.cloud.ZkStateReader: 
 Updating live nodes... (6)
 2013-04-16 06:08:48,929 INFO org.apache.solr.common.cloud.ZkStateReader: A 
 cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged 
 path:/clusterstate.json, has occurred - updating... (live nodes size: 6)
 2013-04-16 06:08:48,931 INFO org.apache.solr.common.cloud.ZkStateReader: A 
 cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged 
 path:/clusterstate.json, has occurred - updating... (live nodes size: 6)
 2013-04-16 06:09:12,005 INFO org.apache.solr.common.cloud.ZkStateReader: A 
 cluster state change: WatchedEvent state:SyncConnected 
 type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live 
 nodes size: 6)
 2013-04-16 06:09:12,010 INFO org.apache.solr.common.cloud.ZkStateReader: 
 Updating live nodes... (5)
 2013-04-16 06:09:12,011 INFO org.apache.solr.common.cloud.ZkStateReader: A 
 cluster state change: WatchedEvent state:SyncConnected 
 type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live 
 nodes size: 5)
 2013-04-16 06:09:12,014 INFO org.apache.solr.common.cloud.ZkStateReader: 
 Updating live nodes... (5)
 2013-04-16 06:09:15,438 INFO org.apache.solr.common.cloud.ZkStateReader: A 
 cluster state change: WatchedEvent state:SyncConnected 
 type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live 
 nodes size: 5)
 2013-04-16 06:09:15,441 INFO org.apache.solr.common.cloud.ZkStateReader: 
 Updating live nodes... (6)
 2013-04-16 06:09:15,441 INFO org.apache.solr.common.cloud.ZkStateReader: A 
 cluster state change: WatchedEvent stat
 
 I tried increasing the Zk timeout from 15 to 20 sec… but i still see this 
 message…
 anything i might try to avoid this?
 
 ./Zahoor
 
 



RE: Document Missing from Share in Solr cloud

2013-04-16 Thread Cool Techi
That's what I am trying to say: the document is not replicated across all the 
replicas for a specific shard, hence the query shows different results on every 
refresh.



 Date: Tue, 16 Apr 2013 11:34:18 -0600
 Subject: Re: Document Missing from Share in Solr cloud
 From: thelabd...@gmail.com
 To: solr-user@lucene.apache.org
 
 If you are using the default doc router for indexing in SolrCloud, then a
 document only exists in a single shard but can be replicated in that shard
 to any number of replicas.
 
 Can you clarify your question as it sounds like you're saying that the
 document is not replicated across all the replicas for a specific shard? If
 so, that's definitely a problem ...
 
 
 On Tue, Apr 16, 2013 at 11:22 AM, Cool Techi cooltec...@outlook.com wrote:
 
  Hi,
 
  We noticed a strange behavior in our solr cloud setup, we are using
  solr4.2  with 1:3 replication setting. We noticed that some of the
  documents were showing up in search sometimes and not at other, the reason
  being the document was not present in all the shards.
 
  We have restarted zookeeper and also entire cloud, but these documents are
  not being replicated in all the shards for some reason and hence
  inconsistent search results.
 
  Regards,
  Ayush
 
  

Re: Solr 4.2.1 sorting by distance to polygon centre.

2013-04-16 Thread Smiley, David W.
Guido,

I encourage you to try to open-source the shape-related code you have to
Spatial4j.  I realize that for some organizations, that can be really
difficult.  

~ David

On 4/16/13 11:55 AM, Guido Medina guido.med...@temetra.com wrote:

David,

   I just peeked at it on GitHub; the method will estimate well for our
purpose, but it depends on JTS, which we include only in our Solr server.
We don't want LGPL (v3) libraries in our main project, which is kind of a
show stopper. I understand it is needed for Spatial4j, Lucene and Solr in
general, so we have no issue keeping it in the Solr server, but we can't
put it in the main web project for licensing reasons. I know JTS is a great
set of functions for spatial projects; it's a shame I can't use it
directly, and I had to develop things like a convex hull myself.

Guido.



Re: Document Missing from Share in Solr cloud

2013-04-16 Thread Timothy Potter
Ok, that makes more sense and is definitely cause for concern. Do you have
a sense for whether this is ongoing or happened a few times unexpectedly in
the past? If ongoing, then it will probably be easier to track down the root
cause.


On Tue, Apr 16, 2013 at 12:08 PM, Cool Techi cooltec...@outlook.com wrote:

 That's what I am trying to say, the document is not replicated across all
 the replicas for a specific shard, hence the query show different results
 on every refresh.



  Date: Tue, 16 Apr 2013 11:34:18 -0600
  Subject: Re: Document Missing from Share in Solr cloud
  From: thelabd...@gmail.com
  To: solr-user@lucene.apache.org
 
  If you are using the default doc router for indexing in SolrCloud, then a
  document only exists in a single shard but can be replicated in that
 shard
  to any number of replicas.
 
  Can you clarify your question as it sounds like you're saying that the
  document is not replicated across all the replicas for a specific shard?
 If
  so, that's definitely a problem ...
 
 
  On Tue, Apr 16, 2013 at 11:22 AM, Cool Techi cooltec...@outlook.com
 wrote:
 
   Hi,
  
   We noticed a strange behavior in our solr cloud setup, we are using
   solr4.2  with 1:3 replication setting. We noticed that some of the
   documents were showing up in search sometimes and not at other, the
 reason
   being the document was not present in all the shards.
  
   We have restarted zookeeper and also entire cloud, but these documents
 are
   not being replicated in all the shards for some reason and hence
   inconsistent search results.
  
   Regards,
   Ayush
  




Re: updateLog in Solr 4.2

2013-04-16 Thread Mark Miller
Can you file a JIRA issue? At minimum, you should get a better error.

- Mark
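For context, SolrCloud's leader election machinery assumes a transaction log; the stock 4.x example solrconfig.xml enables it roughly like this:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>
```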

On Apr 12, 2013, at 9:17 AM, vicky desai vicky.de...@germinait.com wrote:

 If I disable the update log in Solr 4.2 then I get the following exception:
 SEVERE: :java.lang.NullPointerException
at
 org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:190)
at
 org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:156)
at
 org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:100)
at
 org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:266)
at
 org.apache.solr.cloud.ZkController.joinElection(ZkController.java:935)
at
 org.apache.solr.cloud.ZkController.register(ZkController.java:761)
at
 org.apache.solr.cloud.ZkController.register(ZkController.java:727)
at
 org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
at
 org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
at
 org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:638)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
 
 Apr 12, 2013 6:39:56 PM org.apache.solr.common.SolrException log
 SEVERE: null:org.apache.solr.common.cloud.ZooKeeperException:
at
 org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:931)
at
 org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
at
 org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:638)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
 Caused by: java.lang.NullPointerException
at
 org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:190)
at
 org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:156)
at
 org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:100)
at
 org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:266)
at
 org.apache.solr.cloud.ZkController.joinElection(ZkController.java:935)
at
 org.apache.solr.cloud.ZkController.register(ZkController.java:761)
at
 org.apache.solr.cloud.ZkController.register(ZkController.java:727)
at
 org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
... 12 more
 
 and Solr fails to start. However, if I add updateLog to my solrconfig.xml it
 starts. Is the updateLog parameter mandatory for Solr 4.2?
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/updateLog-in-Solr-4-2-tp4055548.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCloud Leader Response Mechanism

2013-04-16 Thread Mark Miller
Leaders don't have much to do with querying - the node that you query will 
determine what other nodes it has to query to search the whole index and do a 
scatter/gather for you. (Though in some cases that request can be proxied to 
another node)

- Mark
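The scatter/gather Mark describes can be sketched like this (the shard contents and scores are invented for illustration; any node can play the coordinating role):

```python
import heapq

# Toy scatter/gather: whichever node receives the query asks one replica
# of every shard for its top hits, then merges them by score. Leaders play
# no special role at query time. Shard contents below are invented.
shards = {
    "shard1": [("doc1", 0.9), ("doc3", 0.4)],
    "shard2": [("doc2", 0.7), ("doc4", 0.6)],
}

def distributed_search(k):
    gathered = []
    for hits in shards.values():   # scatter: one request per shard
        gathered.extend(hits)      # gather the per-shard top hits
    # merge step: global top-k by score
    return heapq.nlargest(k, gathered, key=lambda hit: hit[1])

print(distributed_search(3))
```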

On Apr 16, 2013, at 7:48 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 When a leader responds to a query, does it say: "If I have the data
 I am looking for, I should build the response with it; otherwise I should
 find it elsewhere, because it may take long to search for it"?
 or
 does it say: "I only index the data; I will tell the other guys to build
 up the query response"?



solr 3.5 core rename issue

2013-04-16 Thread Jie Sun
We just tried to use 
.../solr/admin/cores?action=RENAME&core=core0&other=core5

to rename a core 'old' to 'new'.

After the request is done, solr.xml has the new core name, and the Solr
admin shows the new core name in the list. But the index dir still has the
old name as the directory name. I looked into the Solr 3.5 code; this is what
the code does.

However, if I bounce Tomcat/Solr, then when Solr starts up it creates a new
index dir named 'new', and now of course there are no longer any documents
returned if you search the core.

is this a bug? or did I miss anything?
thanks
Jie



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-3-5-core-rename-issue-tp4056425.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Document Missing from Share in Solr cloud

2013-04-16 Thread Timothy Potter
btw ... what is the field type of your unique ID field?


On Tue, Apr 16, 2013 at 12:34 PM, Timothy Potter thelabd...@gmail.comwrote:

 Ok, that makes more sense and is definitely cause for concern. Do you have
 a sense for whether this is ongoing or happened a few times unexpectedly in
 the past? If ongoing, then will probably be easier to track down the root
 cause.


 On Tue, Apr 16, 2013 at 12:08 PM, Cool Techi cooltec...@outlook.comwrote:

 That's what I am trying to say, the document is not replicated across all
 the replicas for a specific shard, hence the query show different results
 on every refresh.



  Date: Tue, 16 Apr 2013 11:34:18 -0600
  Subject: Re: Document Missing from Share in Solr cloud
  From: thelabd...@gmail.com
  To: solr-user@lucene.apache.org
 
  If you are using the default doc router for indexing in SolrCloud, then
 a
  document only exists in a single shard but can be replicated in that
 shard
  to any number of replicas.
 
  Can you clarify your question as it sounds like you're saying that the
  document is not replicated across all the replicas for a specific
 shard? If
  so, that's definitely a problem ...
 
 
  On Tue, Apr 16, 2013 at 11:22 AM, Cool Techi cooltec...@outlook.com
 wrote:
 
   Hi,
  
   We noticed a strange behavior in our solr cloud setup, we are using
   solr4.2  with 1:3 replication setting. We noticed that some of the
   documents were showing up in search sometimes and not at other, the
 reason
   being the document was not present in all the shards.
  
   We have restarted zookeeper and also entire cloud, but these
 documents are
   not being replicated in all the shards for some reason and hence
   inconsistent search results.
  
   Regards,
   Ayush
  






Re: solr 3.5 core rename issue

2013-04-16 Thread Shawn Heisey

On 4/16/2013 2:02 PM, Jie Sun wrote:

We just tried to use
.../solr/admin/cores?action=RENAME&core=core0&other=core5

to rename a core 'old' to 'new'.

After the request is done, the solr.xml has new core name, and the solr
admin shows the new core name in the list. But the index dir still has the
old name as the directory name. I looked into solr 3.5 code, this is what
the code does.

However, if I bounce tomcat/solr, when solr is started up, it creates new
index dir with 'new', and now of course there is no longer any document
returned if you search the core.

is this a bug? or did I miss anything?


If your solr.xml is missing the 'persistent' attribute on the <solr> 
tag, or it is set to false, then I can imagine it behaving this way. 
This must be set to true, or changes that you make with the core admin 
API will not be written to disk, so they will not survive a restart.


<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores">

I haven't used the RENAME functionality, but I use the core SWAP feature 
extensively.  I have cores with names like s0live and s0build, but they 
actually refer to directories with names like s0_0 and s0_1.  When they 
swap, the directory location of the index doesn't change, but it's like 
I have renamed both of them with each other's name.


Thanks,
Shawn



Re: solr 3.5 core rename issue

2013-04-16 Thread Jie Sun
Hi Shawn,
I do have persistent=true in my solr.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="default" instanceDir="./"/>
    <core name="413a" instanceDir="./"/>
    <core name="blah" instanceDir="./"/>
    ...
  </cores>
</solr>

the command I ran was to rename from '413' to '413a'. 

When I debugged through Solr's CoreAdminHandler, I noticed the persistent flag
only controls whether the new data is persisted to solr.xml or not; as you can
see, it did change my solr.xml, so there is no problem there.

But the index dir ends up with no change at all (still '413'). I guess swap
will have a similar issue; I bet your 's0_0' directory actually holds data for
core s0build, and s0_1 holds data for s0live after you swap them, because I
don't see anywhere in the CoreAdminHandler or CoreContainer code that actually
renames the index directory. I might be wrong, but you can test and find out.

Jie



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-3-5-core-rename-issue-tp4056425p4056435.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Leader Response Mechanism

2013-04-16 Thread Furkan KAMACI
Hi Mark;

To use proper terms, what I want to ask is: is there data locality
(spatial locality,
http://www.roguewave.com/portals/0/products/threadspotter/docs/2011.2/manual_html_linux/manual_html/ch_intro_locality.html
- I mean: if you have the data on your machine, use it and don't fetch it
from anywhere else, just search for the remaining parts) when querying a
leader in SolrCloud?

2013/4/16 Mark Miller markrmil...@gmail.com

 Leaders don't have much to do with querying - the node that you query will
 determine what other nodes it has to query to search the whole index and do
 a scatter/gather for you. (Though in some cases that request can be proxied
 to another node)

 - Mark

 On Apr 16, 2013, at 7:48 AM, Furkan KAMACI furkankam...@gmail.com wrote:

  When a leader responses for a query, does it says that: If I have the
 data
  what I am looking for, I should build response with it, otherwise I
 should
  find it anywhere. Because it may be long to search it?
  or
  does it says I only index the data, I will tell it to other guys to build
  up the response query?




Why indexing and querying performance is better at SolrCloud compared to older versions of Solr?

2013-04-16 Thread Furkan KAMACI
Is there any document that describes why indexing and querying performance
is better in SolrCloud compared to older versions of Solr?

I was examining this architecture: there would be a cloud of Solr nodes
that only does indexing, and another cloud that copies those indexes
and only serves queries, in order to get better performance. However,
if I use SolrCloud I think there is no need to build such an
architecture.


Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase

2013-04-16 Thread Furkan KAMACI
Hi Otis and Jack;

I have researched highlighting and debugged the code. I see that
highlights are query dependent and not stored. Why does Solr use Lucene for
storing text, i.e. the content of a web page? Is there any comparison
of storing text in HBase or any other database versus Lucene?

Also, I want to know: has anybody on this user list used anything other than
Lucene to store document text?
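As a toy illustration of why highlight snippets are query dependent and cannot be stored ahead of time (a sketch only, not Solr's actual highlighter implementation): the snippet is a function of both the stored text and the query terms, so it only exists once a query arrives.

```python
import re

def highlight(text, query_terms, frag_len=60):
    # Toy snippet builder: the snippet depends on BOTH the stored text and
    # the query terms, which is why highlights cannot be precomputed and
    # stored at index time. (Illustration only, not Solr's highlighter.)
    for term in query_terms:
        match = re.search(re.escape(term), text, re.IGNORECASE)
        if match:
            start = max(0, match.start() - frag_len // 2)
            fragment = text[start:start + frag_len]
            return re.sub("(%s)" % re.escape(term), r"<em>\1</em>",
                          fragment, flags=re.IGNORECASE)
    return None  # no query term occurs in this document

doc = "Apache Solr provides hit highlighting of matching query terms."
print(highlight(doc, ["highlighting"]))
```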

2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com

 Source code is your best bet.  Wiki has info about how to use it, but
 not how highlighting is implemented.  But you don't need to understand
 the implementation details to understand that they are dynamic,
 computed specifically for each query for each matching document, so
 you cannot store them anywhere ahead of time.

 Otis
 --
  Solr & ElasticSearch Support
 http://sematext.com/





 On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Otis;
 
  It seems that I should read more about highlights. Is there any where
 that
  explains in detail how highlights are generated at Solr?
 
  2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com
 
  Hi,
 
  You can't store highlights ahead of time because they are query
  dependent.  You could store documents in HBase and use Solr just for
  indexing.  Is that what you want to do?  If so, a custom
  SearchComponent executed after QueryComponent could fetch data from
  external store like HBase.  I'm not sure if I'd recommend that.
 
  Otis
  --
   Solr & ElasticSearch Support
  http://sematext.com/
 
 
 
 
 
  On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI furkankam...@gmail.com
 
  wrote:
   Actually I don't think to store documents at Solr. I want to store
 just
   highlights (snippets) at Hbase and I want to retrieve them from Hbase
  when
   needed.
   What do you think about separating just highlights from Solr and
 storing
   them into Hbase at Solrclod. By the way if you explain at which
 process
  and
   how highlights are genareted at Solr you are welcome.
  
  
   2013/4/9 Otis Gospodnetic otis.gospodne...@gmail.com
  
   You may also be interested in looking at things like solrbase (on
  Github).
  
   Otis
   --
    Solr & ElasticSearch Support
   http://sematext.com/
  
  
  
  
  
   On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI 
 furkankam...@gmail.com
   wrote:
Hi;
   
First of all should mention that I am new to Solr and making a
  research
about it. What I am trying to do that I will crawl some websites
 with
   Nutch
and then I will index them with Solr. (Nutch 2.1, Solr-SolrCloud
 4.2 )
   
I wonder about something. I have a cloud of machines that crawls
  websites
and stores that documents. Then I send that documents into
 SolrCloud.
   Solr
indexes that documents and generates indexes and save them. I know
  that
from Information Retrieval theory: it *may* not be efficient to
 store
indexes at a NoSQL database (they are something like linked lists
 and
  if
you store them in such kind of database you *may* have a sparse
representation -by the way there may be some solutions for it. If
 you
explain them you are welcome.)
   
However Solr stores some documents too (i.e. highlights) So some
 of my
documents will be doubled somehow. If I consider that I will have
 many
documents, that dobuled documents may cause a problem for me. So is
  there
any way not storing that documents at Solr and pointing to them at
Hbase(where I save my crawled documents) or instead of pointing
  directly
storing them at Hbase (is it efficient or not)?
  
 



Re: Empty Solr 4.2.1 can not create Collection

2013-04-16 Thread Chris Hostetter

: sorry for pushing, but I just replayed the steps with solr 4.0 where
: everything works fine.
: Then I switched to solr 4.2.1 and replayed the exact same steps and the
: collection won't start and no leader will be elected.
: 
: Any clues ?
: Should I try it on the developer mailing list, maybe it's a bug ?

I'm not really understanding what the sequence of events is that's leading 
you to this error, but if you can reproduce a problem in which there is no 
leader election (and you get the NPE listed below) when creating a 
collection then yes, absolutely, please open a Jira and include...

1) the specific list of steps to reproduce starting from a 4.2.1 install
2) the configs you start with as well as any configs you are specifying 
when creating collections
3) snapshots of clusterstate.json taken before and after you encounter the 
problem
4) logs from each of the Solr servers you run in your test.



: 
: Kind Regards
: Alexander
: 
: Am 2013-04-10 22:27, schrieb A.Eibner:
:  Hi,
:  
:  here the clusterstate.json (from zookeeper) after creating the core:
:  
:  {"storage":{
:   "shards":{"shard1":{
:   "range":"8000-7fff",
:   "state":"active",
:   "replicas":{"app02:9985_solr_storage-core":{
:   "shard":"shard1",
:   "state":"down",
:   "core":"storage-core",
:   "collection":"storage",
:   "node_name":"app02:9985_solr",
:   "base_url":"http://app02:9985/solr"}}}},
:   "router":"compositeId"}}
:  cZxid = 0x10024
:  ctime = Wed Apr 10 22:18:13 CEST 2013
:  mZxid = 0x1003d
:  mtime = Wed Apr 10 22:21:26 CEST 2013
:  pZxid = 0x10024
:  cversion = 0
:  dataVersion = 2
:  aclVersion = 0
:  ephemeralOwner = 0x0
:  dataLength = 467
:  numChildren = 0
:  
:  But looking in the log files I found the following error (it also
:  occurs with the Collections API):
:  
:  SEVERE: org.apache.solr.common.SolrException: Error CREATEing SolrCore
:  'storage_shard1_replica1':
:   at
:  
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:483)
:  
:   at
:  
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:140)
:  
:   at
:  
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
:  
:   at
:  
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:591)
:  
:   at
:  
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192)
:  
:   at
:  
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
:  
:   at
:  
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
:  
:   at
:  
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
:  
:   at
:  
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
:  
:   at
:  
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
:  
:   at
:  
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
:  
:   at
:  org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
:  
:   at
:  
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
:  
:   at
:  org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
:   at
:  
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:999)
:  
:   at
:  
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:565)
:  
:   at
:  
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:307)
:  
:   at
:  
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
:  
:   at
:  
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
:  
:   at java.lang.Thread.run(Thread.java:722)
:  Caused by: org.apache.solr.common.cloud.ZooKeeperException:
:   at
:  org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:931)
:   at
:  org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
:   at
:  org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
:   at
:  
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:479)
:  
:   ... 19 more
:  Caused by: java.lang.NullPointerException
:   at
:  
org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:190)
:  
:   at
:  
org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:156)
:  
:   at
:  org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:100)
:  
:   at
:  

When a search query comes to a replica what happens?

2013-04-16 Thread Furkan KAMACI
I want to make it clear in my mind:

When a search query comes to a replica what happens?

- Does it forward the search query to the leader, and the leader collects all
the data and prepares the response? (This could cause a performance issue
because the leader is responsible for indexing at the same time.)
or
- Does the replica communicate with the leader to learn where the remaining
data is (the leader asks ZooKeeper and tells the replica), and then the
replica collects all the data and builds the response?


How SolrCloud Balance Number of Documents at each Shard?

2013-04-16 Thread Furkan KAMACI
Is it possible that different shards have different numbers of documents, or
does SolrCloud balance them?

I ask because I want to understand the mechanism behind how Solr
calculates the hash value of a document's identifier. Is it possible that
the hash function puts more documents into one shard than into the
others? (This may cause a bottleneck at some leaders of the
SolrCloud.)
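For illustration, here is a sketch of hash-range routing. Solr's compositeId router actually uses MurmurHash3 on the uniqueKey and assigns each shard a contiguous hash range; the MD5-derived hash below is only a stand-in. A uniform hash keeps shard document counts close, but never exactly equal:

```python
import hashlib

def shard_for(doc_id, num_shards):
    # Stand-in for Solr's MurmurHash3_x86_32: derive a 32-bit value from
    # the document id and map it into equal-sized hash ranges, one range
    # per shard (the idea behind compositeId hash-range routing).
    h = int.from_bytes(hashlib.md5(doc_id.encode("utf-8")).digest()[:4], "big")
    return min(h * num_shards // 2**32, num_shards - 1)

counts = [0] * 4
for i in range(10000):
    counts[shard_for("doc-%d" % i, 4)] += 1
print(counts)  # roughly 2500 per shard, but not exactly equal
```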


Re: When a search query comes to a replica what happens?

2013-04-16 Thread Otis Gospodnetic
Hi,

No, I believe the redirect from replica to leader happens only at
index time, so a doc first gets indexed on the leader and from there it is
replicated to the non-leader replicas. At query time there is no redirect
to the leader, I imagine, as that would quickly turn leaders into
hotspots.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 16, 2013 at 6:01 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 I want to make it clear in my mind:

 When a search query comes to a replica what happens?

 -Does it forwards the search query to leader and leader collects all the
 data and prepares response (this will cause a performance issue because
 leader is responsible for indexing at same time)
 or
 - replica communicates with leader and learns where is remaining
 data(leaders asks to Zookeper and tells it to replica) and replica collects
 all data and response it?


Re: How SolrCloud Balance Number of Documents at each Shard?

2013-04-16 Thread Otis Gospodnetic
They won't be exact, but should be close.  Are you seeing some *big*
differences?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 16, 2013 at 6:11 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 Is it possible that different shards have different number of documents or
 does SolrCloud balance them?

 I ask this question because I want to learn the mechanism behind how Solr
 calculete hash value of the identifier of the document. Is it possible that
 hash function produces more documents into one of the shards other than any
 of shards. (because this may cause a bottleneck at some leaders of
 SolrCloud)


Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase

2013-04-16 Thread Otis Gospodnetic
People do use other data stores to retrieve data sometimes. e.g. Mongo
is popular for that.  Like I hinted in another email, I wouldn't
necessarily recommend this for common cases.  Don't do it unless you
really know you need it.  Otherwise, just store in Solr.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 16, 2013 at 5:32 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 Hi Otis and Jack;

 I have made a research about highlights and debugged code. I see that
 highlight are query dependent and not stored. Why Solr uses Lucene for
 storing text, I mean i.e. content of a web page. Is there any comparison
 about to store texts at Hbase or any other databases versus Lucene.

 Also I want to learn that is there anybody who has used anything else from
 Lucene to store text of document at our solr user list?

 2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com

 Source code is your best bet.  Wiki has info about how to use it, but
 not how highlighting is implemented.  But you don't need to understand
 the implementation details to understand that they are dynamic,
 computed specifically for each query for each matching document, so
 you cannot store them anywhere ahead of time.

 Otis
 --
  Solr & ElasticSearch Support
 http://sematext.com/





 On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Otis;
 
  It seems that I should read more about highlights. Is there any where
 that
  explains in detail how highlights are generated at Solr?
 
  2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com
 
  Hi,
 
  You can't store highlights ahead of time because they are query
  dependent.  You could store documents in HBase and use Solr just for
  indexing.  Is that what you want to do?  If so, a custom
  SearchComponent executed after QueryComponent could fetch data from
  external store like HBase.  I'm not sure if I'd recommend that.
 
  Otis
  --
   Solr & ElasticSearch Support
  http://sematext.com/
 
 
 
 
 
  On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI furkankam...@gmail.com
 
  wrote:
    Actually I don't plan to store documents in Solr. I want to store just
    highlights (snippets) in HBase, and I want to retrieve them from HBase when
    needed.
    What do you think about separating just the highlights from Solr and
    storing them in HBase under SolrCloud? By the way, I would appreciate an
    explanation of at which stage, and how, highlights are generated in Solr.
  
  
   2013/4/9 Otis Gospodnetic otis.gospodne...@gmail.com
  
   You may also be interested in looking at things like solrbase (on
  Github).
  
   Otis
   --
    Solr & ElasticSearch Support
   http://sematext.com/
  
  
  
  
  
   On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI 
 furkankam...@gmail.com
   wrote:
Hi;
   
First of all should mention that I am new to Solr and making a
  research
about it. What I am trying to do that I will crawl some websites
 with
   Nutch
and then I will index them with Solr. (Nutch 2.1, Solr-SolrCloud
 4.2 )
   
I wonder about something. I have a cloud of machines that crawls
  websites
and stores that documents. Then I send that documents into
 SolrCloud.
   Solr
indexes that documents and generates indexes and save them. I know
  that
from Information Retrieval theory: it *may* not be efficient to
 store
indexes at a NoSQL database (they are something like linked lists
 and
  if
you store them in such kind of database you *may* have a sparse
representation -by the way there may be some solutions for it. If
 you
explain them you are welcome.)
   
However Solr stores some documents too (i.e. highlights) So some
 of my
documents will be doubled somehow. If I consider that I will have
 many
documents, those doubled documents may cause a problem for me. So is
  there
any way not storing that documents at Solr and pointing to them at
Hbase(where I save my crawled documents) or instead of pointing
  directly
storing them at Hbase (is it efficient or not)?
  
 

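To make the point about query-dependent highlighting concrete: a snippet is computed at search time from the stored text plus the terms of the current query, so there is nothing fixed to store ahead of time. A toy, stand-alone sketch of that query-time computation (illustrative only — this is not Solr's actual highlighter):

```python
def highlight(text: str, query_term: str, window: int = 30) -> str:
    """Return a snippet around the first match, with the term wrapped in <em> tags."""
    lower = text.lower()
    pos = lower.find(query_term.lower())
    if pos == -1:
        return ""  # no match, no snippet for this query
    start = max(0, pos - window)
    end = min(len(text), pos + len(query_term) + window)
    match = text[pos:pos + len(query_term)]
    # The snippet depends entirely on which term was queried.
    return text[start:end].replace(match, f"<em>{match}</em>", 1)

doc = "Solr indexes documents and generates highlights at query time."
print(highlight(doc, "highlights"))  # a snippet containing <em>highlights</em>
```

A different query term yields a different snippet over the same stored text, which is exactly why snippets cannot be precomputed and stored per document.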


Re: Why indexing and querying performance is better at SolrCloud compared to older versions of Solr?

2013-04-16 Thread Otis Gospodnetic
Correct.  With SolrCloud you typically don't need to make this
separation (with ElasticSearch one can designate some nodes as
non-data nodes).  SolrCloud won't necessarily always be faster because
it typically involves sharding and thus a distributed search, while
some non-SolrCloud setups can hold the whole index locally and thus
avoid the network part.

General (and friendly!) comment - you may find it faster/cheaper/more
efficient to just pick the approach and do it, unless you are really
doing this purely to learn the theory.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 16, 2013 at 5:27 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 Is there any document that describes why indexing and querying performance
 is better in SolrCloud compared to older versions of Solr?

 I was examining this architecture: one cloud of Solr nodes
 that just does indexing, and another cloud that copies those
 indexes and serves only queries, in order to get better
 performance. However, if I use SolrCloud I think there is no need to
 build such an architecture.


Re: SolrCloud Leader Response Mechanism

2013-04-16 Thread Otis Gospodnetic
If query comes to shard X on some node and this shard X is NOT a
leader, but HAS data, it will just execute the query.  If it needs to
query shards on other nodes, it will have the info about which shards
to query and will just do that and aggregate the results.  It doesn't
have to ask leader for permission, for info, etc.  It can just do it
because it knows where things are.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 16, 2013 at 5:23 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 Hi Mark;

 To use proper terms, what I want to ask is: is there data locality
 or spatial locality (
 http://www.roguewave.com/portals/0/products/threadspotter/docs/2011.2/manual_html_linux/manual_html/ch_intro_locality.html
 - I mean, if you have data on your machine, use it and don't search for it
 anywhere else; just search for the remaining parts) when querying a leader of
 SolrCloud?

 2013/4/16 Mark Miller markrmil...@gmail.com

 Leaders don't have much to do with querying - the node that you query will
 determine what other nodes it has to query to search the whole index and do
 a scatter/gather for you. (Though in some cases that request can be proxied
 to another node)

 - Mark

 On Apr 16, 2013, at 7:48 AM, Furkan KAMACI furkankam...@gmail.com wrote:

  When a leader responds to a query, does it say: If I have the data
  I am looking for, I should build the response with it; otherwise I
  should find it elsewhere, because searching for it may take long?
  Or
  does it say: I only index the data; I will tell the other guys to build
  up the query response?




Re: When a search query comes to a replica what happens?

2013-04-16 Thread Furkan KAMACI
All in all, will the replica ask its leader where the remaining data is,
or does it ask ZooKeeper directly?

2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 Hi,

 No, I believe redirect from replica to leader would happen only at
 index time, so a doc first gets indexed to leader and from there it's
 replicated to non-leader shards.  At query time there is no redirect
 to leader, I imagine, as that would quickly turn leaders into
 hotspots.

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 6:01 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  I want to make it clear in my mind:
 
  When a search query comes to a replica what happens?
 
  -Does it forwards the search query to leader and leader collects all the
  data and prepares response (this will cause a performance issue because
  leader is responsible for indexing at same time)
  or
  - replica communicates with leader and learns where is remaining
  data(leaders asks to Zookeper and tells it to replica) and replica
 collects
  all data and response it?



Re: When a search query comes to a replica what happens?

2013-04-16 Thread Otis Gospodnetic
No.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 16, 2013 at 6:23 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 All in all will replica ask to its leader about where is remaining of data
 or it directly asks to Zookeper?

 2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 Hi,

 No, I believe redirect from replica to leader would happen only at
 index time, so a doc first gets indexed to leader and from there it's
 replicated to non-leader shards.  At query time there is no redirect
 to leader, I imagine, as that would quickly turn leaders into
 hotspots.

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 6:01 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  I want to make it clear in my mind:
 
  When a search query comes to a replica what happens?
 
  -Does it forwards the search query to leader and leader collects all the
  data and prepares response (this will cause a performance issue because
  leader is responsible for indexing at same time)
  or
  - replica communicates with leader and learns where is remaining
  data(leaders asks to Zookeper and tells it to replica) and replica
 collects
  all data and response it?



Re: How SolrCloud Balance Number of Documents at each Shard?

2013-04-16 Thread Furkan KAMACI
Hi Otis;

First, thanks for your answers. Do you mean that the hashing mechanism will
route a document to a random shard? I ask because I am
considering putting a load balancer in front of my SolrCloud and
manually routing some documents to other shards to avoid a bottleneck.

2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 They won't be exact, but should be close.  Are you seeing some *big*
 differences?

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 6:11 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Is it possible that different shards have different number of documents
 or
  does SolrCloud balance them?
 
  I ask this question because I want to learn the mechanism behind how Solr
  calculete hash value of the identifier of the document. Is it possible
 that
  hash function produces more documents into one of the shards other than
 any
  of shards. (because this may cause a bottleneck at some leaders of
  SolrCloud)



Re: Push/pull model between leader and replica in one shard

2013-04-16 Thread Otis Gospodnetic
Hi,

Replication when everything is working well is push:
* request comes to any node, ideally leader
* doc is indexed on leader
* doc is copied to replicas

If replica falls too far behind (not exactly sure what the too far
threshold is), it uses pull to replicate the whole index from leader.
Mark can answer the part about where tlog gets replayed to catch up on
docs that were missed while big index replication pull was happening.

This is a good thread to read on this topic:
http://search-lucene.com/m/y1yj218J2v82

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 16, 2013 at 1:36 AM, SuoNayi suonayi2...@163.com wrote:
 Hi, can someone explain in more detail what model is used to sync docs
 between the leader and
 replicas in a shard?
 The model can be push or pull. Suppose I have only one shard with 1
 leader and 2 replicas:
 when the leader receives an update request, will it scatter the request
 to each available and active
 replica first and then process the request locally? In that case,
 if the replicas are able to keep
 up with the leader, can I regard this as a push model in which the leader
 pushes updates to its replicas?


 What happens if a replica is behind the leader? Will the replica pull docs
 from the leader and keep
 track of incoming updates from the leader in a log (the tlog)? If so, will
 it replay the updates in the tlog
 once it has finished pulling docs?




 regards


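Otis's description above — push on the normal path, pull plus tlog replay for a replica that fell behind — can be modeled in a few lines. This is a deliberately simplified in-memory sketch; real Solr recovery (PeerSync vs. full index replication) has more moving parts:

```python
class Replica:
    def __init__(self):
        self.docs = {}

class Leader:
    def __init__(self, replicas):
        self.docs = {}
        self.tlog = []            # recent updates, kept for catch-up replay
        self.replicas = replicas

    def update(self, doc_id, doc):
        # Push model: the leader indexes locally and forwards each update.
        self.docs[doc_id] = doc
        self.tlog.append((doc_id, doc))
        for r in self.replicas:
            r.docs[doc_id] = doc

def catch_up(leader, stale_replica):
    # Pull model: a replica that fell too far behind copies the whole index,
    # then replays the tlog to pick up anything that arrived meanwhile.
    stale_replica.docs = dict(leader.docs)
    for doc_id, doc in leader.tlog:
        stale_replica.docs[doc_id] = doc

live = Replica()
leader = Leader([live])
leader.update("doc1", {"title": "hello"})

late = Replica()                  # was down, missed the pushed update
catch_up(leader, late)
print(late.docs == leader.docs == live.docs)  # True
```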







Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase

2013-04-16 Thread Furkan KAMACI
Thanks again for your answer. If I find any document about such comparisons,
I would like to read it.

By the way, is there any advantage to using Lucene rather than anything
else, along these lines:

Lucene is natively supported by Solr, so if I use anything else I
may face compatibility problems or communication issues?


2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 People do use other data stores to retrieve data sometimes. e.g. Mongo
 is popular for that.  Like I hinted in another email, I wouldn't
 necessarily recommend this for common cases.  Don't do it unless you
 really know you need it.  Otherwise, just store in Solr.

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 5:32 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Otis and Jack;
 
  I have made a research about highlights and debugged code. I see that
  highlight are query dependent and not stored. Why Solr uses Lucene for
  storing text, I mean i.e. content of a web page. Is there any comparison
  about to store texts at Hbase or any other databases versus Lucene.
 
  Also I want to learn that is there anybody who has used anything else
 from
  Lucene to store text of document at our solr user list?
 
  2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com
 
  Source code is your best bet.  Wiki has info about how to use it, but
  not how highlighting is implemented.  But you don't need to understand
  the implementation details to understand that they are dynamic,
  computed specifically for each query for each matching document, so
  you cannot store them anywhere ahead of time.
 
  Otis
  --
  Solr & ElasticSearch Support
  http://sematext.com/
 
 
 
 
 
  On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI furkankam...@gmail.com
 
  wrote:
   Hi Otis;
  
   It seems that I should read more about highlights. Is there any where
  that
   explains in detail how highlights are generated at Solr?
  
   2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com
  
   Hi,
  
   You can't store highlights ahead of time because they are query
   dependent.  You could store documents in HBase and use Solr just for
   indexing.  Is that what you want to do?  If so, a custom
   SearchComponent executed after QueryComponent could fetch data from
   external store like HBase.  I'm not sure if I'd recommend that.
  
   Otis
   --
   Solr & ElasticSearch Support
   http://sematext.com/
  
  
  
  
  
   On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI 
 furkankam...@gmail.com
  
   wrote:
Actually I don't think to store documents at Solr. I want to store
  just
highlights (snippets) at Hbase and I want to retrieve them from
 Hbase
   when
needed.
What do you think about separating just highlights from Solr and
  storing
them into Hbase at Solrclod. By the way if you explain at which
  process
   and
how highlights are genareted at Solr you are welcome.
   
   
2013/4/9 Otis Gospodnetic otis.gospodne...@gmail.com
   
You may also be interested in looking at things like solrbase (on
   Github).
   
Otis
--
Solr & ElasticSearch Support
http://sematext.com/
   
   
   
   
   
On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI 
  furkankam...@gmail.com
wrote:
 Hi;

 First of all should mention that I am new to Solr and making a
   research
 about it. What I am trying to do that I will crawl some websites
  with
Nutch
 and then I will index them with Solr. (Nutch 2.1, Solr-SolrCloud
  4.2 )

 I wonder about something. I have a cloud of machines that crawls
   websites
 and stores that documents. Then I send that documents into
  SolrCloud.
Solr
 indexes that documents and generates indexes and save them. I
 know
   that
 from Information Retrieval theory: it *may* not be efficient to
  store
 indexes at a NoSQL database (they are something like linked
 lists
  and
   if
 you store them in such kind of database you *may* have a sparse
 representation -by the way there may be some solutions for it.
 If
  you
 explain them you are welcome.)

 However Solr stores some documents too (i.e. highlights) So some
  of my
 documents will be doubled somehow. If I consider that I will
 have
  many
 documents, that dobuled documents may cause a problem for me.
 So is
   there
 any way not storing that documents at Solr and pointing to them
 at
 Hbase(where I save my crawled documents) or instead of pointing
   directly
 storing them at Hbase (is it efficient or not)?
   
  
 



Re: How SolrCloud Balance Number of Documents at each Shard?

2013-04-16 Thread Otis Gospodnetic
Hi,

Routing is not random... have a look at
https://issues.apache.org/jira/browse/SOLR-2341 . In short, you
shouldn't have to route manually from your app.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 16, 2013 at 6:26 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 Hi Otis;

 Firstly thanks for your answers. So do you mean that hashing mechanism will
 randomly route a document into a randomly shard? I want to ask it because I
 consider about putting a load balancer in front of my SolrCloud and
 manually route some documents into some other shards to avoid bottleneck.

 2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 They won't be exact, but should be close.  Are you seeing some *big*
 differences?

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 6:11 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Is it possible that different shards have different number of documents
 or
  does SolrCloud balance them?
 
  I ask this question because I want to learn the mechanism behind how Solr
  calculete hash value of the identifier of the document. Is it possible
 that
  hash function produces more documents into one of the shards other than
 any
  of shards. (because this may cause a bottleneck at some leaders of
  SolrCloud)

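Otis's point above (SOLR-2341) is that the default router hashes the document's uniqueKey to pick a shard, which spreads documents nearly evenly without any manual routing from the application. A rough illustration of the idea — using MD5 here purely for demonstration; Solr itself uses a MurmurHash3 variant, so this is not its exact function:

```python
import hashlib

NUM_SHARDS = 4

def shard_for(doc_id: str) -> int:
    # Any well-mixed hash taken modulo the shard count spreads documents
    # near-evenly; Solr's real router partitions a 32-bit hash range instead.
    h = int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)
    return h % NUM_SHARDS

counts = [0] * NUM_SHARDS
for i in range(100_000):
    counts[shard_for(f"doc-{i}")] += 1

print(counts)  # each bucket lands close to 25,000 -- no manual balancing needed
```

This is why shard document counts are "close but not exact": the hash distributes ids statistically, not by round-robin counting.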


Re: SolrCloud Leader Response Mechanism

2013-04-16 Thread Furkan KAMACI
Hi Otis;

You said:

It can just do it because it knows where things are.

Does it learn this from ZooKeeper?

2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 If query comes to shard X on some node and this shard X is NOT a
 leader, but HAS data, it will just execute the query.  If it needs to
 query shards on other nodes, it will have the info about which shards
 to query and will just do that and aggregate the results.  It doesn't
 have to ask leader for permission, for info, etc.  It can just do it
 because it knows where things are.

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 5:23 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Mark;
 
  When I speak with proper terms I want to ask that: is there a data
 locality
  of spatial locality (
 
 http://www.roguewave.com/portals/0/products/threadspotter/docs/2011.2/manual_html_linux/manual_html/ch_intro_locality.html
  - I mean if you have data on your machine, use it and don't search it
  anywhere else, just search for remaining parts) at querying on a leader
 of
  SolrCloud?
 
  2013/4/16 Mark Miller markrmil...@gmail.com
 
  Leaders don't have much to do with querying - the node that you query
 will
  determine what other nodes it has to query to search the whole index
 and do
  a scatter/gather for you. (Though in some cases that request can be
 proxied
  to another node)
 
  - Mark
 
  On Apr 16, 2013, at 7:48 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
 
   When a leader responses for a query, does it says that: If I have the
  data
   what I am looking for, I should build response with it, otherwise I
  should
   find it anywhere. Because it may be long to search it?
   or
   does it says I only index the data, I will tell it to other guys to
 build
   up the response query?
 
 


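The scatter/gather behavior described in this thread — any node consults the shard map kept in ZooKeeper, queries one replica per shard, and merges the partial results itself without asking the leader for permission — can be sketched as follows. This is a toy in-memory model; the cluster-state structure here is invented for illustration and is not Solr's clusterstate.json format:

```python
# Toy cluster state: shard -> list of replica "endpoints" (here, in-memory doc stores
# mapping doc id -> score). In Solr this map lives in ZooKeeper.
cluster_state = {
    "shard1": [{"doc1": 0.9, "doc3": 0.4}],
    "shard2": [{"doc2": 0.7, "doc4": 0.2}],
}

def query_shard(replica, top_n):
    # Each shard returns its local top-N (doc id, score) pairs.
    return sorted(replica.items(), key=lambda kv: -kv[1])[:top_n]

def scatter_gather(top_n=3):
    # Any node can coordinate: scatter to one replica per shard, then
    # gather and merge the partial results by score.
    partials = []
    for replicas in cluster_state.values():
        partials.extend(query_shard(replicas[0], top_n))
    return [doc for doc, _ in sorted(partials, key=lambda kv: -kv[1])[:top_n]]

print(scatter_gather())  # ['doc1', 'doc2', 'doc3']
```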

Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase

2013-04-16 Thread Otis Gospodnetic
Use Solr.  It's pretty clear you don't yet have any problems that
would make you think about alternatives.  Using Solr to store and not
just index will make your life simpler (and your app simpler and
likely faster).

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 16, 2013 at 6:31 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 Thanks again for your answer. If I find any document about such comparisons
 that I would like to read.

 By the way, is there any advantage for using Lucene instead of anything
 else as like that:

 Using Lucene is naturally supported at Solr and if I use anything else I
 may face with some compatibility problems or communicating issues?


 2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 People do use other data stores to retrieve data sometimes. e.g. Mongo
 is popular for that.  Like I hinted in another email, I wouldn't
 necessarily recommend this for common cases.  Don't do it unless you
 really know you need it.  Otherwise, just store in Solr.

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 5:32 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Otis and Jack;
 
  I have made a research about highlights and debugged code. I see that
  highlight are query dependent and not stored. Why Solr uses Lucene for
  storing text, I mean i.e. content of a web page. Is there any comparison
  about to store texts at Hbase or any other databases versus Lucene.
 
  Also I want to learn that is there anybody who has used anything else
 from
  Lucene to store text of document at our solr user list?
 
  2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com
 
  Source code is your best bet.  Wiki has info about how to use it, but
  not how highlighting is implemented.  But you don't need to understand
  the implementation details to understand that they are dynamic,
  computed specifically for each query for each matching document, so
  you cannot store them anywhere ahead of time.
 
  Otis
  --
  Solr & ElasticSearch Support
  http://sematext.com/
 
 
 
 
 
  On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI furkankam...@gmail.com
 
  wrote:
   Hi Otis;
  
   It seems that I should read more about highlights. Is there any where
  that
   explains in detail how highlights are generated at Solr?
  
   2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com
  
   Hi,
  
   You can't store highlights ahead of time because they are query
   dependent.  You could store documents in HBase and use Solr just for
   indexing.  Is that what you want to do?  If so, a custom
   SearchComponent executed after QueryComponent could fetch data from
   external store like HBase.  I'm not sure if I'd recommend that.
  
   Otis
   --
   Solr & ElasticSearch Support
   http://sematext.com/
  
  
  
  
  
   On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI 
 furkankam...@gmail.com
  
   wrote:
Actually I don't think to store documents at Solr. I want to store
  just
highlights (snippets) at Hbase and I want to retrieve them from
 Hbase
   when
needed.
What do you think about separating just highlights from Solr and
  storing
them into Hbase at Solrclod. By the way if you explain at which
  process
   and
how highlights are genareted at Solr you are welcome.
   
   
2013/4/9 Otis Gospodnetic otis.gospodne...@gmail.com
   
You may also be interested in looking at things like solrbase (on
   Github).
   
Otis
--
Solr & ElasticSearch Support
http://sematext.com/
   
   
   
   
   
On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI 
  furkankam...@gmail.com
wrote:
 Hi;

 First of all should mention that I am new to Solr and making a
   research
 about it. What I am trying to do that I will crawl some websites
  with
Nutch
 and then I will index them with Solr. (Nutch 2.1, Solr-SolrCloud
  4.2 )

 I wonder about something. I have a cloud of machines that crawls
   websites
 and stores that documents. Then I send that documents into
  SolrCloud.
Solr
 indexes that documents and generates indexes and save them. I
 know
   that
 from Information Retrieval theory: it *may* not be efficient to
  store
 indexes at a NoSQL database (they are something like linked
 lists
  and
   if
 you store them in such kind of database you *may* have a sparse
 representation -by the way there may be some solutions for it.
 If
  you
 explain them you are welcome.)

 However Solr stores some documents too (i.e. highlights) So some
  of my
 documents will be doubled somehow. If I consider that I will
 have
  many
 documents, that dobuled documents may cause a problem for me.
 So is
   there
 any way not storing that documents at Solr and pointing to them
 at
 Hbase(where I save my crawled documents) or instead of pointing
   directly
 storing them at Hbase (is it efficient or not)?
   
  
 



Re: How do I recover the position and offset a highlight for solr (4.1/4.2)?

2013-04-16 Thread P Williams
Hi,

It doesn't have the offset information, but checkout my patch
https://issues.apache.org/jira/browse/SOLR-4722 which outputs the position
of each term that's been matched.  I'm eager to get some feedback on this
approach and any improvements that might be suggested.

Cheers,
Tricia


On Wed, Mar 27, 2013 at 8:28 AM, Skealler Nametic bchaillou...@gmail.com wrote:

 Hi,

 I would like to retrieve the position and offset of each highlighting
 found.
 I searched on the internet, but I have not found the exact solution to my
 problem...



Re: how to display groups along with matching terms in solr auto-suggestion?

2013-04-16 Thread Otis Gospodnetic
Hi,

Try Solr Suggester, though I'm not sure if you can group with it.  I
tried http://search-lucene.com/?q=suggester+group&fc_project=Solr but
it doesn't seem to yield much.  If you need to group suggestions like
what you see on http://search-lucene.com/ for example, we use our own
AC from http://sematext.com/products/autocomplete/index.html for that.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 16, 2013 at 7:38 AM, sharmila thapa shar...@gmail.com wrote:
 Hi,



 I have used the Terms component for auto-suggestion, but it just lists the terms
 that match terms.prefix from the index. Along with these term suggestions, I have
 to display the product groups that match the input prefix. Is this
 possible in Solr auto-suggest? Could somebody please help me with this issue?

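As Otis notes, the Terms component alone won't return groups; one common workaround is to merge term matches with group matches in the application layer. A toy sketch of that idea with invented example data — this is application-side logic, not a Solr API:

```python
# Hypothetical catalog of (term, product group) pairs the application maintains.
products = [
    ("iphone case", "Accessories"),
    ("iphone 5", "Phones"),
    ("ipad mini", "Tablets"),
]

def suggest(prefix):
    # Grouped autocomplete: return matching terms plus the groups they fall in,
    # so the UI can render both lists for the typed prefix.
    matches = [(t, g) for t, g in products if t.startswith(prefix.lower())]
    groups = sorted({g for _, g in matches})
    return {"terms": [t for t, _ in matches], "groups": groups}

print(suggest("iph"))  # {'terms': ['iphone case', 'iphone 5'], 'groups': ['Accessories', 'Phones']}
```

In practice the two lists would come from two Solr calls (a terms/suggester request and a faceted prefix query) merged the same way.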

Re: SolrCloud Leader Response Mechanism

2013-04-16 Thread Otis Gospodnetic
Oui, ZK holds the map.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 16, 2013 at 6:33 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 Hi Otis;

 You said:

 It can just do it because it knows where things are.

 Does it learn it from Zookeeper?

 2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 If query comes to shard X on some node and this shard X is NOT a
 leader, but HAS data, it will just execute the query.  If it needs to
 query shards on other nodes, it will have the info about which shards
 to query and will just do that and aggregate the results.  It doesn't
 have to ask leader for permission, for info, etc.  It can just do it
 because it knows where things are.

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 5:23 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Mark;
 
  When I speak with proper terms I want to ask that: is there a data
 locality
  of spatial locality (
 
 http://www.roguewave.com/portals/0/products/threadspotter/docs/2011.2/manual_html_linux/manual_html/ch_intro_locality.html
  - I mean if you have data on your machine, use it and don't search it
  anywhere else, just search for remaining parts) at querying on a leader
 of
  SolrCloud?
 
  2013/4/16 Mark Miller markrmil...@gmail.com
 
  Leaders don't have much to do with querying - the node that you query
 will
  determine what other nodes it has to query to search the whole index
 and do
  a scatter/gather for you. (Though in some cases that request can be
 proxied
  to another node)
 
  - Mark
 
  On Apr 16, 2013, at 7:48 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
 
   When a leader responses for a query, does it says that: If I have the
  data
   what I am looking for, I should build response with it, otherwise I
  should
   find it anywhere. Because it may be long to search it?
   or
   does it says I only index the data, I will tell it to other guys to
 build
   up the response query?
 
 



Re: Some Questions About Using Solr as Cloud

2013-04-16 Thread Otis Gospodnetic
See
https://issues.apache.org/jira/browse/SOLR-4532
https://issues.apache.org/jira/browse/SOLR-1535
https://issues.apache.org/jira/browse/SOLR-4619

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 16, 2013 at 7:37 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 Hi Erick;

 Thanks for the explanation. You said:

 You cannot transfer just the indexed form of a document from one
 core to another, you have to re-index the doc. Why is that?

 2013/4/16 Erick Erickson erickerick...@gmail.com

 Yes. Every node is really self-contained. When you send a doc to a
 cluster where each shard has a replica, the raw doc is sent to
 each node of that shard and indexed independently.

 About old docs, it's the same as Solr 3.6. Data associated with
 docs stays around in the index until it's merged away.

 You cannot transfer just the indexed form of a document from one
 core to another, you have to re-index the doc.

 Best
 Erick

 On Mon, Apr 15, 2013 at 7:46 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Jack;
 
  I see that SolrCloud makes everything automated. When I use SolrCloud is
 it
  true that: there may be more than one computer responsible for indexing
 at
  any time?
 
  2013/4/15 Jack Krupansky j...@basetechnology.com
 
  There are no masters or slaves in SolrCloud - it's fully distributed.
 Some
  cluster nodes will be leaders (of the shard on that node) at a given
  point in time, but different nodes may be leaders at different points in
  time as they become elected.
 
  In a distributed cluster you would never want to store documents only on
  one node. Sure, you can do that by setting the replication factor to 1,
 but
  that defeats half the purpose for SolrCloud.
 
  Index transfer is automatic - SolrCloud supports fully distributed
 update.
 
  You might be getting confused with the old Master-Slave-Replication
  model that Solr had (and still has) which is distinct from SolrCloud.
 
  -- Jack Krupansky
 
  -Original Message- From: Furkan KAMACI
  Sent: Sunday, April 14, 2013 7:45 PM
  To: solr-user@lucene.apache.org
  Subject: Some Questions About Using Solr as Cloud
 
 
  I have read the wiki and am reading the Lucidworks Solr Guide. However, I want
  to clear something up in my mind. Here are my questions:
 
  1) Does SolrCloud lets a multi master design (is there any document
 that I
  can read about it)?
  2) Let's assume that I use multiple cores i.e. core A and core B. Let's
  assume that there is a document just indexed at core B. If I send a
 search
  request to core A can I get result?
  3) When I use multi master design (if exists) can I transfer one
 master's
  index data into another (with its slaves or not)?
  4) When I use multi core design can I transfer one index data into
 another
  core or anywhere else?
 
  By the way thanks for the quick responses and kindness at mail list.
 



Re: SolrCloud Leader Response Mechanism

2013-04-16 Thread Furkan KAMACI
So the replica asks ZooKeeper, and the leader does not do anything extra. Thanks for your
answer, Otis.

2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 Oui, ZK holds the map.

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 6:33 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Otis;
 
  You said:
 
  It can just do it because it knows where things are.
 
  Does it learn it from Zookeeper?
 
  2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com
 
  If query comes to shard X on some node and this shard X is NOT a
  leader, but HAS data, it will just execute the query.  If it needs to
  query shards on other nodes, it will have the info about which shards
  to query and will just do that and aggregate the results.  It doesn't
  have to ask leader for permission, for info, etc.  It can just do it
  because it knows where things are.
 
  Otis
  --
  Solr & ElasticSearch Support
  http://sematext.com/
 
 
 
 
 
  On Tue, Apr 16, 2013 at 5:23 PM, Furkan KAMACI furkankam...@gmail.com
  wrote:
   Hi Mark;
  
    When I speak with proper terms, what I want to ask is: is there data
    locality or spatial locality (
  http://www.roguewave.com/portals/0/products/threadspotter/docs/2011.2/manual_html_linux/manual_html/ch_intro_locality.html
    - I mean, if you have the data on your machine, use it and don't search
    anywhere else, just search the other nodes for the remaining parts) when
    querying a leader of SolrCloud?
  
   2013/4/16 Mark Miller markrmil...@gmail.com
  
   Leaders don't have much to do with querying - the node that you query
  will
   determine what other nodes it has to query to search the whole index
  and do
   a scatter/gather for you. (Though in some cases that request can be
  proxied
   to another node)
  
   - Mark
  
   On Apr 16, 2013, at 7:48 AM, Furkan KAMACI furkankam...@gmail.com
  wrote:
  
When a leader responds to a query, does it say: "If I have the data
I am looking for, I should build the response with it; otherwise I
should find it elsewhere, which may take long"?
Or does it say: "I only index the data; I will tell the other guys to
build up the query response"?
  
  
 



Re: Storing Solr Index on NFS

2013-04-16 Thread Otis Gospodnetic
Yesterday, we spent 1 hour with a client looking at their cluster's
performance metrics in SPM, their indexing logs, etc., trying to figure
out why some indexing was slower than it should have been.  We traced
issues to network hiccups, to VMs that would move from host to host,
etc.  Really fancy and powerful system in terms of hardware resources,
but in the end a bit too far from just locally attached HDD or SSD
that would not have issues like the ones we found.  I'd stay away from
NFS for the same reason - it's another moving part on the other side
of the network.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 16, 2013 at 7:15 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 Hi Walter;

 You said: "It is not safe to share Solr index files between two Solr
 servers." Why do you think that?


 2013/4/16 Tim Vaillancourt t...@elementspace.com

 If centralization of storage is your goal by choosing NFS, iSCSI works
 reasonably well with SOLR indexes, although good local-storage will always
 be the overall winner.

 I noticed a near 5% degradation in overall search performance (casual
 testing, nothing scientific) when moving a 40-50GB indexes to iSCSI (10GBe
 network) from a 4x7200rpm RAID 10 local SATA disk setup.

 Tim


 On 15/04/13 09:59 AM, Walter Underwood wrote:

 Solr 4.2 does have field compression which makes smaller indexes. That
 will reduce the amount of network traffic. That probably does not help
 much, because I think the latency of NFS is what causes problems.

 wunder

 On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote:

  Hello Walter,

 Thanks for the response. That has been my experience in the past as well.
 But I was wondering if there are new things in Solr 4 and NFS 4.1 that
 make
 the storing of indexes on a NFS mount feasible.

 Thanks,
 Saqib


 On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood <wun...@wunderwood.org> wrote:

  On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:

  Greetings,

 Are there any issues with storing Solr Indexes on a NFS share? Also any
 recommendations for using NFS for Solr indexes?

 I recommend that you do not put Solr indexes on NFS.

 It can be very slow, I measured indexing as 100X slower on NFS a few
 years
 ago.

 It is not safe to share Solr index files between two Solr servers, so
 there is no benefit to NFS.

 wunder
 --
 Walter Underwood
 wun...@wunderwood.org




  --
 Walter Underwood
 wun...@wunderwood.org







Re: Is cache useful for my scenario?

2013-04-16 Thread Otis Gospodnetic
Hi Sam,

Sounds like you may want to disable caches, yes.  But instead of
guessing, just look at the stats and based on that configure your
caches.  You can get stats from the Solr Admin page or, if you need
long-term stats and performance patterns, use SPM for Solr or
something similar.
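
As a concrete reference, cache sizing lives in the <query> section of
solrconfig.xml; a minimal sketch (values are placeholders to tune, or remove,
based on the observed hit ratios, not recommendations):

```xml
<!-- excerpt from solrconfig.xml: a cache that is commented out is simply
     not created; otherwise tune size/autowarmCount against observed
     hit ratios in the admin stats -->
<query>
  <!-- <filterCache class="solr.FastLRUCache"
                    size="512" initialSize="512" autowarmCount="0"/> -->
  <queryResultCache class="solr.LRUCache"
                    size="64" initialSize="64" autowarmCount="0"/>
</query>
```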

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Tue, Apr 16, 2013 at 5:25 AM, samabhiK qed...@gmail.com wrote:
 Hi,

 I am new to Solr and wish to use version 4.2.x for my app in production. I
 want to show hundreds of thousands of markers on a map, with contents coming
 from Solr. As the user moves around the map and pans, the browser will fetch
 data/markers using a BBOX filter (based on the map's viewport boundary).

 There will be a lot of data indexed in Solr. My question is:
 does caching help in my case? As the filter queries will vary for almost all
 users (because the viewport latitude/longitude would vary), in what ways
 can I use caching to increase performance? Should I completely turn off
 caching?

 If you can suggest by your experience, it would be really nice.

 Thanks
 Sam



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Is-cache-useful-for-my-scenario-tp4056250.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 3.5 core rename issue

2013-04-16 Thread Shawn Heisey

On 4/16/2013 2:39 PM, Jie Sun wrote:

Hi Shawn,
I do have persistent=true in my solr.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
   <cores adminPath="/admin/cores">
 <core name="default" instanceDir="./"/>
 <core name="413a" instanceDir="./"/>
 <core name="blah" instanceDir="./"/>
...
   </cores>
</solr>

the command I ran was to rename from '413' to '413a'.


I think I see the problem.  You have three cores that all point to the 
same instanceDir, and no dataDir parameter.  Normally the dataDir 
parameter defaults to "data" under the instanceDir, but perhaps if you have 
multiple cores sharing the instanceDir, it will use the core name 
instead.  With this solr.xml, I can see why you're having a problem. 
The solr.xml file doesn't tell Solr where the dataDir is.


If you set up an explicit dataDir option for each core, then it should 
work out the way you expect it to.  Here's an excerpt from my solr.xml:


<core name="s0live" instanceDir="/index/solr/cores/s0_1/"
dataDir="/index/solr/data/s0_1/"/>
<core name="s0build" instanceDir="cores/s0_0/"
dataDir="../../data/s0_0/"/>


You are correct about what happens with my directories on a swap, but 
because solr.xml gets updated and has an explicit dataDir for each core, 
everything works.


Thanks,
Shawn



Re: Storing Solr Index on NFS

2013-04-16 Thread Furkan KAMACI
I don't want to be a bother, but I am trying to understand this part:

When you perform a commit in Solr you have (for an instant) two versions of
the index. The commit produces new segments (with new documents, new
deletions, etc.). After creating these new segments, a new index searcher is
created and its caches begin to autowarm. At this point the old index
searcher that you were using is still active and receiving requests. After the
new index searcher finishes loading and autowarming, the old searcher is
discarded.
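
For reference, the searcher swap and autowarming described above are driven by
the cache settings in solrconfig.xml; a minimal sketch (sizes here are made-up
placeholders, not recommendations):

```xml
<!-- excerpt from solrconfig.xml (hypothetical values) -->
<query>
  <!-- autowarmCount controls how many entries are copied from the old
       searcher's cache to warm the new searcher after a commit -->
  <filterCache class="solr.FastLRUCache"
               size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache"
                    size="512" initialSize="512" autowarmCount="64"/>
  <!-- documentCache cannot be autowarmed (internal doc ids change
       between searchers), so its autowarmCount stays 0 -->
  <documentCache class="solr.LRUCache"
                 size="512" initialSize="512" autowarmCount="0"/>
</query>
```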

So does it mean that when I have multiple Solr servers and a shared index,
I should synchronize the caches in the RAM of those different machines?

2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 Yesterday, we spent 1 hour with a client looking at their cluster's
 performance metrics in SPM, their indexing logs, etc., trying to figure
 out why some indexing was slower than it should have been.  We traced
 issues to network hiccups, to VMs that would move from host to host,
 etc.  Really fancy and powerful system in terms of hardware resources,
 but in the end a bit too far from just locally attached HDD or SSD
 that would not have issues like the ones we found.  I'd stay away from
 NFS for the same reason - it's another moving part on the other side
 of the network.

 Otis
 --
  Solr & ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 7:15 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Walter;
 
  You said: "It is not safe to share Solr index files between two Solr
  servers." Why do you think that?
 
 
  2013/4/16 Tim Vaillancourt t...@elementspace.com
 
  If centralization of storage is your goal by choosing NFS, iSCSI works
  reasonably well with SOLR indexes, although good local-storage will
 always
  be the overall winner.
 
  I noticed a near 5% degradation in overall search performance (casual
  testing, nothing scientific) when moving a 40-50GB indexes to iSCSI
 (10GBe
  network) from a 4x7200rpm RAID 10 local SATA disk setup.
 
  Tim
 
 
  On 15/04/13 09:59 AM, Walter Underwood wrote:
 
  Solr 4.2 does have field compression which makes smaller indexes. That
  will reduce the amount of network traffic. That probably does not help
  much, because I think the latency of NFS is what causes problems.
 
  wunder
 
  On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote:
 
   Hello Walter,
 
  Thanks for the response. That has been my experience in the past as
 well.
  But I was wondering if there are new things in Solr 4 and NFS 4.1 that
  make
  the storing of indexes on a NFS mount feasible.
 
  Thanks,
  Saqib
 
 
  On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood <wun...@wunderwood.org> wrote:
 
   On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:
 
   Greetings,
 
  Are there any issues with storing Solr Indexes on a NFS share? Also
 any
  recommendations for using NFS for Solr indexes?
 
  I recommend that you do not put Solr indexes on NFS.
 
  It can be very slow, I measured indexing as 100X slower on NFS a few
  years
  ago.
 
  It is not safe to share Solr index files between two Solr servers, so
  there is no benefit to NFS.
 
  wunder
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 
   --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 
 



Re: Push/pull model between leader and replica in one shard

2013-04-16 Thread Mark Miller

On Apr 16, 2013, at 1:36 AM, SuoNayi suonayi2...@163.com wrote:

 Hi, can someone explain more details about what model is used to sync docs
 between the leader and
 replica in the shard?
 The model can be push or pull. Supposing I have only one shard that has 1
 leader and 2 replicas:
 when the leader receives an update request, does it scatter the request
 to each available and active
 replica first and then process the request locally? In this case,
 if the replicas are able to catch
 up with the leader, can I think of this as a push model, where the leader pushes
 updates to its replicas?

Currently, the leader adds the doc locally and then sends it to all replicas 
concurrently.

 
 
 What happens if a replica is behind the leader? Will the replica pull docs
 from the leader and keep
 track of the incoming updates from the leader in a log (called the tlog)? If so,
 when it completes pulling docs,
 will it replay the updates in the tlog at the end?

If an update forwarded from a leader to a replica fails it's likely because 
that replica died. Just in case, the leader will ask that replica to enter 
recovery.

When a node comes up and is not a leader, it also enters recovery.

Recovery tries to peersync from the leader, and if that fails (peersync only 
works if the replica is off by about 100 updates or fewer), it replicates the 
entire index.

If you are interested in more details on the SolrCloud architecture, I've given 
a few talks on it - two of them here:

http://vimeo.com/43913870
http://www.youtube.com/watch?v=eVK0wLkLw9w

- Mark



Re: Push/pull model between leader and replica in one shard

2013-04-16 Thread Furkan KAMACI
Really nice presentation.

2013/4/17 Mark Miller markrmil...@gmail.com


 On Apr 16, 2013, at 1:36 AM, SuoNayi suonayi2...@163.com wrote:

  Hi, can someone explain more details about what model is used to sync
 docs between the leader and
  replica in the shard?
  The model can be push or pull. Supposing I have only one shard that has 1
 leader and 2 replicas:
  when the leader receives an update request, does it scatter the
 request to each available and active
  replica first and then process the request locally? In this
 case, if the replicas are able to catch
  up with the leader, can I think of this as a push model, where the leader
 pushes updates to its replicas?

 Currently, the leader adds the doc locally and then sends it to all
 replicas concurrently.

 
 
  What happens if a replica is behind the leader? Will the replica pull
 docs from the leader and keep
  track of the incoming updates from the leader in a log (called the tlog)? If so,
 when it completes pulling docs,
  will it replay the updates in the tlog at the end?

 If an update forwarded from a leader to a replica fails it's likely
 because that replica died. Just in case, the leader will ask that replica
 to enter recovery.

 When a node comes up and is not a leader, it also enters recovery.

 Recovery tries to peersync from the leader, and if that fails (peersync only
 works if the replica is off by about 100 updates or fewer), it replicates the
 entire index.

 If you are interested in more details on the SolrCloud architecture, I've
 given a few talks on it - two of them here:

 http://vimeo.com/43913870
 http://www.youtube.com/watch?v=eVK0wLkLw9w

 - Mark




Re: Is cache useful for my scenario?

2013-04-16 Thread Chris Hostetter

: There will be a lot of data that will be indexed in Solr. My question is,
: does caching help in my case? As the filter queries will vary for almost all
: users (because the viewport latitude/longitude would vary), in what ways 
: can I use caching to increase performance? Should I completely turn off 
: caching?

you can use the cache localparam on your fq params to disable caching 
of those specific bbox filter queries w/o needing to completely disable 
caching.

that way, if you have any other filter queries that could leverage caching 
(or use faceting, etc.), they can still take advantage of the caches...

http://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters
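
for example, the per-user viewport box could be marked non-cacheable while 
other, shared filters stay cached (field names and coordinates below are 
made up for illustration):

```
q=*:*
&fq={!cache=false}location:[45.15,-93.85 TO 46.15,-92.85]
&fq=category:restaurant
```

here the first fq (unique to each user's viewport) bypasses the filterCache, 
while the second fq, repeated across many users, is still cached and reused.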

-Hoss

