How to insert documents into different indexes

2012-11-11 Thread tomw
Hi, 

I've set up a Solr instance with multiple cores to be able to use
different indexes for different applications. The point I'm struggling
with is how to insert documents into the index running on a specific
core. Any clue appreciated.

best


-- 
tomw t...@ubilix.com



Re: How to insert documents into different indexes

2012-11-11 Thread Rafał Kuć
Hello!

Just use the update handler that is specific to a given core. For
example if you have two cores named core1 and core2, you should use
the following addresses (if you didn't change the default
configuration):

/solr/core1/update/

and

/solr/core2/update/
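
A minimal SolrJ sketch of the same thing (the field names here are
assumptions for illustration, not part of your setup):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CoreUpdateExample {
    public static void main(String[] args) throws Exception {
        // Point the client at the core-specific URL, not the bare /solr root.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/core1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");           // assumed uniqueKey field
        doc.addField("title", "hello core1");  // assumed schema field
        solr.add(doc);
        solr.commit(); // make the document searchable on core1
    }
}

To target core2 instead, just change the URL.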

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 [...]



Re: How to insert documents into different indexes

2012-11-11 Thread tomw

 [...]
Thanks, that seems to work. Life can be so simple. Unfortunately this
case isn't mentioned in any of the sections covering updates in the
wiki.




Re: More references for configuring Solr

2012-11-11 Thread Dmitry Kan
Hi,

here are some resources:
http://wiki.apache.org/solr/ (Solr wiki)
http://lucene.apache.org/solr/books.html (books published on Solr)

then there is googling on a specific topic. But reading a book first might
not be a bad idea..

-- Dmitry

On Sat, Nov 10, 2012 at 1:15 PM, FARAHZADI, EMAD
emad.farahz...@netapp.com wrote:

 Dear Sir or Madam,

 I want to use Solr for my final project at university, in the part on
 searching and indexing.

 I'd appreciate it if you could send me more resources or documentation
 about Solr.

 Regards


 Emad Farahzadi
 Professional Services Consultant
 NetApp Middle-East
 Office: +971 4 4466203
 Cell: +971 50 9197237

 NetApp MEA (Middle East & Africa)
 Office No. 214, Building 2, 2nd Floor
 Dubai Internet City
 P.O. Box 500199
 Dubai, U.A.E.




-- 
Regards,

Dmitry Kan


Re: How to insert documents into different indexes

2012-11-11 Thread Gora Mohanty
On 11 November 2012 15:06, tomw t...@ubilix.com wrote:
[...]
 Thanks, that seems to work. Life can be so simple. Unfortunately this
 case isn't mentioned in any of the sections covering updates in the
 wiki.

While this could be made clearer, it should not be very difficult
to guess at the update URL for a specific core in a multi-core
setup from the examples in http://wiki.apache.org/solr/CoreAdmin

http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example
also mentions multiple cores in passing.

Regards,
Gora


Re: custom request handler

2012-11-11 Thread Lee Carroll
Only slaves are public facing and they are read only, with limited query
request handlers defined. The above approach is to prevent abusive /
inappropriate queries by clients. A query component sounds interesting.
Would this be implemented through an interface, so it could be separate
from Solr, or by subclassing a base component?

cheers lee c


On 9 November 2012 17:24, Amit Nithian anith...@gmail.com wrote:

 Lee,

 I guess my question was if you are trying to prevent the big bad world
 from doing stuff they aren't supposed to in Solr, how are you going to
 prevent the big bad world from POSTing a delete all query? Or restrict
 them from hitting the admin console, looking at the schema.xml,
 solrconfig.xml.

 I guess the question here is who is the big bad world? The internet at
 large or employees/colleagues in your organization? If it's the internet at
 large then I'd totally decouple this from Solr b/c I want to be 100% sure
 that the *only* thing that the internet has access to is a GET on /select
 with some restrictions and this could be done in many places but it's not
 clear that coupling this to Solr is the place to do it.

  If the big bad world is just within your organization and you want some
  basic protections around what they can and can't see, then what you are
  saying is reasonable to me. Also, perhaps another option is to consider a
  query component rather than creating a subclass of the request handler, as a
  query component promotes more re-use and flexibility. You could make the
  necessary parameter changes in the prepare() method and just make sure that
  this safe parameter component comes before the query component in the
  list of components for a handler, and you should be fine.

 Cheers!
 Amit


 On Fri, Nov 9, 2012 at 5:39 AM, Lee Carroll lee.a.carr...@googlemail.com
 wrote:

  Hi Amit
 
  I did not do this via a servlet filter as I wanted the solr devs to be
  concerned with solr config and keep them out of any concerns of the
  container. By specifying declarative data in a request handler that would
  be enough to produce a service uri for an application.
 
   Or have I missed a point? We have several cores with several apps, all
   with different data query needs. Maybe 20 request handlers are needed to
   support this, with active development ongoing. Basically I want it easy for
   devs to create a specific request handler suited to their needs. I thought
   a servlet filter developed and maintained every time would be overkill.
   Again though, I may have missed a point / over-emphasised a difficulty?

   Are you saying my custom request handler is too tightly bound to Solr, so
   the parameters my apps speak are not decoupled enough from Solr?
 
  Lee C
 
  On 7 November 2012 19:49, Amit Nithian anith...@gmail.com wrote:
 
   Why not do this in a ServletFilter? Alternatively, I'd just write a
 front
   end application servlet to do this so that you don't firewall your
  internal
   admins off from accessing the core Solr admin pages. I guess you could
   solve this using some form of security but I don't know this well
 enough.
  
   If I were to restrict access to certain parts of Solr, I'd do this
  outside
   of Solr itself and do this in a servlet or a filter, inspecting the
   parameters. It's easy to create a modifiable parameters class and
   populate that with acceptable parameters before the Solr filter
 operates
  on
   it.
  
   HTH
   Amit
  
  
  
  
 



Internal Vs. External ZooKeeper

2012-11-11 Thread Nick Chase
OK, I can't find a definitive answer on this.  The wiki says not to use 
the embedded ZooKeeper servers for production.  But my question is: why 
not?  Basically, what are the reasons and circumstances that make you 
better off using an external ZooKeeper ensemble?


Thanks...

 Nick


Re: Internal Vs. External ZooKeeper

2012-11-11 Thread Jack Krupansky
Production typically implies high availability and in a distributed 
system the goal is that the overall cluster integrity and performance should 
not be compromised just because a few worker nodes go down. Solr nodes do 
a lot of complex operations and are quite prone to running into issues 
that compromise their integrity and require that they be taken down, 
restarted, etc. In fact, taking down a bunch of Solr worker nodes should 
not be a big deal (unless they are all of the nodes/replicas from a single 
shard/slice), while taking down a bunch of zookeepers could be 
catastrophic to maintaining the integrity of the zookeeper ensemble. (OTOH, 
if every Solr node is also a zookeeper node, a bunch of Solr nodes would 
generally be less than a quorum, so maybe that is not an absolute issue per 
se.) Zookeeper nodes are categorically distinct in terms of their importance 
to maintaining the integrity and availability of the overall cluster. They 
are special in that sense. And they are special because they are maintaining 
the integrity of the cluster's configuration information. Even for large
clusters their number will be relatively few compared to the many
worker nodes (replicas), so zookeeper nodes need to be protected from
the vagaries that can disrupt and take Solr nodes down, not the least of
which is incoming traffic.


I'm not sure what the implications would be if you had a large cluster and 
because Zookeeper was embedded you had a large number of zookeepers. Any of 
the inter-zookeeper operations would take longer and could be compromised by 
even a single busy/overloaded/dead Solr node. OTOH, the Zookeeper ensemble
design is supposed to be able to handle a fair number of missing zookeeper
nodes.


OTOH, if high availability is not a requirement for a production cluster 
(use case?), then non-embedded zookeepers are certainly an annoyance.


Maybe you could think of embedded zookeeper like every employee having their 
manager sitting right next to them all the time. How could that be anything 
but a bad idea in terms of maximizing worker output - and 
distracting/preventing managers from focusing on their own work?


-- Jack Krupansky

-Original Message- 
From: Nick Chase

Sent: Sunday, November 11, 2012 7:12 AM
To: solr-user@lucene.apache.org
Subject: Internal Vs. External ZooKeeper

[...]



Re: Preventing accepting queries while custom QueryComponent starts up?

2012-11-11 Thread Jack Krupansky
Is the issue here that the Solr node is continuously live with the load 
balancer so that the moment during startup that Solr can respond to 
anything, the load balancer will be sending it traffic and that this can 
occur while Solr is still warming up?


First, shouldn't we be encouraging people to have an app layer between Solr
and the outside world? If so, the app layer should simply not respond to
traffic until it can verify that Solr has stabilized. If not,
then maybe we do need to suggest a change to Solr so that the developer can
control exactly when Solr becomes live and responsive to incoming traffic.


At a minimum, we should document when that moment is today in terms of an 
explicit contract. It sounds like the problem is that the contract is either 
nonexistent, vague, ambiguous, non-deterministic, or whatever.


-- Jack Krupansky

-Original Message- 
From: Amit Nithian

Sent: Saturday, November 10, 2012 4:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Preventing accepting queries while custom QueryComponent starts 
up?


Yeah that's what I was suggesting in my response too. I don't think your
load balancer should be doing this but whatever script does the release
(restarting the container) should do this so that when the ping is enabled
the warming has finished.


On Sat, Nov 10, 2012 at 3:33 PM, Erick Erickson 
erickerick...@gmail.com wrote:


Hmmm, rather than hit the ping query, why not just send in a real query 
and

only let the queued ones through after the response?

Just a random thought
Erick


On Sat, Nov 10, 2012 at 2:53 PM, Amit Nithian anith...@gmail.com wrote:

 Yes but the problem is that if user facing queries are hitting a server
 that is warming up and isn't being serviced quickly, then you could
 potentially bring down your site if all the front end threads are 
 blocked

 on Solr queries b/c those queries are waiting (presumably at the
container
 level since the filter hasn't finished its init() sequence) for the
warming
 to complete (this is especially notorious when your front end is rails).
 This is why your ping to enable/disable a server from the load balancer
has
 to be accurate with regards to whether or not a server is truly ready 
 and

 warm.

 I think what I am gathering from this discussion is that the server is
 warming up, the ping is going through and tells the load balancer this
 server is ready, user queries are hitting this server and are queued
 waiting for the firstSearcher to finish (say these initial user queries
 take 500-1000 ms to respond); that's terrible for performance.

 Alternatively, if you have a bunch of servers behind a load balancer, 
 you

 want this one server (or block of servers depending on your deployment)
to
 be reasonably sure that user queries will return in a decent time
(whatever
 you define decent to be) hence why this matters.

 Let me know if I am missing anything.

 Thanks
 Amit


 On Sat, Nov 10, 2012 at 10:03 AM, Erick Erickson 
erickerick...@gmail.com
 wrote:

  Why does it matter? The whole idea of firstSearcher queries is to warm up
  your system as fast as possible. The theory is that upon restarting the
  server, let's get this stuff going immediately... They were never intended
  (as far as I know) to complete before any queries were handled. As an
  aside, I'm not quite sure I understand why pings during the warmup are a
  problem.
 
  But anyway. firstSearcher is particularly relevant because the
  autowarmCount settings on your caches are irrelevant when starting the
  server, there's no history to autowarm
 
  But, there's no good reason _not_ to let queries through while
  firstSearcher is doing its tricks; they just get into the queue and are
  served as quickly as they may. That might be some time since, as you say,
  they may not get serviced until the expensive parts get filled. But I don't
  think having them be serviced is doing any harm.
 
  Now, newSearcher and autowarming of the caches is a completely different
  beast, since having the old searchers continue serving requests until the
  warmups complete _does_ directly impact the user: they don't see random
  slowness because a searcher is being opened.
 
  So I guess my real question is whether you're seeing a measurable
problem
  or if this is a red herring
 
  FWIW,
  Erick
 
 
  On Thu, Nov 8, 2012 at 2:54 PM, Aaron Daubman daub...@gmail.com
wrote:
 
   Greetings,
  
   I have several custom QueryComponents that have high one-time 
   startup

  costs
   (hashing things in the index, caching things from a RDBMS, etc...)
  
   Is there a way to prevent solr from accepting connections before all
   QueryComponents are ready?
  
   Especially, since many of our instance are load-balanced (and
   added-in/removed automatically based on admin/ping responses)
 preventing
   ping from answering prior to all custom QueryComponents being ready
 would
   be ideal...
  
   Thanks,
Aaron
  
 






Re: Internal Vs. External ZooKeeper

2012-11-11 Thread Nick Chase
Thanks, Jack, this is a great explanation!  And since a greater number 
of ZK nodes tends to degrade write performance, that would be a factor 
in making every Solr node a ZK node as well.  Much obliged!


  Nick

On 11/11/2012 10:45 AM, Jack Krupansky wrote:

[...]



zkcli issues

2012-11-11 Thread Nick Chase

OK, so this is my ZooKeeper week, sorry. :)

So I'm trying to use ZkCLI without success.  I DID start and stop Solr 
in non-cloud mode, so everything is extracted and it IS finding 
zookeeper*.jar.  However, now it's NOT finding SolrJ.


I even tried to run it from the provided script (in cloud-scripts) with 
no success.  Here's what I've got:


 cd my-solr-install

.\example\cloud-scripts\zkcli.bat -cmd upconfig -zkhost 
localhost:9983 -confdir example/solr/collection/conf -confname conf1 
-solrhome example/solr


set JVM=java

set SDIR=C:\sw\apache-solr-4.0.0\example\cloud-scripts\

if \ == \ set SDIR=C:\sw\apache-solr-4.0.0\example\cloud-scripts

java -classpath 
C:\sw\apache-solr-4.0.0\example\cloud-scripts\..\solr-webapp\webapp\WEB-INF\lib\* 
org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:9983 
-confdir example/solr/collection/conf -confname conf1 -solrhome example/solr


Error: Could not find or load main class 
C:\sw\apache-solr-4.0.0\example\cloud-scripts\..\solr-webapp\webapp\WEB-INF\lib\apache-solr-solrj-4.0.0.jar


I've verified that 
C:\sw\apache-solr-4.0.0\example\cloud-scripts\..\solr-webapp\webapp\WEB-INF\lib\apache-solr-solrj-4.0.0.jar 
exists, so I'm really at a loss here.


Thanks...

  Nick


Re: zkcli issues

2012-11-11 Thread Yonik Seeley
On Sun, Nov 11, 2012 at 10:39 PM, Nick Chase nch...@earthlink.net wrote:
 So I'm trying to use ZkCLI without success.  I DID start and stop Solr in
 non-cloud mode, so everything is extracted and it IS finding zookeeper*.jar.
 However, now it's NOT finding SolrJ.

Not sure about your specific problem in this case, but I chatted with
Mark about this while at ApacheCon... it seems like we should be able
to explode the WAR ourselves if necessary, eliminating the need to
start Solr first.  Just throwing it out there before I forgot about it
;-)

-Yonik
http://lucidworks.com


Re: Apache Nutch 1.5.1 + Apache Solr 4.0

2012-11-11 Thread Dave Meikle
Hi,

On 8 Nov 2012, at 15:00, Markus Jelsma markus.jel...@openindex.io wrote:

 Hm, I copied the schema from Nutch's trunk verbatim and only had to change
 the stemmer. It seems like you have, for some reason, a float with an extra
 point dangling around somewhere. Can you check?

Just building a Nutch 1.5.1 environment and I found this too. It is actually
the version number in the schema.xml [1] and schema-solr4.xml [2] for the
1.5.1 branch that is the problem.

In these files the version number reads:
<schema name="nutch" version="1.5.1">

Whereas in trunk [3] it is:
<schema name="nutch" version="1.5">

Obviously, as the field is read as a float in the IndexSchema class, 1.5.1
will fail due to the extra point. A quick change back to 1.5 in the file
should solve things.
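
A one-line illustration of the failure, matching the "multiple points"
error reported earlier in this thread (this is just standard JDK behavior):

    // "1.5" parses fine; the second point below throws
    // java.lang.NumberFormatException: multiple points
    float version = Float.parseFloat("1.5.1");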

Cheers,
Dave

[1] http://svn.apache.org/repos/asf/nutch/branches/branch-1.5.1/conf/schema.xml
[2] 
http://svn.apache.org/repos/asf/nutch/branches/branch-1.5.1/conf/schema-solr4.xml
[3] http://svn.apache.org/repos/asf/nutch/trunk/conf/schema.xml





Re: Internal Vs. External ZooKeeper

2012-11-11 Thread Anirudha Jadhav
Let me see if I get this correctly:

the greater the number of ZooKeeper nodes, the more time it takes to come to
a consensus.

During an indexing operation, how many times does a Solr client need to
contact ZooKeeper for consensus?
Per doc? Per commit?

thanks,
Ani


On Sun, Nov 11, 2012 at 11:17 AM, Nick Chase nch...@earthlink.net wrote:

 [...]




-- 
Anirudha P. Jadhav


Re: Internal Vs. External ZooKeeper

2012-11-11 Thread Mark Miller
When SolrCloud is in a steady state (eg the number of nodes in the 
cluster is not changing and config is not changing), Solr does not 
really talk to ZooKeeper other than really light stuff like a heartbeat 
and maintaining a connection. So performance is not likely a large 
concern here.


Mostly it's just a hassle because ZooKeeper does not currently support 
dynamically changing the nodes in an ensemble without doing a rolling 
restart. There are JIRA issues that are being worked on that will help 
with this though.


Until then, it's just kind of a pain that some nodes have to be special 
or you have to do rolling restarts to make additional nodes part of the 
zk quorum.


It's really up to you though - having the services separate just seems 
nicer to me. Easier to maintain. Often, once you start running 
ZooKeeper for one thing, you may end up running other things that use 
ZooKeeper as well - many people like to colocate this stuff on a single 
dedicated ZooKeeper ensemble.


Embedded will run just fine - we simply recommend the other way to save 
headaches. If you know what you are getting into, it's certainly a valid 
choice.


- Mark

On 11/11/2012 05:11 PM, Anirudha Jadhav wrote:

[...]




Re: zkcli issues

2012-11-11 Thread Mark Miller

On 11/11/2012 04:47 PM, Yonik Seeley wrote:

[...] it seems like we should be able
to explode the WAR ourselves if necessary, eliminating the need to
start Solr first. [...]


I guess the tricky part might be knowing where to extract it. We know 
how to do it for the default jetty setup, but that could be reconfigured 
or you could be using another web container.


Kind of annoying.

- Mark


Solr 4.0 - distributed updates without zookeeper?

2012-11-11 Thread Peter Wolanin
Looking at how we could upgrade some of our infrastructure to Solr 4.0
- I would really like to take advantage of distributed updates to get
NRT, but we want to keep our fixed master and slave server roles since
we use different hardware appropriate to the different roles.

Looking at the solr 4.0 distributed update code, it seems really
hard-coded and bound to zookeeper.  Is there a way to have a solr
master distribute updates without using ZK, or a way to mock the ZK
interface to provide a fixed cluster topography that will work when
sending updates just to the master?

To be clear, if the master goes down I don't want a slave promoted,
nor do I want most of the other SolrCloud features - we have already
built out a system for managing groups of servers.

Thanks,

Peter


Re: Apache Nutch 1.5.1 + Apache Solr 4.0

2012-11-11 Thread Iwan Hanjoyo
Hi Steiner,

I found a video tutorial on Nutch 1.4 + Solr 3.4.0 (on Windows).
It do solve my error. Hope it do for yours too.
Here is the link:
Running Nutch and Solr on Windows Tutorial: Part 1
http://www.youtube.com/watch?v=baxhI6Wkov8
Running Nutch and Solr on Windows Tutorial: Part 2
http://www.youtube.com/watch?v=Qs-18hRRpNU
Running Nutch and Solr on Windows Tutorial: Part 3
http://www.youtube.com/watch?v=GtbDHiYrlNE

Published on Mar 15, 2012 by Dutedute2

Kind regards,


Hanjoyo

On Thu, Nov 8, 2012 at 4:52 PM, Antony Steiner ant.stei...@gmail.com wrote:

 Hello, my name is Antony and I'm new to Apache Nutch and Solr.

 I want to crawl my website and therefore I downloaded Nutch to do this.
 This works fine. But now I would like to integrate Nutch with Solr. I'm
 running this on my Unix system.
 I'm trying to follow this tutorial:
 http://wiki.apache.org/nutch/NutchTutorial
 But it won't work for me. Running Solr without Nutch is no problem. I can
 post documents to Solr with post.jar. But what I want to do is post my
 Nutch crawl to Solr.
 Now if I copy the schema.xml from Nutch to the
 apache-solr-4.0.0/example/solr/collection1/conf directory and restart Solr
 (java -jar start.jar), I get compile errors but Solr will start. (Is this
 the correct directory for my schema?)

 Nov 8, 2012 9:40:33 AM org.apache.solr.schema.IndexSchema readSchema
 INFO: Schema name=nutch
 Nov 8, 2012 9:40:33 AM org.apache.solr.core.CoreContainer create
 SEVERE: Unable to create core: collection1
 org.apache.solr.common.SolrException: Schema Parsing Failed: multiple
 points
 at
 org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:571)
 at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
 ...

 Nov 8, 2012 9:40:33 AM org.apache.solr.common.SolrException log
 SEVERE: null:org.apache.solr.common.SolrException: Schema Parsing Failed:
 multiple points
 at
 org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:571)
 at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
 at
 org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
 ...

 Now if I don't copy the schema and push my nutch crawl to solr I get
 following error:

 SolrIndexer: starting at 2012-11-08 10:49:02
 Indexing 5 documents
 java.io.IOException: Job failed!
 SolrDeleteDuplicates: starting at 2012-11-08 10:49:47
 SolrDeleteDuplicates: Solr url: http://photon:8983/solr/

 And this is taken from the logging:
 org.apache.solr.common.SolrException: ERROR: [doc=
 http://e-docs/infrastructure/cpuload_monitor.html] unknown field 'host'

 What should I do or what am I missing?

 I hope you can help me
 Best Regards
 Antony



Re: custom request handler

2012-11-11 Thread Amit Nithian
Hi Lee,

So the query component would be a subclass of SearchComponent and you can
define the list of components executed during a search handler.
http://wiki.apache.org/solr/SearchComponent

I *think* you can have a custom component do what you want as long as it's
the first component in the list so you can inspect and re-set the
parameters before it goes downstream to the other components. However, it's
still not clear how you are going to prevent users from POSTing bad queries
or looking at things they probably shouldn't be like the schema.xml or
solrconfig.xml or the admin console. Maybe there are ways in Solr to
prevent this but then you'd have to allow it for internal admins but
exclude it for the public.
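
A rough sketch of such a component (the class name and the particular
parameter rules are hypothetical; only the general SearchComponent shape
comes from the wiki page above):

import java.io.IOException;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class SafeParamsComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // Copy the incoming params into a modifiable set and clamp/strip
        // anything clients shouldn't control.
        ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
        int rows = params.getInt("rows", 10);
        params.set("rows", Math.min(rows, 100)); // hypothetical cap
        params.remove("qt");                     // hypothetical: no handler switching
        rb.req.setParams(params);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // Nothing to do at process time; this component only rewrites params.
    }

    @Override
    public String getDescription() {
        return "Rewrites request parameters to safe values";
    }

    @Override
    public String getSource() {
        return "";
    }
}

Registered first in the handler's components list, it runs before the query
component ever sees the parameters.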

If you are exposing your slaves to the actual world wide public then I'd
strongly suggest an app layer between solr and the public. I treat Solr
like my database meaning that I don't expose access to my database publicly
but rather through some app layer (say some CMS tools or what not).

HTH!
Amit


On Sun, Nov 11, 2012 at 5:23 AM, Lee Carroll
lee.a.carr...@googlemail.com wrote:

 [...]



Re: Preventing accepting queries while custom QueryComponent starts up?

2012-11-11 Thread Amit Nithian
Jack,

I think the issue is that the ping, which is used to determine whether or
not the server is live, returns a seemingly false positive back to the load
balancer (and indirectly the client) that this server is ready to go when
in fact it's not. Reading this page (
http://wiki.apache.org/solr/SolrConfigXml), it does seem to be documented
to do this, but the need to hide your Solr behind a load balancer may not
be fully stressed. I am more than happy to write up a post that, in my
opinion at least, stresses some best practices on the use of Solr based on
my experience, if others find this useful.

What seems odd here is that the ping is a query, so maybe the ping query in
the solrconfig (for Aaron and others having this issue) should be configured
to hit the handler that is used by the front-end app, so that while that
handler is warming up the ping query will be blocked.
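
A sketch of what that might look like in solrconfig.xml (the handler name
and health-check file name are assumptions, not a tested config):

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="qt">/myapp</str>  <!-- hypothetical app-facing handler -->
    <str name="q">solrpingquery</str>
  </lst>
  <!-- optionally let the release script take the node out of rotation -->
  <str name="healthcheckFile">server-enabled.txt</str>
</requestHandler>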

Of course using the load balancer means that the app layer knows nothing
about servers in and out of rotation.

Cheers!
Amit


On Sun, Nov 11, 2012 at 8:05 AM, Jack Krupansky j...@basetechnology.com wrote:

 [...]

Re: 4.0 query question

2012-11-11 Thread Amit Nithian
Why not group by cid using the grouping component, sort within the group by
version descending, and return 1 result per group?

http://wiki.apache.org/solr/FieldCollapsing
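
Something along these lines (cid and file_version are the field names from
your earlier mail; the rest is a sketch against a default setup):

/select?q=*:*&group=true&group.field=cid&group.sort=file_version desc&group.limit=1

Each group then comes back containing only its highest-version document.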

Cheers
Amit


On Fri, Nov 9, 2012 at 2:56 PM, dm_tim dm_...@yahoo.com wrote:

 I think I may have found my answer, but I'd like additional validation:
 I believe that I can add a function to my query to get only the highest
 values of 'file_version' like this -
 _val_:max(file_version, 1)

 I seem to be getting the results I want. Does this look correct?

 Regards,

 Tim






Re: how to sort the solr suggester's result

2012-11-11 Thread eyun
Can anyone help to tell me where my mistake is?




eyun

From: eyun
Date: 2012-11-12 11:24
To: solr-user-subscribe
Subject: how to sort the solr suggester's result
[...]




eyun

how to sort the solr suggester's result

2012-11-11 Thread 徐郑
Following is my config; it suggests words well.
I want to get a sorted result when it suggests, so I added a transformer
that appends a tab(\t)-separated float weight string
to the end of the Suggestion field, but the suggestion result still isn't
sorted correctly.

My suggest result (note: the float number at the end is the weight):

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="我">
      <int name="numFound">10</int>
      <int name="startOffset">1</int>
      <int name="endOffset">2</int>
      <arr name="suggestion">
        <str>我脑中的橡皮擦 2.12</str>
        <str>我老婆是大佬3 2.07</str>
        <str>我老婆是大佬2 2.12</str>
        ...




schema.xml

<field name="Suggestion" type="string" indexed="true" stored="true"/>



solrconfig.xml

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="field">Suggestion</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <!-- <float name="threshold">0.0001</float> -->
    <str name="spellcheckIndexDir">spellchecker</str>
    <str name="comparatorClass">freq</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler"
                name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

-- 

eyun

The truth, whether or not

Q:276770341   G+:eyun...@gmail.com


RE: sort by function error

2012-11-11 Thread Kuai, Ben
More information: the problem only happens when I have both sort-by-function
and grouping in the query.



From: Kuai, Ben [ben.k...@sensis.com.au]
Sent: Monday, November 12, 2012 2:12 PM
To: solr-user@lucene.apache.org
Subject: sort by function error

Hi

I am trying to use sort by function, something like sort=sum(field1, field2)
asc.

But it is not working and I got the error "SortField needs to be rewritten
through Sort.rewrite(..) and SortField.rewrite(..)".

Please shed me some light on this.

Thanks
Ben

Full exception stack track:
SEVERE: java.lang.IllegalStateException: SortField needs to be rewritten 
through Sort.rewrite(..) and SortField.rewrite(..)
at org.apache.lucene.search.SortField.getComparator(SortField.java:484)
at
org.apache.lucene.search.grouping.AbstractFirstPassGroupingCollector.<init>(AbstractFirstPassGroupingCollector.java:82)
at
org.apache.lucene.search.grouping.TermFirstPassGroupingCollector.<init>(TermFirstPassGroupingCollector.java:58)
at
org.apache.solr.search.Grouping$TermFirstPassGroupingCollectorJava6.<init>(Grouping.java:1009)
at 
org.apache.solr.search.Grouping$CommandField.createFirstPassCollector(Grouping.java:632)
at org.apache.solr.search.Grouping.execute(Grouping.java:301)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:373)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:201)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at 
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:585)
at 
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)




Re: customize solr search/scoring for performance

2012-11-11 Thread jchen2000
Yes, we only need term-overlap information to choose top candidates (we may
incorporate boost factors for different terms later, but that's another
story).
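
One option we are considering (a sketch against the Lucene 4.0 API, not
something we have verified) is a Similarity that flattens tf/idf so the
coord factor dominates the score:

import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.similarities.DefaultSimilarity;

public class OverlapSimilarity extends DefaultSimilarity {
    @Override
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f; // count a term once, however often it occurs
    }

    @Override
    public float idf(long docFreq, long numDocs) {
        return 1.0f; // ignore rarity; every matched term contributes equally
    }

    @Override
    public float lengthNorm(FieldInvertState state) {
        return 1.0f; // ignore document length
    }
    // coord(overlap, maxOverlap) is inherited as overlap/maxOverlap, so the
    // score reduces to a term-overlap measure.
}

It would be wired in via a <similarity class="OverlapSimilarity"/> element
in schema.xml.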

We are quite new to Solr, so we haven't really profiled the process. Is there
any rough guess at the latency to expect in such cases? Our throughput is
only around 100 qps, so that might not be a significant factor here.

Thanks,

Jeremy
  

Otis Gospodnetic-5 wrote
 Fuzzy answer:
 Can you verify the bottleneck, especially in slow cases is indeed scoring?
 Profiler?
 Not sure if coord method in Similarity is still around... are you saying
 you need just term overlap for scoring/ordering?
 20m small docs and 2s queries on good hardware sounds suspicious ... do
 slow queries correspond to GC or something else?
 
 Otis
 --
 Performance Monitoring - http://sematext.com/spm







Re: zkcli issues

2012-11-11 Thread Jeevanandam Madanagopal
Nick -

I believe you're experiencing difficulties with the SolrCloud CLI commands
for interacting with ZooKeeper.
Please have a look at the links below; they will point you in the right
direction.
Handy SolrCloud ZkCLI Commands
Uploading Solr Configuration into ZooKeeper ensemble

Cheers,
Jeeva

On Nov 12, 2012, at 4:45 AM, Mark Miller markrmil...@gmail.com wrote:

 [...]



Re: zkcli issues

2012-11-11 Thread Jeevanandam Madanagopal
Nick - Sorry, the embedded links were not shown in my previous email. I'm
listing them below.

 Handy SolrCloud ZkCLI Commands 
 (http://www.myjeeva.com/2012/10/solrcloud-cluster-single-collection-deployment/#handy-solrcloud-cli-commands)

 Uploading Solr Configuration into ZooKeeper ensemble 
 (http://www.myjeeva.com/2012/10/solrcloud-cluster-single-collection-deployment/#uploading-solrconfig-to-zookeeper)
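
For reference, a typical upconfig invocation looks roughly like this (the
paths and config name are assumptions based on the default 4.0 example
layout, not verified against your setup):

java -classpath "example/solr-webapp/webapp/WEB-INF/lib/*" \
  org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:9983 \
  -confdir example/solr/collection1/conf -confname myconf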


Cheers,
Jeeva


On Nov 12, 2012, at 12:48 PM, Jeevanandam Madanagopal je...@myjeeva.com wrote:

 [...]



Integrating Solr with Database

2012-11-11 Thread 122jxgcn
Hello,

I'm currently working on a file management system based on Solr.

What I have accomplished so far is that I have a Solr server and a Windows
client application that run on different computers.
When the client indexes a rich document to the Solr server remotely, it also
uploads the file itself via FTP,
so that when anyone searches for the document, he/she can download the raw
file from the server.

What I want to do right now is that whenever the client indexes a document
and uploads the raw file,
the database gets updated with the pair (document ID in Solr, path of the
raw file on the server).
So on the search result page, instead of giving a direct link to the raw
file, I'd like to make the server look up the database based on the document
ID in Solr and return the linked file path.

As I'm new to databases, Apache, RESTful APIs, and such,
I'm not sure how to begin implementing this feature.
Any help or starting point would be appreciated.

Thank you.





Re: Integrating Solr with Database

2012-11-11 Thread Gora Mohanty
On 12 November 2012 13:00, 122jxgcn ywpar...@gmail.com wrote:
[...]

This might make sense if you were using Solr to search for the
ID of an object in the database with relations to other objects.
However, if all you are doing is retrieving the file path/URL, why
not index that into Solr, and get it directly from Solr?

If you still want to do what you had in mind, you should handle
that as part of your indexing process, i.e., update both Solr and
the database at the same time, or update the database and index
to Solr from there.
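
A minimal sketch of that dual write (all the names here - Solr URL, JDBC
URL, table and columns - are hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DualWriteIndexer {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        Connection db = DriverManager.getConnection(
                "jdbc:mysql://localhost/files", "user", "pass");

        String docId = "doc-42";                     // Solr document ID
        String rawPath = "/data/uploads/doc-42.pdf"; // where FTP put the file

        // 1) Index the document metadata into Solr.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", docId);
        solr.add(doc);
        solr.commit();

        // 2) Record the (Solr ID, file path) pair in the database.
        PreparedStatement ps = db.prepareStatement(
                "INSERT INTO file_paths (solr_id, path) VALUES (?, ?)");
        ps.setString(1, docId);
        ps.setString(2, rawPath);
        ps.executeUpdate();

        ps.close();
        db.close();
    }
}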

Regards,
Gora


Re: Integrating Solr with Database

2012-11-11 Thread 122jxgcn
 This might make sense if you were using Solr to search for the
 ID of an object in the database with relations to other objects.
 However, if all you are doing is retrieving the file path/URL, why
 not index that into Solr, and get it directly from Solr?

That's what I'm doing right now, but since there are some naming and
security issues,
I'd like to integrate Solr with a database eventually.

 If you still want to do what you had in mind, you should handle
 that as part of your indexing process, i.e., update both Solr and
 the database at the same time

I have thought about that, but I could not figure out how to update the
database when I'm updating Solr.
I'm pretty sure the database has to be connected with Solr somehow (first
difficulty),
and the database has to be updated remotely from a Windows Forms application
written in C# (second difficulty).

Thank you.





Re: More references for configuring Solr

2012-11-11 Thread Lance Norskog
LucidFind collects several sources of information in one searchable archive:

http://find.searchhub.org/?q=sort=#%2Fp%3Asolr

- Original Message -
| From: Dmitry Kan dmitry@gmail.com
| To: solr-user@lucene.apache.org
| Sent: Sunday, November 11, 2012 2:24:21 AM
| Subject: Re: More references for configuring Solr
| 
| Hi,
| 
| here are some resources:
| http://wiki.apache.org/solr/ (Solr wiki)
| http://lucene.apache.org/solr/books.html (books published on Solr)
| 
| the goes googling on a specific topic. But before reading a book
| might not
| be a bad idea..
| 
| -- Dmitry
| 
| On Sat, Nov 10, 2012 at 1:15 PM, FARAHZADI, EMAD
| emad.farahz...@netapp.comwrote:
| 
|   Dear Sir or Madam,
| 
|  ** **
| 
|  I want to use to Solr for my final project in university in part of
|  searching and indexing.
| 
|  I’d be appreciated if you send me more resources or documentations
|  about
|  Solr.
| 
|  ** **
| 
|  Regards
| 
|  ** **
| 
|  ** **
| 
| 
|  *
| 
|  Emad Farahzadi [image: brand-site-home-telescope-160x95]*
| 
|  *Professional Services Consultant*
| 
|  *NetApp Middle-East*
| 
|  *
|  **Office: +971 4   4466203
|  Cell:+971 50 9197237*
| 
|  ** **
| 
|  *NetApp MEA (Middle East   Africa)
|  Office No. 214
|  Building 2, 2nd Floor
|  Dubai Internet City
|  P.O. Box 500199
|  Dubai, U.A.E. *
| 
|   [image: netapp-cloud-esig-dollar]
| 
|  ** **
| 
| 
| 
| 
| --
| Regards,
| 
| Dmitry Kan
|