Re: DIH XML configs for multi environment

2012-07-23 Thread jerry.min...@gmail.com
Pranay,

I tried two similar approaches to resolve this in my system which is
Solr 4.0 running in Tomcat 7.x on Ubuntu 9.10.

My preference was to use an alias for each of my database environments
as a JVM parameter because it makes more sense to me that the database
connection be stored in the data config file rather than in a Tomcat
configuration or startup file.
Because of this preference, I first attempted the following:
1. Set a JVM environment variable 'solr.dbEnv' to represent the
database environment that should be accessed. For example, in my dev
environment, the JVM environment variable was set as -Dsolr.dbEnv=dev.
2. In the data config file I had 3 data sources. Each data source had
a name that matched one of the database environment aliases.
3. In the entity of my data config file, the dataSource parameter was
set as follows: dataSource="${solr.dbEnv}".

Unfortunately, this fails to work. Setting the dataSource parameter in
the data config file does not override the default; the default
appears to be the first data source defined in the data config file.
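
For reference, the data config for that failed attempt looked roughly
like the sketch below. The driver, URLs, and query are placeholders,
not my actual settings; only the three named data sources and the
${solr.dbEnv} reference are the point:

```xml
<dataConfig>
  <!-- One data source per environment; each name matches a solr.dbEnv alias -->
  <dataSource name="dev"  driver="com.mysql.jdbc.Driver" url="jdbc:mysql://dev-host/db"/>
  <dataSource name="stag" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://stag-host/db"/>
  <dataSource name="prod" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://prod-host/db"/>
  <document>
    <!-- Intent: select the data source via -Dsolr.dbEnv=dev|stag|prod.
         In practice DIH ignored this and used the first data source. -->
    <entity name="item" dataSource="${solr.dbEnv}" query="select * from item"/>
  </document>
</dataConfig>
```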

Second, I tried what Markus suggested.

That is, I created a JVM variable to contain the connect URL for each
of my environments, and I use that variable to set the url parameter
of the dataSource element in the data config file.

This works well.
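
Concretely, the working setup is sketched below; the property name
db.url and the MySQL URL are illustrative, not my actual values. Each
environment's startup script passes its own URL:

```xml
<!-- Tomcat/Jetty started per environment with, e.g.:
     JAVA_OPTS="$JAVA_OPTS -Ddb.url=jdbc:mysql://dev-host:3306/mydb" -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="${db.url}"/>
  <document>
    <entity name="item" query="select * from item"/>
  </document>
</dataConfig>
```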


Best,
Jerry Mindek

On Wed, Jul 18, 2012 at 3:46 PM, Pranav Prakash pra...@gmail.com wrote:
 That approach would work for core dependent parameters. In my case, the
 params are environment dependent. I think a simpler approach would be to
 pass the url param as JVM options, and these XMLs get it from there.

 I haven't tried it yet.

 *Pranav Prakash*

 temet nosce



 On Tue, Jul 17, 2012 at 5:09 PM, Markus Klose m...@shi-gmbh.com wrote:

 Hi

 There is one more approach using the property mechanism.

 You could specify the datasource like this:
 <dataSource name="database" driver="${sqlDriver}" url="${sqlURL}"/>

 And you can specify the properties in the solr.xml in your core
 configuration like this:

 <core instanceDir="core1" name="core1">
   <property name="sqlURL" value="jdbc:hsqldb:/temp/example/ex"/>
 </core>


 Best regards from Augsburg

 Markus Klose
 SHI Elektronische Medien GmbH


 Adresse: Curt-Frenzel-Str. 12, 86167 Augsburg

 Tel.:   0821 7482633 26
 Tel.:   0821 7482633 0 (Zentrale)
 Mobil:0176 56516869
 Fax:   0821 7482633 29

 E-Mail: markus.kl...@shi-gmbh.com
 Internet: http://www.shi-gmbh.com

 Registergericht Augsburg HRB 17382
 Geschäftsführer: Peter Spiske
 USt.-ID: DE 182167335





 -----Original Message-----
 From: Rahul Warawdekar [mailto:rahul.warawde...@gmail.com]
 Sent: Wednesday, July 11, 2012 11:21
 To: solr-user@lucene.apache.org
 Subject: Re: DIH XML configs for multi environment

 http://wiki.eclipse.org/Jetty/Howto/Configure_JNDI_Datasource
 http://docs.codehaus.org/display/JETTY/DataSource+Examples


 On Wed, Jul 11, 2012 at 2:30 PM, Pranav Prakash pra...@gmail.com wrote:

  That's cool. Is there something similar for Jetty as well? We use Jetty!
 
  *Pranav Prakash*
 
  temet nosce
 
 
 
  On Wed, Jul 11, 2012 at 1:49 PM, Rahul Warawdekar 
  rahul.warawde...@gmail.com wrote:
 
   Hi Pranav,
  
   If you are using Tomcat to host Solr, you can define your data
   source in the context.xml file under the Tomcat configuration.
   You have to refer to this datasource by the same name in all 3
   environments from the DIH data-config.xml.
   This context.xml file will vary across the 3 environments, having
   different credentials for dev, stag and prod.
  
   eg
   DIH data-config.xml will refer to the datasource as listed below:
   <dataSource jndiName="java:comp/env/YOUR_DATASOURCE_NAME"
               type="JdbcDataSource" readOnly="true" />

   The context.xml file, which is located under the /TOMCAT_HOME/conf
   folder, will have the resource entry as follows:
   <Resource name="YOUR_DATASOURCE_NAME" auth="Container"
             type="" username="X" password="X"
             driverClassName=""
             url=""
             maxActive="8" />
  
   On Wed, Jul 11, 2012 at 1:31 PM, Pranav Prakash pra...@gmail.com
  wrote:
  
    The DIH XML config file has to specify a dataSource. In my
    case, and possibly with many others, the logon credentials as well
    as mysql server paths would differ based on environments (dev,
    stag, prod). I don't want to end up with three different DIH config
    files, three different handlers and so on.
   
What is a good way to deal with this?
   
   
*Pranav Prakash*
   
temet nosce
   
  
  
  
   --
   Thanks and Regards
   Rahul A. Warawdekar
  
 



 --
 Thanks and Regards
 Rahul A. Warawdekar



Re: DIH XML configs for multi environment

2012-07-23 Thread jerry.min...@gmail.com
Pranav,

Sorry, I should have checked my response a little better, as I
misspelled your name and mentioned that I tried what Markus suggested
but then described something totally different.
I didn't try using the property mechanism as Markus suggested, as I am
not using a solr.xml file.

What you mentioned in your post on Wed, Jul 18, 2012 at 3:46 PM will
work, as I have done it successfully.
That is, I created a JVM variable to contain the connect URL for each
of my environments and used it to set the url parameter of the
dataSource element in my data config files.

Best,
Jerry



Re: SolrCloud with Tomcat and external Zookeeper, does it work?

2012-03-27 Thread jerry.min...@gmail.com
Hi Vadim,

I too am experimenting with SolrCloud and need help with setting it up
using Tomcat as the java servlet container.
While searching for help on this question, I found another thread in
the solr-mailing-list that is helpful.
In case you haven't seen this thread that I found, please search the
solr-mailing-list for: SolrCloud new
You can also view it at nabble using this link:
http://lucene.472066.n3.nabble.com/SolrCloud-new-td1528872.html

Best,
Jerry M.




On Wed, Mar 21, 2012 at 5:51 AM, Vadim Kisselmann
v.kisselm...@googlemail.com wrote:

 Hello folks,

 I read the SolrCloud Wiki and Bruno Dumon's blog entry with his First
 Exploration of SolrCloud.
 The examples and a first setup with embedded Jetty and ZK WORK without problems.

 I tried to set up my own configuration with Tomcat and an external
 Zookeeper (my Master-ZK), but it doesn't really work.

 My setup:
 - latest Solr version from trunk
 - Tomcat 6
 - external ZK
 - Target: 1 Server, 1 Tomcat, 1 Solr instance, 2 collections with
 different config/schema

 What i tried:
 --
 1. After checkout I built Solr (ant run-example); it works.
 ---
 2. I sent my config/schema files to the external ZK with Jetty:
 java -Djetty.port=8080 -Dbootstrap_confdir=/root/solrCloud/conf/
 -Dcollection.configName=conf1 -DzkHost=master-zk:2181 -jar start.jar
 It works, too.
 ---
 3. I created my (empty, without cores) solr.xml, like Bruno:
 http://www.ngdata.com/site/blog/57-ng.html#disqus_thread
 ---
 4. I started my Tomcat and got the first error
 in the UI: This interface requires that you activate the admin request
 handlers; add the following configuration to your solrconfig.xml:
 <!-- Admin Handlers - This will register all the standard admin
 RequestHandlers. -->
 <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
 Admin request handlers are definitely activated in my solrconfig.

 I get this error only with the latest trunk versions, not with r1292064
 from February. Sometimes it works with the new version, sometimes
 not, and I get this error.

 --
 5. OK, once it worked after a few restarts, I changed my JAVA_OPTS for
 Tomcat and added this: -DzkHost=master-zk:2181
 Next Error:
 The web application [/solr2] appears to have started a thread
 named [main-SendThread(master-zk:2181)] but has failed to stop it.
 This is very likely to create a memory leak.
 Exception in thread Thread-2 java.lang.NullPointerException
 at 
 org.apache.solr.cloud.Overseer$CloudStateUpdater.amILeader(Overseer.java:179)
 at org.apache.solr.cloud.Overseer$CloudStateUpdater.run(Overseer.java:104)
 at java.lang.Thread.run(Thread.java:662)
 15.03.2012 13:25:17 org.apache.catalina.loader.WebappClassLoader loadClass
 INFO: Illegal access: this web application instance has been stopped
 already. Could not load org.apache.zookeeper.server.ZooTrace. The
 eventual following stack trace is caused by an error thrown for
 debugging purposes as well as to attempt to terminate the thread which
 caused the illegal access, and has no functional impact.
 java.lang.IllegalStateException
 at 
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1531)
 at 
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1491)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1196)
 15.03.2012 13:25:17 org.apache.coyote.http11.Http11Protocol destroy

 -
 6. OK, assuming the first steps work, I would then create new cores
 and my 2 collections. My requests with the CoreAdminHandler are OK;
 my solr.xml looks like this:
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="true">
   <cores adminPath="/admin/cores" zkClientTimeout="1" hostPort="8080"
          hostContext="solr">
     <core
        name="shard1_data"
        collection="col1"
        shard="shard1"
        instanceDir="xxx/" />
     <core
        name="shard2_data"
        collection="col2"
        shard="shard2"
        instanceDir="xx2/" />
   </cores>
 </solr>

 Now I get the following exception: "...couldn't find conf name for
 collection1..."
 I don't have a collection1. Why this exception?

 ---
 As you can see, there are many exceptions and possibly
 configuration problems with Tomcat and an external ZK.
 Has anyone set up an identical configuration, and does it work?
 Does anyone spot mistakes in my configuration steps?

 Best regards
 Vadim


Re: Same id on two shards

2012-02-22 Thread jerry.min...@gmail.com
Hi,

I stumbled across this thread after running into the same question. The
answers presented here seem a little vague and I was hoping to renew the
discussion.

I am using a branch of Solr 4, distributed searching over 12 shards.
I want the documents in the first shard to always be selected over
documents that appear in the other 11 shards.

The queries to these shards look something like this:
http://solrserver/shard_1_app/select?shards=solr_server:/shard_1_app/,solr_server:/shard_2_app,
... ,solr_server:/shard_12_app&q=id:

When I execute a query for an ID that I know exists in shard_1 and another
shard, I do always get the result from shard 1.

Here's some questions that I have:
1. Has anyone rigorously tested the comment in the wiki, "If docs with
duplicate unique keys are encountered, Solr will make an attempt to return
valid results, but the behavior may be non-deterministic"?

2. Who is relying on this behavior (the document of the first shard is
returned) today? When do you notice the wrong document is selected? Do you
have a feeling for how frequently your distributed search returns the
document from a shard other than the first?

3. Is there a good web source other than the Solr wiki for information
about Solr distributed queries?


Thanks,
Jerry M.


On Mon, Aug 8, 2011 at 7:41 PM, simon mtnes...@gmail.com wrote:

 I think the first one to respond is indeed the way it works, but
 that's only deterministic up to a point (if your small index is in the
 throes of a commit and everything required for a response happens to
 be  cached on the larger shard ... who knows ?)

 On Mon, Aug 8, 2011 at 7:10 PM, Shawn Heisey s...@elyograg.org wrote:
  On 8/8/2011 4:07 PM, simon wrote:
 
  Only one should be returned, but it's non-deterministic. See
 
 
 http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations
 
  I had heard it was based on which one responded first. This is part of
  why we have a small index that contains the newest content and only
  distribute content to the other shards once a day. The hope is that
  the small index (less than 1GB, fits into RAM on that virtual machine)
  will always respond faster than the other larger shards (over 18GB
  each). Is this an incorrect assumption on our part?
 
  The build system does do everything it can to ensure that periods of
  overlap are limited to the time it takes to commit a change across all
  of the shards, which should amount to just a few seconds once a day.
  There might be situations when the index gets out of whack and we have
  duplicate id values for a longer time period, but in practice it
  hasn't happened yet.
 
  Thanks,
  Shawn