Re: Solr on S3FileSystem, Kosmos, GlusterFS, etc….

2014-06-25 Thread jay vyas
Hi Paul,

I'm not using it on S3 -- but yes, I don't think S3 would be ideal for Solr
at all. There are several other Hadoop Compatible File Systems, however,
some of which might be a good fit for certain types of SolrCloud workloads.

Anyway, I would love to see a Solr wiki page on FileSystem compatibility,
possibly with an entry linking here: https://wiki.apache.org/hadoop/HCFS.

In the meantime, I will update this thread if I find anything interesting
as we increase the load.




-- 
jay vyas


Re: Solr on S3FileSystem, Kosmos, GlusterFS, etc….

2014-06-24 Thread Jay Vyas
Hi Solr!

I got this working. Here's how:

With the example Jetty runner: extract the tarball and go to the example/
directory, where you can launch an embedded core. Then find the
solrconfig.xml file and edit it to contain the following XML:
 
<directoryFactory name="DirectoryFactory"
    class="org.apache.solr.core.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">myhcfs:///solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
</directoryFactory>

The confdir is important: that is where you will have something like a
core-site.xml that defines all the parameters for your filesystem
(fs.defaultFS, fs.myhcfs.impl, and so on).
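For illustration, a minimal core-site.xml along those lines might look like
the sketch below; the myhcfs scheme and the implementation class name are
placeholders for whatever filesystem you are actually plugging in:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>myhcfs:///</value>
  </property>
  <!-- maps the myhcfs:// URI scheme to its FileSystem class;
       this class name is a placeholder -->
  <property>
    <name>fs.myhcfs.impl</name>
    <value>org.example.fs.MyHcfsFileSystem</value>
  </property>
</configuration>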


This tells Solr, when launched, to use myhcfs as the underlying file store.

You should also make sure that the jar for your filesystem plugin (in our case
glusterfs) is on the classpath -- Hadoop will locate the implementation class
by looking up the fs.<scheme>.impl parameter derived from the base URI
myhcfs -- and that the hadoop-common jar is there as well (some HCFS shims need
FilterFileSystem to run correctly, which lives only in hadoop-common.jar).

So -- how do you modify the running Solr core's classpath?

To do so, you can update the <lib> directives in solrconfig.xml. There are a
bunch of regular-expression templates you can modify in the
example/.../solrconfig.xml file. To be really safe, you can also copy the jars
in at runtime.
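As a rough sketch, such <lib> directives might look like the following; the
directories are assumptions, so point them at wherever your plugin jar and
hadoop-common actually live:

<!-- load the HCFS plugin jar and hadoop-common at core startup; paths are illustrative -->
<lib dir="/opt/glusterfs-hadoop/lib" regex="glusterfs-hadoop-.*\.jar" />
<lib dir="/opt/hadoop/lib" regex="hadoop-common-.*\.jar" />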
 
Once your example core with the gluster configuration is set up, launch it with
the following properties:
 
java -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs \
  -Dsolr.data.dir=glusterfs:///solr -Dsolr.updatelog=glusterfs:///solr \
  -Dlog4j.configuration=file:/opt/solr-4.4.0-cdh5.0.2/example/etc/logging.properties \
  -jar start.jar

This starts a basic Solr server on port 8983.
 
If you are running from the simple Jetty-based example I've used to describe
this above, you should see the collection1 core up and running, and its index
sitting inside the /solr directory of your file system.
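As a quick sanity check -- assuming the stock example core, and a hadoop client
configured against the same core-site.xml -- something like the following
should return a ping response and list the index files:

curl "http://localhost:8983/solr/collection1/admin/ping"
hadoop fs -ls glusterfs:///solr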

Hope this helps those interested in expanding the use of SolrCloud outside of a 
single FS. 





Re: Solr on S3FileSystem, Kosmos, GlusterFS, etc….

2014-06-24 Thread Paul Libbrecht
I've always been under the impression that file-system access speed is crucial
for Lucene-based storage, and I have always advocated against using NFS for
that (with which we saw a slowdown of roughly a factor of 5). Has any
performance measurement been made for such a setting? Has FS caching suddenly
gotten so much better that it is not a problem?

Also, as far as I know, S3 bills by the number of (giga-)bytes exchanged. That
gives plenty of room, but if each start needs to pull a big part of the index
from the storage to the Solr server to fill the caches, it looks like it won't
be that cheap.
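(A purely illustrative back-of-the-envelope with made-up numbers: a 50 GB index
at roughly $0.10 per transferred GB is already 50 x 0.10 = $5 for every start
that re-reads the whole index, before counting any query-time cache misses.)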

Thanks for any experience reports.

paul





Solr on S3FileSystem, Kosmos, GlusterFS, etc….

2014-06-23 Thread Jay Vyas
Hi folks. Does anyone regularly deploy Solr indices on other HCFS
implementations (S3FileSystem, for example)? If so, I'm wondering:

1) Where are the docs for doing this, or examples? It seems like everything,
including parameter names for the dfs setup, is based around HDFS. Maybe I
should file a JIRA similar to https://issues.apache.org/jira/browse/FLUME-2410
(to make the generic deployment of Solr on any file system explicit and obvious).

2) Whether there are any interesting filesystem requirements (e.g.
createNonRecursive, atomic mkdirs, sharing and blocking expectations, etc.)
which need to be implemented.