Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-07 Thread Ali Nazemian
Thank you very much. But why should we go for Solr distributed with
Hadoop? There is already SolrCloud, which works quite well in the case of
a big index. Is there any advantage to building indexes via MapReduce that
SolrCloud cannot provide?
Regards.


Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-07 Thread Erick Erickson
If SolrCloud meets your needs without Hadoop, then there's no real
reason to introduce the added complexity.

There are a bunch of problems that do _not_ work well with SolrCloud
over non-Hadoop file systems. For those problems, the combination of
SolrCloud and Hadoop makes tackling them possible.

Best,
Erick


Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-07 Thread Ali Nazemian
Dear Erick,
Could you please name those problems that SolrCloud cannot tackle alone?
Maybe I need SolrCloud + Hadoop and am just not aware of it yet.
Regards.


Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-06 Thread Erick Erickson
bq: Are you aware of Cloudera search? I know they provide an integrated
Hadoop ecosystem.

What Cloudera Search does via the MapReduceIndexerTool (MRIT) is create N
sub-indexes for each shard in the M/R paradigm via EmbeddedSolrServer.
Eventually, these sub-indexes for each shard are merged (perhaps through
some number of levels) in the reduce phase and maybe merged into a live
Solr instance (--go-live). You'll note that this tool requires the address
of the ZK ensemble, from which it can get the network topology, the
configuration files, all that rot. If you don't use the --go-live option,
the output is still a Solr index; it's just that the index for each shard
is left in a specific directory on HDFS. Being on HDFS allows this kind of
M/R paradigm for massively parallel indexing operations, and perhaps
massively complex analysis.

Nowhere is there any low-level non-Solr manipulation of the indexes.

The Flume fork just writes directly to the Solr nodes. It knows about the
ZooKeeper ensemble and the collection too, and communicates via SolrJ,
I'm pretty sure.

As far as integrating with HDFS, you're right, HA is part of the package.
As far as using the Solr indexes for analysis, well, you can write anything
you want to use the Solr indexes from anywhere in the M/R world and have
them available from anywhere in the cluster. There's no real need to even
have Solr running; you could use the output from MRIT and access the
sub-shards with the EmbeddedSolrServer if you wanted, leaving out all the
pesky servlet container stuff.
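
A minimal sketch of that EmbeddedSolrServer route, assuming an MRIT shard
index has been placed in a local Solr home at /tmp/solr-home under a core
named shard1 (the paths, core name, and field here are illustrative, not
from the thread):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.core.CoreContainer;

    public class ReadSubShard {
        public static void main(String[] args) throws Exception {
            // Load a core whose data directory holds the MRIT output.
            CoreContainer container = new CoreContainer("/tmp/solr-home");
            container.load();
            EmbeddedSolrServer server = new EmbeddedSolrServer(container, "shard1");
            try {
                // Query the sub-shard directly; no servlet container involved.
                QueryResponse rsp = server.query(new SolrQuery("*:*"));
                for (SolrDocument doc : rsp.getResults()) {
                    System.out.println(doc.getFieldValue("id"));
                }
            } finally {
                server.shutdown();
            }
        }
    }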

bq: So why we go for HDFS in the case of analysis if we want to use SolrJ
for this purpose? What is the point?

Scale and data access, in a nutshell. In the HDFS world, you can scale
pretty linearly with the number of nodes you can rack together.

Frankly, though, if your data set is small enough to fit on a single
machine _and_ you can get through your analysis in a reasonable time
(reasonable here is up to you), then HDFS is probably not worth the
hassle. But in the big-data world where we're talking petabyte scale,
having HDFS as the underpinning opens up possibilities for working on
data that were difficult or impossible with Solr previously.

Best,
Erick



solr over hdfs for accessing/ changing indexes outside solr

2014-08-05 Thread Ali Nazemian
Dear all,
Hi,
I changed Solr 4.9 to write its index and data on HDFS. Now I am going to
connect to those data from outside Solr to change some of the values.
Could somebody please tell me how that is possible? Suppose I am using
HBase over HDFS to make these changes.
Best regards.

-- 
A.Nazemian
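
For reference, writing the index to HDFS in Solr 4.x is configured through
the HdfsDirectoryFactory in solrconfig.xml. A minimal sketch, with the
namenode address and paths below as placeholders:

    <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
      <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
      <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
    </directoryFactory>

    <!-- HDFS has no native file locking, so the hdfs lock type is set
         inside <indexConfig> -->
    <lockType>hdfs</lockType>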


Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-05 Thread Shawn Heisey

I don't know how you could safely modify the index without a Lucene
application or another instance of Solr, but if you do manage to modify
the index, simply reloading the core or restarting Solr should cause it
to pick up the changes. Either you would need to make sure that Solr
never modifies the index, or you would need some way of coordinating
updates so that Solr and the other application would never try to modify
the index at the same time.

Thanks,
Shawn
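
A sketch of triggering that reload remotely with SolrJ 4.x, assuming a
core named collection1 on localhost (both are placeholders):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;

    public class ReloadCore {
        public static void main(String[] args) throws Exception {
            // Core admin requests go to the Solr root URL, not a core URL.
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            // Ask Solr to reopen the core so it picks up an index changed on disk.
            CoreAdminRequest.reloadCore("collection1", server);
            server.shutdown();
        }
    }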



Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-05 Thread Michael Della Bitta
Probably the most correct way to modify the index would be to use the
Solr REST API to push your changes out.
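
For example, a single field value can be changed without reindexing the
whole document by sending an atomic update through that API. A minimal
SolrJ sketch (the URL, id, and field name are placeholders; atomic updates
require the updateLog to be enabled and the other fields to be stored):

    import java.util.Collections;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class AtomicUpdate {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            // "set" replaces the field's current value; "add" and "inc" also exist.
            doc.addField("category", Collections.singletonMap("set", "analyzed"));
            server.add(doc);
            server.commit();
            server.shutdown();
        }
    }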

Another thing you might want to look at is Lily. Basically it's a way to
set up a Solr collection as an HBase replication target, so changes to your
HBase table would automatically propagate over to Solr.

http://www.ngdata.com/on-lily-hbase-hadoop-and-solr/

Michael Della Bitta
Applications Developer, appinions inc.



Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-05 Thread Ali Nazemian
Actually, I am going to do some analysis on the Solr data using MapReduce.
For this purpose it might be necessary to change some parts of the data,
or add new fields, from outside Solr.


-- 
A.Nazemian


Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-05 Thread Erick Erickson
What you haven't told us is what you mean by "modify the
index outside Solr". SolrJ? Using raw Lucene? Trying to modify
things by writing your own codec? Standard Java I/O operations?
Other?

You could use SolrJ to connect to an existing Solr server and
both read and modify at will from your M/R jobs. But if you're
thinking of trying to write/modify the segment files by raw I/O
operations, good luck! I'm 99.99% certain that's going to cause
you endless grief.

Best,
Erick
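
A minimal sketch of that SolrJ route as it might be called from an M/R
task, using CloudSolrServer against a SolrCloud collection (the ZooKeeper
address, collection name, and query are placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class MapReduceSideClient {
        public static void main(String[] args) throws Exception {
            // Route through ZooKeeper so requests reach the right shards.
            CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181/solr");
            server.setDefaultCollection("collection1");
            SolrQuery q = new SolrQuery("text:hadoop");
            q.setRows(10);
            QueryResponse rsp = server.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                // Read whatever fields the analysis needs.
                System.out.println(doc.getFieldValue("id"));
            }
            server.shutdown();
        }
    }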




Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-05 Thread Ali Nazemian
Dear Erick,
Hi,
Thank you for your reply. Yes, I am aware that SolrJ is my last option. I
was thinking about raw I/O operations, so according to your reply that is
probably not feasible. What about the Lily project that Michael mentioned?
Does that use SolrJ too? Are you aware of Cloudera Search? I know they
provide an integrated Hadoop ecosystem. Do you know what their suggested
approach is?
Best regards.



-- 
A.Nazemian


Re: solr over hdfs for accessing/ changing indexes outside solr

2014-08-05 Thread Ali Nazemian
Dear Erick,
I remember that some time ago somebody asked what the point is of
modifying Solr to use HDFS for storing indexes. As far as I remember,
somebody told him that integrating Solr with HDFS has two advantages:
1) getting Hadoop replication and HA; 2) using the indexes and Solr
documents for other purposes such as analysis. So why would we go for
HDFS in the case of analysis if we want to use SolrJ for that purpose?
What is the point?
Regards.



-- 
A.Nazemian