[jira] Commented: (SOLR-1045) Build Solr index using Hadoop MapReduce

2010-02-02 Thread Kevin Peterson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828962#action_12828962
 ] 

Kevin Peterson commented on SOLR-1045:
--

Can anyone using this code comment on how this relates to SOLR-1301?

https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828915#action_12828915

These seem to have identical goals but very different approaches.

> Build Solr index using Hadoop MapReduce
> ---
>
> Key: SOLR-1045
> URL: https://issues.apache.org/jira/browse/SOLR-1045
> Project: Solr
> Issue Type: New Feature
> Reporter: Ning Li
> Fix For: 1.5
>
> Attachments: SOLR-1045.0.patch
>
>
> The goal is a contrib module that builds Solr index using Hadoop MapReduce.
> It is different from the Solr support in Nutch. The Solr support in Nutch 
> sends a document to a Solr server in a reduce task. Here, the goal is to 
> build/update Solr index within map/reduce tasks. Also, it achieves better 
> parallelism when the number of map tasks is greater than the number of reduce 
> tasks, which is usually the case.
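
The parallelism argument in the description can be illustrated with a toy model. This is only a sketch under stated assumptions: `build_shard` and `map_side_build` are hypothetical names, the "index" is a plain in-memory dictionary rather than a Lucene or Solr index, and no Hadoop API is used. It shows one index shard being built per input split, so the number of index builders equals the number of map tasks.

```python
from collections import defaultdict


def build_shard(docs):
    """Build a toy inverted index (term -> sorted doc ids) for one input split."""
    index = defaultdict(set)
    for doc_id, text in docs:
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def map_side_build(splits):
    """Simulate map-side indexing: every split (map task) builds its own shard."""
    return [build_shard(split) for split in splits]


# Three input splits stand in for three map tasks.
splits = [
    [(1, "solr on hadoop"), (2, "lucene index")],
    [(3, "hadoop mapreduce")],
    [(4, "solr index")],
]
shards = map_side_build(splits)
print(len(shards), "shards built, one per map task")
```

With more map tasks than reduce tasks, this arrangement spreads the indexing work over more workers than an approach that indexes only in the reduce phase.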

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Build Solr index using Hadoop MapReduce

2009-12-07 Thread JerylCook

 Build Solr index using Hadoop MapReduce
http://issues.apache.org/jira/browse/SOLR-1045


Ning Li-3 wrote:
> 
> SOLR-1045 it is. More details will be available in that issue.
> 
> Marc, you can check out Hadoop contrib/index which builds a Lucene
> index using Hadoop MapReduce. However, it does not handle duplicate
> detection.
> 
> Cheers,
> Ning
> 
> 
> On Mon, Mar 2, 2009 at 4:25 PM, Marc Sturlese wrote:
>>
>> I am doing some research on creating a Lucene/Solr index using Hadoop,
>> but there's not much info around, so it would be great to see some code!
>> (I am having problems especially with duplicate detection.)
>> Thanks
>>
>> Shalin Shekhar Mangar wrote:
>>>
>>> On Mon, Mar 2, 2009 at 11:24 PM, Ning Li  wrote:
>>>
>>>> Hi,
>>>>
>>>> I wonder if there is interest in a contrib module that builds Solr
>>>> index using Hadoop MapReduce?
>>>>
>>>
>>> Absolutely!
>>>
>>>
>>>> It is different from the Solr support in Nutch. The Solr support in
>>>> Nutch sends a document to a Solr server in a reduce task. Here, I aim
>>>> at building/updating Solr index within map/reduce tasks. Also, it
>>>> achieves better parallelism when the number of map tasks is greater
>>>> than the number of reduce tasks, which is usually the case.
>>>>
>>>> I worked out a very simple initial version. But I want to check if
>>>> there is any interest before proceeding. If so, I'll open a Jira
>>>> issue.
>>>>
>>>
>>> +1
>>>
>>> Please do. It'd be great to see this in Solr.
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Build-Solr-index-using-Hadoop-MapReduce-tp22293172p22296832.html
>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Build-Solr-index-using-Hadoop-MapReduce-tp22293172p26684154.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



[jira] Updated: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-10-12 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-1045:


Fix Version/s: 1.5




[jira] Commented: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-08-24 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747070#action_12747070
 ] 

Lance Norskog commented on SOLR-1045:
-

Map/Reduce would also be useful in the DataImportHandler. We're talking about 
parallelizing analysis stacks that require a lot of CPU. I would rather push 
this sort of thing out into the DIH - Solr Cell, for example. The DIH 
declaration language could have something like the ANT parallelization 
directives.

At this level of multi-threaded sophistication, Solr really wants to be an OSGi 
application instead of a custom-built mini application server.




[jira] Commented: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-08-22 Thread Alex Baranov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746545#action_12746545
 ] 

Alex Baranov commented on SOLR-1045:


{quote}write a Solr index in a ram directory{quote}

Please, take a look at [https://issues.apache.org/jira/browse/SOLR-1379] - 
RAMDirectoryFactory




[jira] Commented: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-03-04 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678770#action_12678770
 ] 

Yonik Seeley commented on SOLR-1045:


The "solr index" is normally just a Lucene index which has been indexed 
according to the particular schema.
There are exceptions, as you have noted:
 - the spell check index
 - ExternalFileField

It's worth keeping these in mind, and it could perhaps be useful to handle 
them at some point, but it certainly doesn't seem critical.




[jira] Commented: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-03-04 Thread Ning Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678763#action_12678763
 ] 

Ning Li commented on SOLR-1045:
---

Shalin and Yonik, thanks for the comments on the two features. But what is a 
Solr index? I thought it is everything in the data directory, not just the 
Lucene index in the data/index directory, no? If that's the case:
  - On writing a Solr index in a ram directory, I'm aware of the directory 
factory, but it's only for the directory of Lucene index.
  - On merging multiple Solr indexes, besides merging the Lucene indexes, it 
also means somehow "merging" other data in the data directory (e.g. "merging" 
by rebuilding the spell check index).
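
The distinction between merging the main index and "merging" auxiliary data by rebuilding it can be sketched with a toy model. Everything here is an assumption for illustration: `merge_indexes` and `rebuild_spell_dictionary` are hypothetical helpers, the indexes are plain dictionaries rather than Lucene indexes, and document ids are assumed to be unique across shards.

```python
def merge_indexes(shards):
    """Merge toy inverted indexes (term -> list of doc ids) into one index."""
    merged = {}
    for shard in shards:
        for term, doc_ids in shard.items():
            merged.setdefault(term, set()).update(doc_ids)
    return {term: sorted(ids) for term, ids in merged.items()}


def rebuild_spell_dictionary(index):
    """Derive the auxiliary term list from the merged index instead of merging it."""
    return sorted(index)


shard_a = {"solr": [1], "hadoop": [1]}
shard_b = {"hadoop": [3], "mapreduce": [3]}

merged = merge_indexes([shard_a, shard_b])
spell_terms = rebuild_spell_dictionary(merged)
print(merged["hadoop"], spell_terms)
```

The posting lists are combined directly, while the spell-check data is regenerated from the merged result, which mirrors the "merging by rebuilding" idea above.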

Am I correct?




[jira] Commented: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-03-03 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678481#action_12678481
 ] 

Yonik Seeley commented on SOLR-1045:


bq. merge multiple Solr indexes into one Solr index

+1
I think Solr should support multiple local indexes (call them fragments?) per 
"index" and be able to perform operations such as merging.
I mentioned this here a while ago too:
http://www.lucidimagination.com/search/document/de518893396af002/solr2_onward_and_upward




[jira] Commented: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-03-03 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678436#action_12678436
 ] 

Shalin Shekhar Mangar commented on SOLR-1045:
-

bq. write a Solr index in a ram directory

It is possible to use a RAMDirectory, but I haven't tried it. See SOLR-465 for 
details.

bq. merge multiple Solr indexes into one Solr index

Please go ahead. Do you mean merging indexes of two solr cores? I have thought 
of exposing that as a CoreAdmin command.




[jira] Commented: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-03-03 Thread Ning Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678431#action_12678431
 ] 

Ning Li commented on SOLR-1045:
---

Building Solr index (the data directory) in a mapreduce job also means we 
should be able to:
  - write a Solr index in a ram directory
  - merge multiple Solr indexes into one Solr index

Any objections if I open Jira issues on supporting these two features?




[jira] Commented: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-03-03 Thread Ning Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678373#action_12678373
 ] 

Ning Li commented on SOLR-1045:
---

If SolrCore supports an indexing-only mode, no resource will be spent on 
search, which is not used by the mapreduce job. If you feel this is 
"good-to-have" instead of "must-have", then I think this is an important 
"good-to-have".
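
As a rough illustration of what an indexing-only mode could save, here is a toy sketch. `ToyCore` is a hypothetical stand-in and not Solr's SolrCore API; the only search-side resource modeled is a query cache, which is an assumption made for the example.

```python
class ToyCore:
    """Hypothetical stand-in for a core; not Solr's SolrCore API."""

    def __init__(self, indexing_only=False):
        self.indexing_only = indexing_only
        self.documents = []
        # Search-side state (here just a query cache) is only allocated
        # when searching is enabled.
        self.query_cache = None if indexing_only else {}

    def add(self, doc):
        self.documents.append(doc)
        if self.query_cache is not None:
            self.query_cache.clear()  # cached results are now stale

    def search(self, term):
        if self.indexing_only:
            raise RuntimeError("search is disabled in indexing-only mode")
        if term not in self.query_cache:
            self.query_cache[term] = [
                d for d in self.documents if term in d["text"].split()
            ]
        return self.query_cache[term]


# A map task would only ever call add(), so no search state is needed.
core = ToyCore(indexing_only=True)
core.add({"id": "1", "text": "solr hadoop"})
```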




[jira] Commented: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-03-03 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678284#action_12678284
 ] 

Noble Paul commented on SOLR-1045:
--

bq. First is to make SolrCore support an indexing-only mode (i.e. no search)

Why is this a prerequisite?




[jira] Commented: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-03-02 Thread Ning Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678200#action_12678200
 ] 

Ning Li commented on SOLR-1045:
---

The purpose of this simple initial version is to give people an idea of the 
functionality. It uses Hadoop contrib/index, which uses the Hadoop mapred package. 
Future versions will be very different from this version. The main difference 
is that in this version, after a Solr input document is converted to a Lucene 
document, a Lucene index writer is used to build the index. In future versions, 
a Solr writer/core will be used.
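
The flow described above (convert an input document according to a schema, then hand it to a low-level index writer) can be sketched as a toy model. The names `to_index_document` and `ToyIndexWriter`, and the schema layout, are assumptions for illustration only; they are not the patch's code and not the Solr or Lucene API.

```python
def to_index_document(input_doc, schema):
    """Split an input document into indexed and stored fields per a toy schema."""
    indexed, stored = {}, {}
    for name, value in input_doc.items():
        if name not in schema:
            raise KeyError(f"unknown field: {name}")
        if schema[name]["indexed"]:
            indexed[name] = str(value).lower().split()
        if schema[name]["stored"]:
            stored[name] = value
    return {"indexed": indexed, "stored": stored}


class ToyIndexWriter:
    """Hypothetical low-level writer that just collects converted documents."""

    def __init__(self):
        self.documents = []

    def add_document(self, document):
        self.documents.append(document)


schema = {
    "id": {"indexed": True, "stored": True},
    "body": {"indexed": True, "stored": False},
}
writer = ToyIndexWriter()
writer.add_document(
    to_index_document({"id": "doc1", "body": "Hadoop MapReduce"}, schema)
)
```

In this arrangement the caller is responsible for applying the schema before writing, which is the step a Solr writer/core would take over in the future versions mentioned above.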

Here are some prerequisites for this issue:
  - Hadoop 0.20, which is yet to be released. Two features in 0.20 are 
    important for this issue.
    First is the new mapreduce package. The flexibility of the new mapreduce 
    API makes it possible to use a Solr writer/core in mapper tasks.
    Second is the upgrade to Jetty 6 (6.1.14). The current release, 0.19, uses 
    Jetty 5.

  - A couple of changes are required in Solr.
    First is to make SolrCore support an indexing-only mode (i.e. no search). 
    Only then is it feasible to use it for indexing in a map task.
    Second is to upgrade from Jetty 6.1.3 to Jetty 6.1.14. Hadoop 0.20 uses a 
    feature that is not available in 6.1.3.

What do you think about making "SolrCore support an indexing-only mode"?





[jira] Updated: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-03-02 Thread Ning Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Li updated SOLR-1045:
--

Attachment: SOLR-1045.0.patch




Re: Build Solr index using Hadoop MapReduce

2009-03-02 Thread Ning Li
SOLR-1045 it is. More details will be available in that issue.

Marc, you can check out Hadoop contrib/index which builds a Lucene
index using Hadoop MapReduce. However, it does not handle duplicate
detection.
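
For readers wondering what duplicate detection could look like in a MapReduce setting, one common pattern (not something contrib/index or this thread provides) is to key every record by its unique id in the map phase and keep a single record per key in the reduce phase. The sketch below simulates that in plain Python; the `id` and `version` field names and the "highest version wins" rule are assumptions chosen for the example.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(records):
    """Emit (unique_key, (version, doc)) pairs; 'id' and 'version' are assumed fields."""
    for doc in records:
        yield doc["id"], (doc["version"], doc)


def reduce_phase(pairs):
    """Group pairs by key and keep only the highest-version document per key."""
    result = []
    ordered = sorted(pairs, key=itemgetter(0))  # stands in for the shuffle/sort step
    for _key, group in groupby(ordered, key=itemgetter(0)):
        _version, newest = max((value for _, value in group), key=itemgetter(0))
        result.append(newest)
    return result


records = [
    {"id": "a", "version": 1, "title": "old"},
    {"id": "b", "version": 1, "title": "only"},
    {"id": "a", "version": 2, "title": "new"},
]
deduped = reduce_phase(map_phase(records))
print([doc["title"] for doc in deduped])
```

Because all records with the same key reach the same reducer, the reducer can decide locally which copy survives before anything is indexed.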

Cheers,
Ning


On Mon, Mar 2, 2009 at 4:25 PM, Marc Sturlese wrote:
>
> I am doing some research on creating a Lucene/Solr index using Hadoop,
> but there's not much info around, so it would be great to see some code!
> (I am having problems especially with duplicate detection.)
> Thanks
>
> Shalin Shekhar Mangar wrote:
>>
>> On Mon, Mar 2, 2009 at 11:24 PM, Ning Li  wrote:
>>
>>> Hi,
>>>
>>> I wonder if there is interest in a contrib module that builds Solr
>>> index using Hadoop MapReduce?
>>>
>>
>> Absolutely!
>>
>>
>>> It is different from the Solr support in Nutch. The Solr support in
>>> Nutch sends a document to a Solr server in a reduce task. Here, I aim
>>> at building/updating Solr index within map/reduce tasks. Also, it
>>> achieves better parallelism when the number of map tasks is greater
>>> than the number of reduce tasks, which is usually the case.
>>>
>>> I worked out a very simple initial version. But I want to check if
>>> there is any interest before proceeding. If so, I'll open a Jira
>>> issue.
>>>
>>
>> +1
>>
>> Please do. It'd be great to see this in Solr.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Build-Solr-index-using-Hadoop-MapReduce-tp22293172p22296832.html
> Sent from the Solr - Dev mailing list archive at Nabble.com.
>
>


[jira] Created: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-03-02 Thread Ning Li (JIRA)
Build Solr index using Hadoop MapReduce
---

Key: SOLR-1045
URL: https://issues.apache.org/jira/browse/SOLR-1045
Project: Solr
Issue Type: New Feature
Reporter: Ning Li


The goal is a contrib module that builds Solr index using Hadoop MapReduce.

It is different from the Solr support in Nutch. The Solr support in Nutch sends 
a document to a Solr server in a reduce task. Here, the goal is to build/update 
Solr index within map/reduce tasks. Also, it achieves better parallelism when 
the number of map tasks is greater than the number of reduce tasks, which is 
usually the case.




Re: Build Solr index using Hadoop MapReduce

2009-03-02 Thread Marc Sturlese

I am doing some research on creating a Lucene/Solr index using Hadoop,
but there's not much info around, so it would be great to see some code!
(I am having problems especially with duplicate detection.)
Thanks

Shalin Shekhar Mangar wrote:
> 
> On Mon, Mar 2, 2009 at 11:24 PM, Ning Li  wrote:
> 
>> Hi,
>>
>> I wonder if there is interest in a contrib module that builds Solr
>> index using Hadoop MapReduce?
>>
> 
> Absolutely!
> 
> 
>> It is different from the Solr support in Nutch. The Solr support in
>> Nutch sends a document to a Solr server in a reduce task. Here, I aim
>> at building/updating Solr index within map/reduce tasks. Also, it
>> achieves better parallelism when the number of map tasks is greater
>> than the number of reduce tasks, which is usually the case.
>>
>> I worked out a very simple initial version. But I want to check if
>> there is any interest before proceeding. If so, I'll open a Jira
>> issue.
>>
> 
> +1
> 
> Please do. It'd be great to see this in Solr.
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Build-Solr-index-using-Hadoop-MapReduce-tp22293172p22296832.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



Re: Build Solr index using Hadoop MapReduce

2009-03-02 Thread Shalin Shekhar Mangar
On Mon, Mar 2, 2009 at 11:24 PM, Ning Li  wrote:

> Hi,
>
> I wonder if there is interest in a contrib module that builds Solr
> index using Hadoop MapReduce?
>

Absolutely!


> It is different from the Solr support in Nutch. The Solr support in
> Nutch sends a document to a Solr server in a reduce task. Here, I aim
> at building/updating Solr index within map/reduce tasks. Also, it
> achieves better parallelism when the number of map tasks is greater
> than the number of reduce tasks, which is usually the case.
>
> I worked out a very simple initial version. But I want to check if
> there is any interest before proceeding. If so, I'll open a Jira
> issue.
>

+1

Please do. It'd be great to see this in Solr.

-- 
Regards,
Shalin Shekhar Mangar.


Build Solr index using Hadoop MapReduce

2009-03-02 Thread Ning Li
Hi,

I wonder if there is interest in a contrib module that builds Solr
index using Hadoop MapReduce?

It is different from the Solr support in Nutch. The Solr support in
Nutch sends a document to a Solr server in a reduce task. Here, I aim
at building/updating Solr index within map/reduce tasks. Also, it
achieves better parallelism when the number of map tasks is greater
than the number of reduce tasks, which is usually the case.

I worked out a very simple initial version. But I want to check if
there is any interest before proceeding. If so, I'll open a Jira
issue.

Cheers,
Ning