I just created the issue NUTCH-923.
I'll try to provide a patch in the next few days.
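
For reference, here is a rough sketch of how the ${lang} substitution in a dest attribute could work. It is only an illustration under assumptions, not the actual patch: the class name DestFieldResolver and the method resolve() are hypothetical, and a plain Map stands in for the document's fields instead of the real Nutch classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: expands ${field} placeholders
// in a destination field name using the values of a document's fields.
final class DestFieldResolver {

    // Matches ${name} and captures "name".
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Returns the expanded destination name, or null if a referenced
    // field is missing, so the caller can decide on a fallback.
    static String resolve(String destTemplate, Map<String, String> docFields) {
        Matcher m = PLACEHOLDER.matcher(destTemplate);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = docFields.get(m.group(1));
            if (value == null) {
                return null;
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed document fields; "lang" would come from the language identifier.
        Map<String, String> doc = new HashMap<>();
        doc.put("lang", "en");
        doc.put("title", "Some page title");

        System.out.println(resolve("title_${lang}", doc)); // prints "title_en"
    }
}
```

Returning null for a missing field is just one possible choice; falling back to the plain field name (e.g. "title") might be friendlier for pages where no language was detected.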

Matthias

On Wed, Oct 20, 2010 at 4:18 PM, Markus Jelsma
<markus.jel...@openindex.io> wrote:

> Hi,
>
> The LanguageIndexingFilter adds a lang field to the NutchDocument object,
> which in turn can be read in the Solr indexer.
>
> Your other suggestion seems separate from this issue, but you could open a
> ticket for that one, or you can use the subcollection plugin to set a
> value, although this seems a bit of overkill ;)
>
> Cheers,
>
> On Wednesday 20 October 2010 16:04:25 Robert Douglass wrote:
> > This highlights a general need in this mapping scheme, and yes, the
> > ${lang} approach is desirable.
> >
> > The general need is to be able to pass more information into Solr than
> > just a simple 1:1 mapping of existing values to Solr fields. Another
> > area where I've run into this need is the case of passing a constant
> > field into Solr that identifies this document as coming from Nutch.
> >
> > In my Solr schema, there is an "entity" field which tracks the kind of
> > document we're dealing with. For those documents coming from Nutch, I'd
> > like to be able to do something like:
> >
> > <field dest="entity" value="nutch"/>
> > or
> > <field dest="entity" source="some nutch field" default="nutch"/>
> > (the second example would only use the default if the source were null)
> >
> > Note that in the case of ${lang}, this doesn't seem to be one of the
> > available NutchFields (see write() in indexer.solr.SolrWriter.java); is
> > there a configuration of Nutch that makes language available at that
> > point in execution?
> >
> > On 10/20/2010 03:50 PM, Markus Jelsma wrote:
> > > Hi,
> > >
> > > I believe this is very useful indeed. I'd go for the ${lang} method
> > > because it allows you to keep your own preferred Solr schema
> > > namespacing for languages. The first method isn't clear on how the
> > > fields are named in the Solr schema.
> > >
> > > Other thoughts on this one?
> > >
> > > Could you open an issue in Jira?
> > >
> > > Cheers,
> > >
> > > On Wednesday, October 20, 2010 09:27:42 am Matthias Paul wrote:
> > >> Hi,
> > >>
> > >> I'm using Nutch with the language-identifier plugin enabled to detect
> > >> the language of the html-pages. For indexing I use a Solr server.
> > >> So far everything works, but there's one problem: I don't know how to
> > >> map multilingual fields to their corresponding Solr fields.
> > >> The mapping file solrindex-mapping.xml contains the following:
> > >> <field dest="lang" source="lang"/>
> > >> <field dest="title" source="title"/>
> > >>
> > >> But what I would like to have is the following
> > >> <field dest="lang" source="lang"/>
> > >> <field dest="title" source="title" multilingual="true" language="lang"/>
> > >> or maybe
> > >> <field dest="lang" source="lang"/>
> > >> <field dest="title_${lang}" source="title" />
> > >> so that the title field gets mapped to title_en for English pages and
> > >> title_fr for French pages.
> > >>
> > >> I found the SolrWriter and SolrMappingReader classes in the
> > >> source code, and it should be easy to integrate it there.
> > >> What do you think? Could this be useful also to others?
> > >> Or are there any other solutions out there?
> > >>
> > >> Thanks
> > >> Matthias
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536600 / 06-50258350
>
