I just created the issue NUTCH-923. I'll try to provide a patch in the next few days.
Matthias

On Wed, Oct 20, 2010 at 4:18 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:

> Hi,
>
> The LanguageIndexingFilter adds a lang field to the NutchDocument object,
> which in turn can be read in the Solr indexer.
>
> Your other suggestion seems separate from this issue, but you could open a
> ticket for that one, or you can use the subcollection plugin to set a
> value, although this seems a bit overkill ;)
>
> Cheers,
>
> On Wednesday 20 October 2010 16:04:25 Robert Douglass wrote:
> > This highlights a general need in this mapping scheme, and yes, the
> > ${lang} approach is desirable.
> >
> > The general need is to be able to pass more information into Solr than
> > just a simple 1:1 mapping of existing values to Solr fields. Another
> > area where I've run into this need is the case of passing a constant
> > field into Solr that identifies this document as coming from Nutch.
> >
> > In my Solr schema, there is an "entity" field which tracks the kind of
> > document we're dealing with. For those documents coming from Nutch, I'd
> > like to be able to do something like:
> >
> > <field dest="entity" value="nutch"/>
> > or
> > <field dest="entity" source="some nutch field" default="nutch"/>
> > (the second example would only use the default if the source were null)
> >
> > Note that in the case of ${lang}, this doesn't seem to be one of the
> > available NutchFields (see write() in indexer.solr.SolrWriter.java); is
> > there a configuration of Nutch that makes language available at that
> > point in execution?
> >
> > On 10/20/2010 03:50 PM, Markus Jelsma wrote:
> > > Hi,
> > >
> > > I believe this is very useful indeed. I'd go for the ${lang} method
> > > because it allows you to keep your own preferred Solr schema namespacing
> > > for languages. The first method isn't clear on how the fields are named
> > > in the Solr schema.
> > >
> > > Other thoughts on this one?
> > >
> > > Could you open an issue in Jira?
> > >
> > > Cheers,
> > >
> > > On Wednesday, October 20, 2010 09:27:42 am Matthias Paul wrote:
> > >> Hi,
> > >>
> > >> I'm using Nutch with the language-identifier plugin enabled to detect
> > >> the language of the HTML pages. For indexing I use a Solr server.
> > >> So far everything works, but there's one problem: I don't know how to
> > >> map multilingual fields to their corresponding Solr field.
> > >> The mapping file solrindex-mapping.xml contains the following:
> > >> <field dest="lang" source="lang"/>
> > >> <field dest="title" source="title"/>
> > >>
> > >> But what I would like to have is the following:
> > >> <field dest="lang" source="lang"/>
> > >> <field dest="title" source="title" multilingual="true" language="lang"/>
> > >> or maybe
> > >> <field dest="lang" source="lang"/>
> > >> <field dest="title_${lang}" source="title" />
> > >> so that the title field gets mapped to title_en for English pages and
> > >> title_fr for French pages.
> > >>
> > >> I found the SolrWriter and SolrMappingReader classes in the
> > >> source code, and it should be easy to integrate it there.
> > >> What do you think? Could this be useful to others as well?
> > >> Or are there any other solutions out there?
> > >>
> > >> Thanks
> > >> Matthias
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536600 / 06-50258350
>
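
For illustration, the ${lang} idea from the thread could be sketched roughly as below. This is only a minimal sketch, not the NUTCH-923 patch and not existing Nutch code: the class name DestFieldResolver, the resolve() method, and the use of a plain Map in place of NutchDocument are all assumptions made for the example. It expands each ${name} placeholder in a dest attribute with the document's value for that field, and returns null when the field is missing so the caller can pick a fallback.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: resolves placeholders in a
// mapping's dest attribute, e.g. "title_${lang}" -> "title_fr".
public class DestFieldResolver {

    // Matches placeholders of the form ${fieldName}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /**
     * Expands every ${name} in destTemplate with the value stored under
     * that name in docFields (a stand-in for the indexed document).
     * Returns null if a referenced field has no value.
     */
    public static String resolve(String destTemplate, Map<String, String> docFields) {
        Matcher m = PLACEHOLDER.matcher(destTemplate);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = docFields.get(m.group(1));
            if (value == null) {
                return null; // e.g. no language was detected for this page
            }
            // quoteReplacement keeps '$' or '\' in the value from being
            // interpreted as replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("lang", "fr");
        doc.put("title", "Bonjour");

        System.out.println(resolve("title_${lang}", doc)); // title_fr
        System.out.println(resolve("lang", doc));          // lang (no placeholder)
    }
}
```

Returning null for a missing field leaves the fallback decision (skip the field, or map to an unsuffixed title) to whatever code applies the mapping.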