Re: Update schema to get solrdedup working again

2011-05-11 Julien Nioche
Resending to dev@nutch; I had sent this to Markus only.


>
>> We still need to do something about the MoreIndexingFilter.
>>
>> https://issues.apache.org/jira/browse/NUTCH-985
>>
>
> For now a quick fix for the MoreIndexingFilter would be OK, but we could
> create a new issue for 1.4 to rely on Date objects everywhere and then
> format them properly in the SolrWriter. We could of course do the latter now,
> but since I have no time to do it in the short term and don't want to twist
> your arm, I'll let you decide.
>
>
>
>>
>> On Thursday 05 May 2011 15:34:56 Julien Nioche wrote:
>> > Hi Markus,
>> >
>> > Sorry for the late reply. Definitely +1 to change to Date in the schema;
>> > it is the right thing to do and it's also the right time to do it.
>> >
>> > Thanks
>> >
>> > Julien
>> >
>> > On 28 April 2011 12:43, Markus Jelsma wrote:
>> > > Hi devs,
>> > >
>> > > The Solr schema must be updated as well to get dedup to work in 1.3. This
>> > > is because in December last year index-basic seems to have been updated
>> > > to write properly formatted dates to Solr, but the schema field was still
>> > > a long.
>> > >
>> > > Somehow Solr accepted the input (this is a bug) but cannot cope with the
>> > > output, nor could Nutch convert the date to the internally used long
>> > > (which it now can). The remaining issue is to update the field to use
>> > > date instead of long. But this will certainly break existing Solr setups
>> > > because of field incompatibility.
>> > >
>> > > I propose to update the field regardless of current Solr setups, because
>> > > 1) an index can always be recreated from segments and 2) the current
>> > > indexer relies on the Solr bug remaining in 3.1 and higher as well.
>> > >
>> > > I haven't tested it with 3.1, but the bug is in 1.4.1 for sure.
>> > >
>> > > Thoughts?
>> > >
>> > > Cheers,
>> > > --
>> > > Markus Jelsma - CTO - Openindex
>> > > http://www.linkedin.com/in/markus17
>> > > 050-8536620 / 06-50258350
>>
>> --
>> Markus Jelsma - CTO - Openindex
>> http://www.linkedin.com/in/markus17
>> 050-8536620 / 06-50258350
>>
>
>
>
> --
> *Open Source Solutions for Text Engineering*
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>



-- 
*Open Source Solutions for Text Engineering*

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
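Julien's suggestion of keeping Date objects throughout and formatting them only when writing to Solr can be sketched in Java as follows. This is a minimal illustration, not Nutch code: the class and method names are hypothetical, and it assumes Java 8+ (java.time) and a Solr date field that expects UTC ISO-8601 strings with a trailing "Z".

```java
import java.time.format.DateTimeFormatter;
import java.util.Date;

/**
 * Hypothetical helper (not part of Nutch): renders a java.util.Date in the
 * UTC ISO-8601 form accepted by Solr date fields, e.g. 2011-05-05T13:34:56Z.
 */
public class SolrDateFormatSketch {

    /** Formats the date in UTC with a trailing 'Z'; a fraction is printed only if non-zero. */
    public static String toSolrDate(Date date) {
        // ISO_INSTANT always prints in UTC and emits 0, 3, 6 or 9 fraction digits as needed.
        return DateTimeFormatter.ISO_INSTANT.format(date.toInstant());
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate(new Date(0L)));    // 1970-01-01T00:00:00Z
        System.out.println(toSolrDate(new Date(1500L))); // 1970-01-01T00:00:01.500Z
    }
}
```

Formatting in one place keeps the indexing filters free of string handling, which is the point of relying on Date objects everywhere.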


Re: Update schema to get solrdedup working again

2011-05-05 Markus Jelsma
Don't worry, the sun is shining! The change is committed. We still need to do
something about the MoreIndexingFilter.

https://issues.apache.org/jira/browse/NUTCH-985

On Thursday 05 May 2011 15:34:56 Julien Nioche wrote:
> Hi Markus,
> 
> Sorry for the late reply. Definitely +1 to change to Date in the schema;
> it is the right thing to do and it's also the right time to do it.
> 
> Thanks
> 
> Julien
> 
> On 28 April 2011 12:43, Markus Jelsma wrote:
> > Hi devs,
> > 
> > The Solr schema must be updated as well to get dedup to work in 1.3. This
> > is because in December last year index-basic seems to have been updated to
> > write properly formatted dates to Solr, but the schema field was still a
> > long.
> > 
> > Somehow Solr accepted the input (this is a bug) but cannot cope with the
> > output, nor could Nutch convert the date to the internally used long
> > (which it now can). The remaining issue is to update the field to use date
> > instead of long. But this will certainly break existing Solr setups
> > because of field incompatibility.
> > 
> > I propose to update the field regardless of current Solr setups, because
> > 1) an index can always be recreated from segments and 2) the current
> > indexer relies on the Solr bug remaining in 3.1 and higher as well.
> > 
> > I haven't tested it with 3.1, but the bug is in 1.4.1 for sure.
> > 
> > Thoughts?
> > 
> > Cheers,
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Update schema to get solrdedup working again

2011-05-05 Julien Nioche
Hi Markus,

Sorry for the late reply. Definitely +1 to change to Date in the schema;
it is the right thing to do and it's also the right time to do it.

Thanks

Julien


On 28 April 2011 12:43, Markus Jelsma wrote:

> Hi devs,
>
> The Solr schema must be updated as well to get dedup to work in 1.3. This
> is because in December last year index-basic seems to have been updated to
> write properly formatted dates to Solr, but the schema field was still a
> long.
>
> Somehow Solr accepted the input (this is a bug) but cannot cope with the
> output, nor could Nutch convert the date to the internally used long (which
> it now can). The remaining issue is to update the field to use date instead
> of long. But this will certainly break existing Solr setups because of field
> incompatibility.
>
> I propose to update the field regardless of current Solr setups, because
> 1) an index can always be recreated from segments and 2) the current
> indexer relies on the Solr bug remaining in 3.1 and higher as well.
>
> I haven't tested it with 3.1, but the bug is in 1.4.1 for sure.
>
> Thoughts?
>
> Cheers,
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>



-- 
*Open Source Solutions for Text Engineering*

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
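Markus notes that Nutch previously could not convert the formatted date returned by Solr back into the internally used long. A minimal Java sketch of that conversion follows; it assumes Java 8+ and a UTC ISO-8601 value with a trailing "Z", and the class and method names are hypothetical. It is not the code that was committed.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

/**
 * Hypothetical helper (not part of Nutch): converts a value stored in a Solr
 * date field (UTC ISO-8601, e.g. 2011-04-28T12:43:00Z) back into epoch
 * milliseconds, the long form a dedup step could compare.
 */
public class SolrDateParseSketch {

    public static long toEpochMillis(String solrDate) {
        // Instant.parse expects the UTC 'Z' form; fractional seconds are optional.
        return Instant.parse(solrDate).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("1970-01-01T00:00:01.500Z")); // 1500
        try {
            // A raw long (the old schema's representation) is not a valid date string.
            toEpochMillis("1304599696000");
        } catch (DateTimeParseException e) {
            System.out.println("rejected: " + e.getParsedString());
        }
    }
}
```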