Re: Parsing dating during indexing - Year Only

2015-06-19 Thread Chris Hostetter

I'm not sure I understand your question ...

if you know that you are only ever going to have the 'year' then why not 
just index the year as an int?
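
A minimal sketch of what indexing the year as an int might look like in the schema, assuming a trie int field type named "tint" like the one in the Solr 5.x example schemas (the field type definition is repeated here only so the fragment is self-contained):

```xml
<!-- Assumed field type; the example schemas shipped with Solr 5.x define something similar. -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>

<!-- The year is stored as a plain integer, so numeric range queries work on it directly. -->
<field name="just_the_year" type="tint" indexed="true" stored="true"/>
```

With a field like this, a query such as just_the_year:[2006 TO 2010] is an ordinary integer range with no ambiguity about months or days.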

a TrieDateField isn't really of any use to you, because the normal reasons 
for using a date type (date math, date ranges) don't apply when you don't 
have any real date values (ie: it's ambiguous whether 2007 should match 
just_the_year:[2006-06-01T00:00:00Z TO 2007-06-01T00:00:00Z])


If you really need a true date field because *most* of your documents have 
real dates, but only sometimes do you ingest documents with only the 
year, and when you ingest documents like this you want to assume some 
fixed month/day/hour/etc... then you can easily do this with update 
processors ... consider a chain of...

  RegexReplaceProcessorFactory:
    just_the_year: ^(\d+)$ -> $1-01-01T00:00:00Z
  CloneFieldUpdateProcessorFactory:
    just_the_year -> real_date_field
  FirstFieldValueUpdateProcessorFactory:
    real_date_field

(if a doc already had a value in the real field, ignore the new year-only value)
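
A rough sketch of how that chain might be written in solrconfig.xml. The chain name "year-to-date" is made up for illustration, and the literalReplacement setting reflects my reading of the RegexReplaceProcessorFactory Javadoc (group references such as $1 are only expanded when it is false), so check it against the version you run:

```xml
<!-- Hypothetical chain name; select it per request with update.chain=year-to-date -->
<updateRequestProcessorChain name="year-to-date">
  <!-- Turn a bare year such as 2010 into 2010-01-01T00:00:00Z -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">just_the_year</str>
    <str name="pattern">^(\d+)$</str>
    <str name="replacement">$1-01-01T00:00:00Z</str>
    <!-- Assumption: needed so that $1 is treated as a group reference, not literal text -->
    <bool name="literalReplacement">false</bool>
  </processor>
  <!-- Append the expanded value to the real date field -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">just_the_year</str>
    <str name="dest">real_date_field</str>
  </processor>
  <!-- If the doc already had a real date, keep it and drop the year-derived value -->
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">real_date_field</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The order matters: the clone step adds the year-derived value after any value the document already carried in real_date_field, which is why keeping only the first value preserves the real date.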

https://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html
https://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html
https://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/FirstFieldValueUpdateProcessorFactory.html


: Date: Fri, 19 Jun 2015 13:57:04 -0700 (MST)
: From: levanDev levandev9...@gmail.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Parsing dating during indexing - Year Only
: 
: Hello,
: 
: Example csv doc has column 'just_the_year' and value '2010':  
: 
: With the Schema API I can tell the indexing process to treat 'just_the_year'
: as a date field. 
: 
: I know that I can update the solrconfig.xml to correctly parse formats such
: as MM/dd/ (which is awesome) but has anyone tried to convert just the
: year value to a full date (2010-01-01T00:00:00Z) by updating the
: solrconfig.xml?
: 
: I know it's possible to import csv, do the date transformation, export again
: and have everything work nicely but it would be cool to reduce the number of
: steps involved and use the powerful date processor. 
: 
: Thank you, 
: Levan
: 
: 
: 
: --
: View this message in context: 
http://lucene.472066.n3.nabble.com/Parsing-dating-during-indexing-Year-Only-tp4213045.html
: Sent from the Solr - User mailing list archive at Nabble.com.
: 

-Hoss
http://www.lucidworks.com/


Parsing dating during indexing - Year Only

2015-06-19 Thread levanDev
Hello,

Example csv doc has column 'just_the_year' and value '2010':  

With the Schema API I can tell the indexing process to treat 'just_the_year'
as a date field. 

I know that I can update the solrconfig.xml to correctly parse formats such
as MM/dd/ (which is awesome) but has anyone tried to convert just the
year value to a full date (2010-01-01T00:00:00Z) by updating the
solrconfig.xml?

I know it's possible to import csv, do the date transformation, export again
and have everything work nicely but it would be cool to reduce the number of
steps involved and use the powerful date processor. 

Thank you, 
Levan





Re: Parsing dating during indexing - Year Only

2015-06-19 Thread levanDev
Hi Chris, 

Thank you for taking the time to write the detailed response. Very helpful.
We are dealing with interesting formats in the source data and trying to
evaluate various options for our business needs. The second scenario you
described (where some values in the date field are just the year) will either
come up pretty soon for me or will certainly help someone else who is dealing
with that issue currently.

Thank you,  
Levan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Parsing-date-during-indexing-Year-Only-tp4213045p4213065.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Parsing dating during indexing - Year Only

2015-06-19 Thread Erick Erickson
Hmm, I can see some things you couldn't do with just using
a tint field for the year. Or rather, some things that wouldn't
be as convenient.

But this might help:
http://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html

Or you can also consider a ScriptUpdateProcessor:
http://wiki.apache.org/solr/ScriptUpdateProcessor
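
An untested sketch of the ParseDateFieldUpdateProcessorFactory route, assuming a bare "yyyy" format is accepted and that the missing month, day, and time default to January 1st at midnight in the configured time zone (worth verifying on a test document before relying on it). The chain name "parse-year" is made up for illustration:

```xml
<!-- Hypothetical chain name -->
<updateRequestProcessorChain name="parse-year">
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <!-- Restrict the processor to the year-only field -->
    <str name="fieldName">just_the_year</str>
    <str name="defaultTimeZone">UTC</str>
    <arr name="format">
      <!-- Assumption: a year-only pattern parses values such as 2010 -->
      <str>yyyy</str>
    </arr>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```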

Best,
Erick
