Invalid_Date_String on posting XML to the index

2009-04-16 Thread Mark Allan

Hi all,

I'm encountering a problem when I try to add records with a date field  
to the index.


The records I'm adding have very little date precision, usually  
MMDD but some only have year and month, others only have a year.   
I'm trying to get around this by using pattern-replace filters to  
modify the field before indexing.  This seems to work fine if the  
class is solr.TextField: a date is converted from, e.g., 1953 to  
1953-01-01T00:00:00.000Z and then inserted into the index.


However, if I want to have the field as an actual date field (for  
doing range searches etc) I get the following error when I post the  
XML file.


SimplePostTool: FATAL: Solr returned an error: Invalid_Date_String1953

The corresponding stack trace from the solr server is:

Apr 15, 2009 4:27:26 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Invalid Date String:'1953'
	at org.apache.solr.schema.DateField.parseMath(DateField.java:167)
	at org.apache.solr.schema.DateField.toInternal(DateField.java:138)
	at org.apache.solr.schema.FieldType.createField(FieldType.java:179)
	at org.apache.solr.schema.SchemaField.createField(SchemaField.java:93)
	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:243)
	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
	at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
	at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
	at org.mortbay.jetty.Server.handle(Server.java:285)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
	at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)


My schema.xml file looks something like this:

...
   <fieldType name="dateFormatter" class="solr.DateField" sortMissingLast="true" omitNorms="true">
      <analyzer>
         <filter class="solr.TrimFilterFactory" />
         <tokenizer class="solr.KeywordTokenizerFactory"/>
         <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})$" replacement="$1.01.01" replace="all" />
         <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})\.(\d{2})$" replacement="$1.$2.01" replace="all" />
         <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})\.(\d{2})\.(\d{2})$" replacement="$1-$2-$3T00:00:00.000Z" replace="all" />
      </analyzer>
   </fieldType>
...
   <field name="DateRecorded" type="dateFormatter" indexed="true" stored="true" multiValued="false"/>

...


My thinking is that Solr is trying to add the field directly as '1953'  
before doing the text factory stuff and is therefore not in the right  
format for indexing.  Does that sound like a reasonable assumption and  
am I missing something which is causing it to go wrong?  Can anyone  
help please?


I was originally storing the date in YYMMDD format as a text field and  
searching with wildcards, but that strikes me as somewhat  
inefficient.  I could go back to doing that if necessary, but I'd  
rather do it the right way if I can.


Many thanks for your help.

Mark
PS. Apologies if this message comes through twice - I sent it  
yesterday afternoon but it hasn't turned up on the mailing list yet,  
so I'm trying again.


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Re: Invalid_Date_String on posting XML to the index

2009-04-16 Thread Shalin Shekhar Mangar
On Thu, Apr 16, 2009 at 1:20 PM, Mark Allan mark.al...@ed.ac.uk wrote:


> My thinking is that Solr is trying to add the field directly as '1953'
> before doing the text factory stuff and is therefore not in the right format
> for indexing.  Does that sound like a reasonable assumption and am I missing
> something which is causing it to go wrong?  Can anyone help please?


That is correct. You'll need to do the date creation in your own code so
that you send a well-formed date to Solr.
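
One way to read "do the date creation in your own code" is a small helper run over each record before the XML is posted. The following is only a minimal sketch, assuming the raw values arrive as YYYY, YYYY.MM or YYYY.MM.DD (the shapes the patterns in the schema above expect). The class and method names (PartialDates, toSolrDate) are made up for illustration and are not part of Solr.

```java
import java.time.LocalDate;

public class PartialDates {
    /**
     * Expands "YYYY", "YYYY.MM" or "YYYY.MM.DD" into a full UTC timestamp
     * string, defaulting a missing month or day to 01.
     */
    public static String toSolrDate(String raw) {
        String[] parts = raw.trim().split("\\.");
        if (parts.length > 3 || !parts[0].matches("\\d{4}")) {
            throw new IllegalArgumentException("Unrecognised date: " + raw);
        }
        int year = Integer.parseInt(parts[0]);
        int month = parts.length > 1 ? Integer.parseInt(parts[1]) : 1;
        int day = parts.length > 2 ? Integer.parseInt(parts[2]) : 1;
        // LocalDate.of rejects impossible dates such as month 13 or 30 February.
        LocalDate date = LocalDate.of(year, month, day);
        // LocalDate prints as yyyy-MM-dd for four-digit years.
        return date + "T00:00:00.000Z";
    }
}
```

With this in place, PartialDates.toSolrDate("1953") would produce the same 1953-01-01T00:00:00.000Z value mentioned in the original post, and the field can then be sent to Solr already well formed.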

-- 
Regards,
Shalin Shekhar Mangar.


Re: Invalid_Date_String on posting XML to the index

2009-04-16 Thread Mark Allan


On 16 Apr 2009, at 9:00 am, Shalin Shekhar Mangar wrote:

> On Thu, Apr 16, 2009 at 1:20 PM, Mark Allan mark.al...@ed.ac.uk wrote:
>
>> My thinking is that Solr is trying to add the field directly as '1953'
>> before doing the text factory stuff and is therefore not in the right
>> format for indexing.  Does that sound like a reasonable assumption and am
>> I missing something which is causing it to go wrong?  Can anyone help
>> please?
>
> That is correct. You'll need to do the date creation in your own code so
> that you send a well-formed date to Solr.



Hi, thanks for your prompt reply.  I'm a bit confused though - the  
only way to do this is a two-step process?


I have to write code to munge the XML into another document which is  
exactly the same except for the format of the Date field, and then  
import that second file?  Isn't that the whole purpose of having an  
analyzer with the solr.PatternReplaceFilterFactory filters?  What's  
odd is that the pattern replacement works if I store the field as text  
but not as a date.  Are you sure this isn't a bug?


Mark

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: Invalid_Date_String on posting XML to the index

2009-04-16 Thread Shalin Shekhar Mangar
On Thu, Apr 16, 2009 at 1:45 PM, Mark Allan mark.al...@ed.ac.uk wrote:


> Hi, thanks for your prompt reply.  I'm a bit confused though - the only way
> to do this is a two-step process?
>
> I have to write code to munge the XML into another document which is
> exactly the same except for the format of the Date field, and then import
> that second file?  Isn't that the whole purpose of having an analyzer with
> the solr.PatternReplaceFilterFactory filters?  What's odd is that the
> pattern replacement works if I store the field as text but not as a date.
> Are you sure this isn't a bug?


Analyzers are applied only to the indexed value, not the stored value. A
value added to a DateField is converted to a single internal format (used
for both indexing and storing) and then added to the index. The
DateField#toInternal method is the one that tries to parse the string into
a date, and it fails when the field is created.

There is another option. You could create a class which extends DateField
and overrides toInternal(String) to do the conversion. You can specify this
class in the schema.xml instead of DateField.
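
A rough sketch of that override pattern follows. So that it compiles without the Solr jars, it uses a simplified stand-in base class (StrictDateField, hypothetical) in place of org.apache.solr.schema.DateField. In a real schema the subclass would extend DateField itself, whose toInternal does more than this stand-in. The regexes mirror the three PatternReplaceFilterFactory patterns from the original post, and all class names here are assumptions for illustration.

```java
public class LenientDateDemo {
    /** Simplified stand-in for Solr's DateField: accepts only full timestamps. */
    public static class StrictDateField {
        public String toInternal(String val) {
            if (!val.matches("\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(\\.\\d+)?Z")) {
                throw new IllegalArgumentException("Invalid Date String:'" + val + "'");
            }
            return val;
        }
    }

    /** Expands partial dates first, then delegates to the strict parser. */
    public static class LenientDateField extends StrictDateField {
        @Override
        public String toInternal(String val) {
            String expanded = val.trim()
                // "1953" becomes "1953.01.01"
                .replaceAll("^(\\d{4})$", "$1.01.01")
                // "1953.04" becomes "1953.04.01"
                .replaceAll("^(\\d{4})\\.(\\d{2})$", "$1.$2.01")
                // "1953.04.01" becomes "1953-04-01T00:00:00.000Z"
                .replaceAll("^(\\d{4})\\.(\\d{2})\\.(\\d{2})$", "$1-$2-$3T00:00:00.000Z");
            return super.toInternal(expanded);
        }
    }
}
```

The point of the design is that the conversion happens before the value is parsed, so the indexed and stored values both see the expanded date. Anything that still fails to match after expansion is rejected by the base class as before.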

-- 
Regards,
Shalin Shekhar Mangar.

