Re: how to get modified field data if it doesn't exist in meta
Hi,

Could someone compile this into a jar file for me? I found something close to what I need on Stack Overflow:
http://stackoverflow.com/questions/20745935/set-last-modified-field-when-not-defined-in-document-in-solr

    package modifiedG4;

    import java.io.IOException;

    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    // Factory that Solr instantiates; it creates one processor per update request.
    public class LastModifiedMergeProcessorFactory extends UpdateRequestProcessorFactory {

        @Override
        public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                                  SolrQueryResponse rsp,
                                                  UpdateRequestProcessor next) {
            return new LastModifiedMergeProcessor(next);
        }
    }

    // Copies file_date into last_modified when the document has no last_modified value.
    class LastModifiedMergeProcessor extends UpdateRequestProcessor {

        public LastModifiedMergeProcessor(UpdateRequestProcessor next) {
            super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();

            Object metaDate = doc.getFieldValue("last_modified");
            Object fileDate = doc.getFieldValue("file_date");

            if (metaDate == null && fileDate != null) {
                doc.addField("last_modified", fileDate);
            }

            // pass the document on to the next processor in the chain
            super.processAdd(cmd);
        }
    }

On Sun, Feb 12, 2017 at 8:45 PM, Alexandre Rafalovitch wrote:
> It would have to be a custom one. One you write. But I believe Tika
> would pass a file name as one of the parameters, so you just need to
> use standard Java API to look up the system date. That - of course -
> assumes that the files you index are on the same filesystem as Solr
> itself, so it could look it up.
>
> You can find more about the URPs at:
> https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors
> You can find the full list of the URPs at:
> http://www.solr-start.com/info/update-request-processors/
> If you are on the latest Solr 6.4, you would probably want to subclass
> SimpleUpdateProcessorFactory and follow the implementation example of
> TemplateUpdateProcessorFactory
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.4.0/solr/core/src/java/org/apache/solr/update/processor/TemplateUpdateProcessorFactory.java
>
> Alternatively, you could implement your URP in Javascript, but I am
> not sure that has an API to check file dates.
>
> Regards,
> Alex.
>
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
Re: how to get modified field data if it doesn't exist in meta
It would have to be a custom one, one that you write. But I believe Tika would pass the file name as one of the parameters, so you just need to use the standard Java API to look up the file's date on the file system. That, of course, assumes that the files you index are on the same filesystem as Solr itself, so that it can look them up.

You can find more about URPs at:
https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors

You can find the full list of URPs at:
http://www.solr-start.com/info/update-request-processors/

If you are on the latest Solr 6.4, you would probably want to subclass SimpleUpdateProcessorFactory and follow the implementation example of TemplateUpdateProcessorFactory:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.4.0/solr/core/src/java/org/apache/solr/update/processor/TemplateUpdateProcessorFactory.java

Alternatively, you could implement your URP in JavaScript, but I am not sure that has an API to check file dates.

Regards,
Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 12 February 2017 at 13:28, Gytis Mikuciunas wrote:
> Alexandre, could you provide some link or give more info about this
> processor?
> I'm novice in the solr world;)
>
> Regards,
> Gytis
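The file-date lookup suggested above can be sketched with the standard java.nio.file API. This is only a minimal sketch under stated assumptions, not part of any Solr API: the class name FileDateLookup and the method lastModifiedIso are made up for illustration, and it assumes the path it receives is readable from the machine where the code runs (that is, the indexed files sit on a filesystem Solr can see).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper (not a Solr class): reads a file's last-modified time
// from the file system and formats it as an ISO-8601 UTC timestamp.
final class FileDateLookup {

    private FileDateLookup() {
    }

    // Returns the file's modification time as a string of the form
    // yyyy-MM-ddTHH:mm:ssZ, always in UTC.
    static String lastModifiedIso(Path file) throws IOException {
        Instant modified = Files.getLastModifiedTime(file).toInstant();
        // Drop sub-second precision so the value has second resolution.
        return modified.truncatedTo(ChronoUnit.SECONDS).toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java FileDateLookup <path-to-file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(lastModifiedIso(file));
    }
}
```

Inside a custom update request processor, a value obtained this way could be added to the document as the fallback date whenever the extracted metadata carries no modification date. Which document field holds the original file path depends on how the files are posted, so that part is left out here.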
Re: how to get modified field data if it doesn't exist in meta
Alexandre, could you provide a link or give more info about this processor?
I'm a novice in the Solr world ;)

Regards,
Gytis

On Feb 10, 2017 14:59, "Alexandre Rafalovitch" wrote:
> Custom update request processor that looks up a file from the name and gets
> the date should work.
>
> Regards,
> Alex
Re: how to get modified field data if it doesn't exist in meta
As I understand it, TimestampUpdateProcessorFactory will insert the current date (now). We don't want that.

Regards,
Gytis

On Feb 10, 2017 19:18, "Erick Erickson" wrote:
> Would TimestampUpdateProcessorFactory work?
>
> Best,
> Erick
Re: how to get modified field data if it doesn't exist in meta
Would TimestampUpdateProcessorFactory work?

Best,
Erick

On Fri, Feb 10, 2017 at 4:59 AM, Alexandre Rafalovitch wrote:
> Custom update request processor that looks up a file from the name and gets
> the date should work.
>
> Regards,
> Alex
Re: how to get modified field data if it doesn't exist in meta
A custom update request processor that looks up the file by its name and gets the date should work.

Regards,
Alex

On 10 Feb 2017 2:39 AM, "Gytis Mikuciunas" wrote:
> Hi,
>
> We have started to use Solr for indexing our documents (vsd, vsdx,
> xls, xlsx, doc, docx, pdf, txt).
>
> A modified-date value is needed for each file. MS Office files and PDFs
> have this value.
> The problem is with txt files, as they don't have this value in their metadata.
>
> Is there any way to get it from the OS level and force adding
> it to Solr when we do the indexing?
>
> P.S.
>
> Windows 2012 server, single instance
>
> Typical command we use:
> java -Dauto -Dc=index_sandbox -Dport=80 -Dfiletypes=vsd,vsdx,xls,xlsx,doc,docx,pdf,txt -Dbasicauth=admin: -jar example/exampledocs/post.jar "M:\DNS_dump"
>
> Regards,
>
> Gytis