Re: Search a folder with File name and retrieve all the files matched
Sure, Erik. Or, since we already default to the full path name as the id, perhaps we could change literal.resourcename to be the filename only. I guess that one is mostly there to give Tika more hints for guessing the file type, so it doesn't need to be absolute, especially when the full path is already in the ID. See any downsides? Please just go ahead with whatever you think best :)

-- 
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 9 March 2013, at 04:35, Erik Hatcher (erik.hatc...@gmail.com) wrote:
[...]
Re: Search a folder with File name and retrieve all the files matched
Since this is a POC, you could simply run these commands with the default example schema:

cd solr/example/exampledocs
java -Dauto -Drecursive=0 -jar post.jar path/to/folder

You will get the full file name, including the path, in the field resourcename. If you need to search on just the filename, you can add a new field filename, a copyField from resourcename to filename, and a custom fieldType for filename with a PatternReplaceFilterFactory that removes the path.

-- 
Jan Høydahl

On 7 March 2013, at 22:11, Alexandre Rafalovitch (arafa...@gmail.com) wrote:
[...]
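Jan's path-stripping suggestion comes down to a single regular expression applied to each resourcename value. The plain-Java sketch below shows what such a filter would do. Three points are assumptions and do not come from the thread: the pattern `^.*[/\\]`, the class name, and the idea that the fieldType would use a KeywordTokenizerFactory so the whole path reaches the PatternReplaceFilterFactory as one token.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the regex a PatternReplaceFilterFactory could use
// to turn a full resourcename into the bare filename (pattern is an assumption).
class FilenameFromPath {
    // Greedy ".*" runs up to the LAST '/' or '\', so only the final segment survives.
    private static final Pattern DIR_PREFIX = Pattern.compile("^.*[/\\\\]");

    static String stripPath(String resourceName) {
        return DIR_PREFIX.matcher(resourceName).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripPath("/home/pavan/poc/Invoice 2013.pdf"));
        System.out.println(stripPath("C:\\docs\\budget.xls"));
        System.out.println(stripPath("readme.txt")); // no separator: left unchanged
    }
}
```

Both Unix and Windows separators are included in the character class, because the post tool may run on either platform.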
Re: Search a folder with File name and retrieve all the files matched
Thanks, Jan, for making the post tool do this type of thing. Great stuff.

The filename would be a good one to add for out-of-the-box goodness. We can easily add just the filename to the index with something like the patch below. And on that note, what else would folks want in an easy-to-use document search system like this?

Erik

Index: core/src/java/org/apache/solr/util/SimplePostTool.java
===================================================================
--- core/src/java/org/apache/solr/util/SimplePostTool.java (revision 1450270)
+++ core/src/java/org/apache/solr/util/SimplePostTool.java (working copy)
@@ -749,6 +749,7 @@
         urlStr = appendParam(urlStr, "resource.name=" + URLEncoder.encode(file.getAbsolutePath(), "UTF-8"));
         if(urlStr.indexOf("literal.id")==-1)
           urlStr = appendParam(urlStr, "literal.id=" + URLEncoder.encode(file.getAbsolutePath(), "UTF-8"));
+        urlStr = appendParam(urlStr, "literal.filename_s=" + URLEncoder.encode(file.getName(), "UTF-8"));
         url = new URL(urlStr);
       }
     } else {

On Mar 8, 2013, at 19:16, Jan Høydahl wrote:
[...]
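To show what the patch changes, here is a rough sketch of the URL the patched tool would build for one file. Several details are assumptions:

- `appendParam` is a simplified stand-in written for this example, not SimplePostTool's actual method.
- The base URL is assumed to be the example `/update/extract` handler.
- The sketch uses the Java 10+ `URLEncoder.encode(String, Charset)` overload instead of the `"UTF-8"` string.

```java
import java.io.File;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the extract URL built for one file after Erik's patch.
class ExtractUrlSketch {
    // Stand-in for appendParam: '?' before the first parameter, '&' before the rest.
    static String appendParam(String url, String param) {
        return url + (url.indexOf('?') == -1 ? "?" : "&") + param;
    }

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // spaces become '+'
    }

    static String buildUrl(String baseUrl, File file) {
        String urlStr = appendParam(baseUrl, "resource.name=" + enc(file.getAbsolutePath()));
        // Only add an id if one has not already been supplied.
        if (urlStr.indexOf("literal.id") == -1) {
            urlStr = appendParam(urlStr, "literal.id=" + enc(file.getAbsolutePath()));
        }
        // The line the patch adds: the bare filename into the filename_s field.
        urlStr = appendParam(urlStr, "literal.filename_s=" + enc(file.getName()));
        return urlStr;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http://localhost:8983/solr/update/extract",
                new File("/tmp/poc/Invoice 2013.pdf")));
    }
}
```

If `filename_s` maps to the example schema's `*_s` string dynamic field, as we understand the 4.x example, matching on it would be exact and case-sensitive. A query like `filename_s:Invoice*` would then be a prefix match on the untokenized name.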
Search a folder with File name and retrieve all the files matched
Hi,

I am new to Apache Solr and am doing a POC. There is a folder (on the system or in some repository) containing files with different extensions: pdf, doc, xls, etc. I want to search by file name and retrieve all the files whose names match. How do I proceed? Please help me with this.

--
View this message in context: http://lucene.472066.n3.nabble.com/Search-a-folder-with-File-name-and-retrieve-all-the-files-matched-tp4045629.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search a folder with File name and retrieve all the files matched
You could use DataImportHandler with FileListEntityProcessor to get the file names in:
http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor

Then, if it is a recursive enumeration and not just one level, you probably want a tokenizer that splits on path separator characters (e.g. /). Or maybe you want to index the filename as a separate field from the full path (you can do that in FileListEntityProcessor itself).

And if you combine the list of files with an inner entity using Tika, you can load the file content for searching as well:
http://wiki.apache.org/solr/DataImportHandler#Tika_Integration

Regards,
Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Thu, Mar 7, 2013 at 3:39 PM, pavangolla (pavango...@gmail.com) wrote:
[...]
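Alex's two indexing ideas can be illustrated in plain Java:

- splitting a path into searchable tokens on separator characters, and
- keeping the bare filename apart from the full path.

This is only an approximation of what a separator-based tokenizer, such as a PatternTokenizerFactory with a pattern like `[/\\]+` (an assumed pattern), might produce. We have not checked every edge case of Solr's own tokenizer. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of path tokenization and filename extraction.
class PathTokens {
    // Split on runs of '/' or '\' and drop empty pieces (e.g. from a leading '/').
    static List<String> tokens(String path) {
        List<String> out = new ArrayList<>();
        for (String part : path.split("[/\\\\]+")) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        return out;
    }

    // The last token is the file name; an empty path yields an empty name.
    static String fileName(String path) {
        List<String> t = tokens(path);
        return t.isEmpty() ? "" : t.get(t.size() - 1);
    }

    public static void main(String[] args) {
        String path = "/home/pavan/poc/reports/Q1 summary.doc";
        System.out.println(tokens(path));
        System.out.println(fileName(path));
    }
}
```

Indexing the tokens lets a search for a folder name such as `reports` match every file beneath it. A separate filename field, which the DIH wiki page above describes FileListEntityProcessor as able to supply, keeps name-only searches from matching on directory names.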