Re: Search a folder with File name and retrieve all the files matched

2013-03-09 Jan Høydahl
Sure, Erik.

Or, since we already default to the full path name as the id, perhaps we could
change literal.resourcename to be the filename only. I guess that parameter is
mostly there to give Tika more hints for guessing the file type, so it doesn't
need to be absolute, especially when the full path is already in the ID. Do you
see any downsides? Please just go ahead with whatever you think is best :)
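A minimal sketch of the proposal, assuming the parameter meant is the one Erik's patch sends the path in (resource.name): the id keeps the absolute path while resource.name carries only the file name. ResourceNameSketch, buildParams and enc are hypothetical names for illustration, not SimplePostTool code.

```java
import java.io.File;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch of the proposal; not actual SimplePostTool code.
final class ResourceNameSketch {

  // URL-encodes a value as UTF-8 (the same call the patch uses),
  // turning the checked exception into an unchecked one.
  static String enc(String s) {
    try {
      return URLEncoder.encode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always available
    }
  }

  // literal.id keeps the absolute path, so each document stays unique;
  // resource.name gets only the file name, whose extension is still
  // there for Tika to use as a content-type hint.
  static String buildParams(File file) {
    return "literal.id=" + enc(file.getAbsolutePath())
        + "&resource.name=" + enc(file.getName());
  }

  public static void main(String[] args) {
    // On a Unix-like system this prints:
    // literal.id=%2Fdata%2Fdocs%2Freport.pdf&resource.name=report.pdf
    System.out.println(buildParams(new File("/data/docs/report.pdf")));
  }
}
```

The possible downside to weigh: anything downstream that reads the resourcename field expecting a full path would now only see the bare name.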

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com


Re: Search a folder with File name and retrieve all the files matched

2013-03-08 Jan Høydahl
Since this is a POC, you could simply run this command with the default example
schema:

cd solr/example/exampledocs
java -Dauto -Drecursive=0 -jar post.jar path/to/folder

You will get the full file name, including the path, in the field resourcename.
If you need to search on just the filename, you can do that by adding a new
field, filename, with a copyField from resourcename to filename, and a custom
fieldType for filename that uses a PatternReplaceFilterFactory to strip the path.
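A minimal plain-Java sketch of the path-stripping step: the greedy regex removes everything up to the last / or \. The pattern ^.*[/\\] is an assumption, not one given in this thread, and it presumes the field type keeps the whole path as a single token (for example via a keyword tokenizer) before the replace filter runs. FilenameFromPath and stripPath are hypothetical names.

```java
import java.util.regex.Pattern;

// Plain-Java illustration of a path-stripping regex replacement, of the kind
// a PatternReplaceFilterFactory could apply to the copied resourcename value.
final class FilenameFromPath {

  // Assumed pattern: greedy ".*" runs to the LAST '/' or '\',
  // so the whole directory prefix is matched and removed.
  private static final Pattern PATH_PREFIX = Pattern.compile("^.*[/\\\\]");

  static String stripPath(String fullPath) {
    return PATH_PREFIX.matcher(fullPath).replaceAll("");
  }

  public static void main(String[] args) {
    System.out.println(stripPath("/home/user/docs/budget.xls")); // budget.xls
    System.out.println(stripPath("C:\\docs\\minutes.doc"));      // minutes.doc
    System.out.println(stripPath("readme.pdf"));                 // readme.pdf (nothing to strip)
  }
}
```

In schema.xml the same regex would be written without Java string escaping, i.e. ^.*[/\\].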



Re: Search a folder with File name and retrieve all the files matched

2013-03-08 Erik Hatcher
Thanks, Jan, for making the post tool do this type of thing.  Great stuff.

The filename would be a good one to add for out-of-the-box goodness. We can
easily add just the filename to the index with something like the patch below.
And on that note, what else would folks want in an easy-to-use document search
system like this?

Erik

Index: core/src/java/org/apache/solr/util/SimplePostTool.java
===================================================================
--- core/src/java/org/apache/solr/util/SimplePostTool.java  (revision 1450270)
+++ core/src/java/org/apache/solr/util/SimplePostTool.java  (working copy)
@@ -749,6 +749,7 @@
           urlStr = appendParam(urlStr, "resource.name=" + URLEncoder.encode(file.getAbsolutePath(), "UTF-8"));
         if(urlStr.indexOf("literal.id")==-1)
           urlStr = appendParam(urlStr, "literal.id=" + URLEncoder.encode(file.getAbsolutePath(), "UTF-8"));
+        urlStr = appendParam(urlStr, "literal.filename_s=" + URLEncoder.encode(file.getName(), "UTF-8"));
         url = new URL(urlStr);
       }
     } else {
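To see what the patched code would send, here is a standalone sketch that rebuilds the same three parameters. appendParam here is a simplified hypothetical stand-in for SimplePostTool's helper (the real one may behave differently), and http://localhost:8983/solr/update/extract is an assumed example-server address.

```java
import java.io.File;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Standalone sketch of the request parameters the patched code builds.
// appendParam is a simplified hypothetical stand-in, not SimplePostTool's method.
final class PatchedPostUrl {

  // Adds "key=value", using '?' before the first parameter and '&' after that.
  static String appendParam(String url, String param) {
    return url + (url.indexOf('?') >= 0 ? "&" : "?") + param;
  }

  static String enc(String s) {
    try {
      return URLEncoder.encode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always available
    }
  }

  static String buildUrl(String baseUrl, File file) {
    String urlStr = appendParam(baseUrl, "resource.name=" + enc(file.getAbsolutePath()));
    // Default the id to the full path only if the caller did not set one.
    if (urlStr.indexOf("literal.id") == -1) {
      urlStr = appendParam(urlStr, "literal.id=" + enc(file.getAbsolutePath()));
    }
    // The line the patch adds: just the bare file name, into filename_s.
    urlStr = appendParam(urlStr, "literal.filename_s=" + enc(file.getName()));
    return urlStr;
  }

  public static void main(String[] args) {
    // The printed URL ends with &literal.filename_s=q1+report.pdf
    System.out.println(buildUrl("http://localhost:8983/solr/update/extract",
        new File("docs/q1 report.pdf")));
  }
}
```

If filename_s falls under the example schema's *_s dynamic field, which I believe is a plain string type, a query would need the exact name (filename_s:report.pdf) or a wildcard (filename_s:report*) rather than matching individual words.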




Search a folder with File name and retrieve all the files matched

2013-03-07 pavangolla
Hi,
I am new to Apache Solr.

I am doing a POC where there is a folder (on the system or in some repository)
containing files with different extensions: pdf, doc, xls, ...

I want to search by file name and retrieve all the files whose names match.

How do I proceed with this?

Please help me with this.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-a-folder-with-File-name-and-retrieve-all-the-files-matched-tp4045629.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search a folder with File name and retrieve all the files matched

2013-03-07 Alexandre Rafalovitch
You could use DataImportHandler with FileListEntityProcessor to get the
file names in:
http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor
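FileListEntityProcessor does the directory listing inside DataImportHandler; the sketch below does the same kind of enumeration in plain Java NIO, only to show which values end up per file. It is not the DIH API. The map keys fileAbsolutePath and file mirror, as far as I recall, two of the implicit fields FileListEntityProcessor emits; check the wiki page above before relying on them. FileListSketch and listFiles are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain Java NIO illustration of a file listing; NOT the DataImportHandler API.
final class FileListSketch {

  // One row per regular file under root. maxDepth 1 = only the folder itself,
  // a larger value = recursive walk. Keys are assumed field names.
  static List<Map<String, String>> listFiles(Path root, int maxDepth) {
    try (Stream<Path> paths = Files.walk(root, maxDepth)) {
      return paths
          .filter(Files::isRegularFile)
          .map(p -> {
            Map<String, String> row = new LinkedHashMap<>();
            row.put("fileAbsolutePath", p.toAbsolutePath().toString());
            row.put("file", p.getFileName().toString()); // bare file name
            return row;
          })
          .collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    for (Map<String, String> row : listFiles(root, 1)) {
      System.out.println(row);
    }
  }
}
```

Keeping the bare name and the full path as separate values is what makes the "filename as a separate field" option below straightforward.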

Then, if it is a recursive enumeration rather than just one level, you probably
want a tokenizer that splits on path separator characters (e.g. /). Or
maybe you want to index the filename as a separate field from the full path
(you can do that in FileListEntityProcessor itself).
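A minimal plain-Java sketch of the splitting described above: breaking a path on / and \ so each directory name and the file name become separate searchable terms. In a Solr schema a pattern-based tokenizer (for example PatternTokenizerFactory with a separator regex) could play this role; that choice, the regex [/\\]+, and the names PathTokens and tokenize are assumptions rather than something named in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of splitting a path on separator characters.
final class PathTokens {

  // Splits on runs of '/' or '\' and drops empty pieces
  // (for example the one produced by a leading '/').
  static List<String> tokenize(String path) {
    List<String> tokens = new ArrayList<>();
    for (String part : path.split("[/\\\\]+")) {
      if (!part.isEmpty()) {
        tokens.add(part);
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("/projects/2013/specs/design.doc"));
    // prints [projects, 2013, specs, design.doc]
  }
}
```

With terms like these, a search for design.doc or for a folder name such as specs would match the document regardless of where it sits in the tree.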

And if you combine the list of files with an inner entity that uses Tika, you
can load the file content for searching as well:
http://wiki.apache.org/solr/DataImportHandler#Tika_Integration

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)

