combine to MultiTermQuery with OR

2015-02-10 Thread Sascha Janz

Hi,
 
I want to combine two MultiTermQueries.
 
One searches FieldA, the other FieldB. Both queries should be combined
with the OR operator.
 
So, in Lucene syntax, I want to search:
 
FieldA:Term1 OR FieldB:Term1, FieldA:Term2 OR FieldB:Term2, FieldA:Term3 OR
FieldB:Term3...
 
How can I do this?
 
greetings
sascha

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Request to be added to the ContributorsGroup

2015-02-10 Thread Steve Rowe
Hi Charlie,

You need to create an account on the wiki and tell us your account name.

Steve

 On Feb 10, 2015, at 3:46 AM, Charlie Picorini charlie.picor...@gmail.com 
 wrote:
 
 Dear Lucene Team,
 
 Please add me to the contributorsGroup so that I can add IntraCherche which
 is actually based on Lucene.
 
 Kind regards,





StandardQueryParser with date/time fields stored as longs

2015-02-10 Thread Jon Stewart
Hello,

I've done a lot of googling, but haven't stumbled upon the magic
answer: how does one use StandardQueryParser with numeric fields
representing timestamps, to allow for range queries?

When indexing, my timestamp fields are ISO 8601 strings. I'm parsing
them and then storing the milliseconds epoch time as a long, i.e.:

  doc.add(new LongField("created", ts.getMillis(), Field.Store.NO));

From reading around, this seems to be the preferred method to index a
timestamp (makes sense). However, how can you get StandardQueryParser
to handle a query like created:[2010-01-01 TO 2014-12-31]?

For other numeric fields, StandardQueryParser.setNumericConfigMap() is
working just fine for me. It would seem that the created field would
have to be part of this map in order to execute the range query
properly, but that there must also be a component to parse the
date/time strings in the query and convert them to long values, right?

Thanks in advance,

Jon
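
A minimal sketch of the conversion step described above, assuming the date-only bounds in a query such as created:[2010-01-01 TO 2014-12-31] are meant as whole days in UTC. The class and method names are hypothetical, only JDK classes are used, and wiring the resulting long values into StandardQueryParser is not shown.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

final class DateRangeBounds {
    /** Epoch millis of midnight (UTC assumed) at the start of an ISO date such as 2010-01-01. */
    static long startOfDayMillis(String isoDate) {
        return LocalDate.parse(isoDate)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    /** Epoch millis of the last millisecond of the given ISO date (UTC assumed). */
    static long endOfDayMillis(String isoDate) {
        // Start of the following day minus one millisecond keeps the bound inclusive.
        return LocalDate.parse(isoDate)
                .plusDays(1)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli() - 1;
    }

    public static void main(String[] args) {
        long lower = startOfDayMillis("2010-01-01");
        long upper = endOfDayMillis("2014-12-31");
        // The same range expressed in the units stored in the index.
        System.out.println("created:[" + lower + " TO " + upper + "]");
    }
}
```

The upper bound is pushed to the end of the day so that an inclusive range covers every timestamp on 2014-12-31; whether that matches the intended semantics depends on the application.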




Aw: Re: combine to MultiTermQuery with OR

2015-02-10 Thread Sascha Janz
Hm, I had already thought this could be the solution, but I didn't know how to
do the OR operation.

So I tried this:
 
BooleanQuery bquery = new BooleanQuery();
bquery.add(queryFieldA, BooleanClause.Occur.SHOULD);
bquery.add(queryFieldB, BooleanClause.Occur.SHOULD);

Is this the correct way?





Re: combine to MultiTermQuery with OR

2015-02-10 Thread Ian Lea
org.apache.lucene.search.BooleanQuery.


--
Ian.





Re: Lucene search in attachments

2015-02-10 Thread David Pilato
If you don’t index the content, you won’t be able to search it.
That said, Tika can limit the number of characters it extracts. See indexedChars
below:

tika().parseToString(new BytesStreamInput(content, false), metadata,
indexedChars);

[1]
https://github.com/elasticsearch/elasticsearch-mapper-attachments/blob/master/src/main/java/org/elasticsearch/index/mapper/attachment/AttachmentMapper.java#L456

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr 
https://twitter.com/elasticsearchfr | @scrutmydocs 
https://twitter.com/scrutmydocs






Re: Lucene search in attachments

2015-02-10 Thread sreedevi s
Thank you, David. Yes, it has a restriction to 10,000 characters.
But for large files, what could be done in that case?

Best Regards,
Sreedevi S





Lucene search in attachments

2015-02-10 Thread sreedevi s
Hi,
What is the best way to search in attachments with Lucene? I am new
to Lucene and I am using version 4.10.2. I know that, by making use of Tika, I
can convert files to text and then index that text as another field. But for large
files that will not be the ideal solution. I believe the maximum number of characters
per field is 10,000. So what would be an ideal method for searching attachments?


Best Regards,
Sreedevi S


Re: Lucene search in attachments

2015-02-10 Thread David Pilato
I don’t understand.
If you don’t raise this restriction to a higher value (or to -1), not all of the
text will be extracted, so only a subset of it will be indexed.
The parts of the text that are not indexed won’t be searchable.

Did I misunderstand your question?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr 
https://twitter.com/elasticsearchfr | @scrutmydocs 
https://twitter.com/scrutmydocs






Re: Lucene search in attachments

2015-02-10 Thread sreedevi s
No, David. I can increase the value, or set it to -1 to make it unlimited,
but I still cannot be sure that my whole text will be searchable. That is
still a problem with large files, because only the part that is indexed
will be searchable.
I was looking for some alternatives.

Best Regards,
Sreedevi S





RE: Lucene search in attachments

2015-02-10 Thread Uwe Schindler
Hi,

There is no restriction to 10,000 characters inside Lucene, and there never was
one. In earlier Lucene versions (a long time ago) there was an implicit
restriction to 10,000 TERMS (not characters). This is no longer the case. If
you still want such a limit, you have to wrap your Analyzer: http://goo.gl/SRf45A

If you have a limit of 10,000 characters somewhere, it might be your Tika
text extraction.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de





RE: Lucene search in attachments

2015-02-10 Thread Uwe Schindler
Hi,

 -Original Message-
 From: sreedevi s [mailto:sreedevi.payik...@gmail.com]
 Sent: Tuesday, February 10, 2015 10:46 AM
 To: java-user@lucene.apache.org
 Subject: Re: Lucene search in attachments
 
 Hi Uwe,
 Thank you for the update. I will remove the limit in Tika and check.
 So my understanding is that Lucene currently has no restriction on the
 number of terms per field, but when a term is greater than 2^15 bytes it is
 silently ignored at indexing time – a message is logged to infoStream if
 enabled, but no error is thrown.

Yes. There is only a limit on a single term *after* text analysis. But keep in
mind that some Analyzers, like StandardAnalyzer, have other limits way below that
one. On the other hand, if you index your documents as StringField or with
KeywordAnalyzer, no tokenization is done at all; in that case the whole
field is indexed as a single term, but that’s not useful for full-text
search anyway. So use a suitable analyzer!

 Is that right?

Yes!

Uwe

 Best Regards,
 Sreedevi S
 



RE: Indexing and searching a DateTime range

2015-02-10 Thread Uwe Schindler
Hi,

 OK. I found the Alfresco code on GitHub. So it's open source it seems.
 
 And I found the DateTimeAnalyser, so I will just take that code as a starting
 point:
 https://github.com/lsbueno/alfresco/tree/master/root/projects/repository/source/java/org/alfresco/repo/search/impl/lucene/analysis

This won't help you:
a) it's outdated code from very early Lucene versions
b) it does not use the numeric features of Lucene, so searches for date
ranges would be very slow

Basically, I don't really understand your problem:
If you use Lucene directly, you are responsible for processing the text before
it goes into the index. If you want to create one Lucene Document per line, it is
your job to do this. Lucene has no functionality to split documents. You have
to process your input and bring it into the format that Lucene wants: Documents
consisting of key/value pairs. Analyzers are only there to process one
specific field and tokenize its input (so the index contains words and not the
whole field as one term). Analyzers have nothing to do with analyzing the
structure of log lines (because they only work on one field, which does
not help with structured queries like those on dates).

So basically your indexing workflow is:

- Open the log file
- Read the log file line by line
- Create a Lucene IndexDocument instance
- Extract the interesting key/value pairs from the log line, e.g. by using
  regular expressions (like Logstash does). This would, for example,
  detect the date and class name in Log4J files, or whatever else you need
- Put those key/value pairs as fields (numeric, text, ...) into the Lucene
  IndexDocument: one field for the date, one field for the message content, one
  field for the class name, ... (those fields don't need to be stored unless you
  want to display them individually in search results, see below).
- In addition, it is wise to add a Lucene TextField instance (that
  is STORED=TRUE and INDEXED=TRUE, with a good Analyzer) containing the whole
  line (redundantly). By STORING it, you are able to return the whole log line in
  your search results
- Index the document
- Process the next line
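
The extraction step of the workflow above could be sketched roughly as follows. This is only an illustration using JDK classes: the log layout, the field names, and the class name LogLineFields are assumptions, and turning the resulting key/value pairs into Lucene fields (for example a LongField for the timestamp) is left out.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogLineFields {
    // Assumed layout: "<yyyy-MM-dd HH:mm:ss.SSS>Z <LEVEL> <class> - <message>"
    private static final Pattern LINE = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3})Z (\\w+) (\\S+) - (.*)$");

    /** Returns the key/value pairs for one log line, or an empty map if it does not match. */
    static Map<String, Object> extract(String line) {
        Map<String, Object> fields = new LinkedHashMap<>();
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return fields; // the caller can skip or log lines in an unknown format
        }
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            // Numeric value for range queries: milliseconds since the epoch.
            fields.put("timestamp", format.parse(m.group(1)).getTime());
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp in: " + line, e);
        }
        fields.put("level", m.group(2));
        fields.put("class", m.group(3));
        fields.put("message", m.group(4));
        fields.put("line", line); // the whole line, kept redundantly for display
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(extract("2015-02-08 00:02:06.123Z INFO com.example.Foo - started"));
    }
}
```

Keeping the timestamp as a long rather than as text is what allows numeric range queries later on.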

If you don't want to write this code on your own, use Logstash and
Elasticsearch (or write a separate plugin for Logstash that indexes to Lucene).
But your comment is strange. You say that Elasticsearch and Logstash are too slow for
many log lines. How should Lucene then be faster? Elasticsearch also uses
Lucene under the hood. If it is slow, the main problem is in most cases incorrect
data types while indexing (like using a text field for dates and doing range queries on it).
It is the same as indexing a number in a relational database as a string and
then running LIKE queries instead of real numeric comparisons: just wrong and
slow.

Uwe

 Thank you for everybody for the time to respond.
 
 2015-02-10 9:55 GMT+09:00 Gergely Nagy foge...@gmail.com:
 
  Thank you Barry, I really appreciate your time to respond,
 
  Let me clarify this a little bit more. I think it was not clear.
 
  I know how to parse dates; that is not the question here. (See my
  previous email: how can I pipe my converter logic into the indexing process?)
 
  All of your solutions would work fine if I wanted to index
  per document, which I do NOT want to do. What I would like to do is
  index per log line.
 
  I need to do a full text search, but with the additional requirement
  to filter those search hits by DateTime range.
 
  I hope this makes it clearer. So any suggestions how to do that?
 
  Sidenote: I saw that Alfresco implemented this analyzer, called
  DateTimeAnalyzer, but Alfresco is not open source. So I was wondering
  how to implement the same. Actually after wondering for 2 days, I
  became convinced that writing an Analyzer should be the way to go. I
  will post my solution later if I have a working code.
 
  2015-02-10 8:50 GMT+09:00 Barry Coughlan b.coughl...@gmail.com:
 
  Hi Gergely,
 
  Writing an analyzer would work but it is unnecessarily complicated.
  You could just parse the date from the string in your input code and
  index it in the LongField like this:
 
  SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S'Z'");
  format.setTimeZone(TimeZone.getTimeZone("UTC"));
  long t = format.parse("2015-02-08 00:02:06.123Z INFO...").getTime();
 
  Barry
 
  On Tue, Feb 10, 2015 at 12:21 AM, Gergely Nagy foge...@gmail.com
 wrote:
 
   Thank you for taking the time to respond, Karthik.
  
   Can you show me an example of how to convert a DateTime to milliseconds?
   I mean, how can I pipe my converter logic into the indexing process?
  
   I suspect I need to write my own Analyzer/Tokenizer to achieve
   this. Is this correct?
  
   2015-02-09 22:58 GMT+09:00 KARTHIK SHIVAKUMAR
 nskarthi...@gmail.com:
  
Hi
   
Long time ago,.. I used to store datetime in millisecond .
   
TermRangequery used to work in perfect condition
   
Convert all datetime to millisecond and index 

Re: Lucene search in attachments

2015-02-10 Thread sreedevi s
Hi Uwe,
Thank you for the update. I will remove the limit in Tika and check.
So my understanding is that Lucene currently has no restriction on the
number of terms per field, but when a term is greater than 2^15 bytes it is
silently ignored at indexing time – a message is logged to infoStream if
enabled, but no error is thrown.
Is that right?



Best Regards,
Sreedevi S





re-mapping lucene index

2015-02-10 Thread Vijay B
We use the MMapDirectory implementation in our search application. Occasionally we need
to do a full reindex by dropping the entire directory contents. How does
re-mapping work with MMapDirectory when the directory contents are going to be
replaced with new ones? Is this going to be seamless, or is an application
restart required?

Additional info: we use SearcherManager to acquire searchers, and we
periodically refresh the searchers.


Re: re-mapping lucene index

2015-02-10 Thread Vijay B
The searching and indexing apps run in different JVMs. We use Lucene 4.7 with
the default OpenMode.

For full indexing, we use java.io.File.delete() to recursively delete the index
directory contents. Will remapping cause any issues in this case if I don't
use the options you suggested?





Re: re-mapping lucene index

2015-02-10 Thread Michael McCandless
Just open a new IndexWriter with OpenMode.CREATE.  It will replace the index.

Or, if you already have an IndexWriter open, use deleteAll.

Mike McCandless

http://blog.mikemccandless.com





Re: re-mapping lucene index

2015-02-10 Thread Michael McCandless
It's fine if writer and reader are in separate JVMs.

You really should not rm -rf the index yourself.

It's better to let Lucene do it: the replacement is then transactional, so
that if your new IndexWriter (the one that deleted all docs) crashes
before it can commit, the old index is still intact. It also
ensures file names won't be reused, which is important on Windows if
you still have readers open on the index.

Regardless of which approach you use, the old mappings will remain
alive until you've closed all open readers against the old index.

Mike McCandless

http://blog.mikemccandless.com





RE: re-mapping lucene index

2015-02-10 Thread Uwe Schindler
Hi,

On Linux/Solaris/BSD/... operating systems you can delete files while they are
open (or mmapped, it does not matter). The inode/file on disk stays alive until
everything is closed ("delete on last close" semantics); it just disappears
from the directory listing, so you cannot open new handles to the file. This
means: if there are still index readers open, deleting the underlying directory
and/or its files has no effect on the IndexReader. You can still search it
(until you close it).
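
The "delete on last close" behaviour can be observed with plain JDK file APIs, without Lucene. The following is a small sketch under the assumption of a POSIX-style file system (Linux, Solaris, BSD); the class name DeleteWhileOpen is made up for the illustration, and the result may differ on Windows.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class DeleteWhileOpen {
    /** Writes content to a temp file, deletes the file while it is open, then reads it back. */
    static String readAfterDelete(String content) {
        try {
            Path file = Files.createTempFile("delete-while-open", ".tmp");
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            try (InputStream in = Files.newInputStream(file)) {
                // Remove the directory entry while the handle is still open.
                Files.delete(file);
                if (Files.exists(file)) {
                    throw new IllegalStateException("file is still listed: " + file);
                }
                // The already-open handle can still read the data.
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterDelete("still readable"));
    }
}
```

An open IndexReader is in the same position as the open stream here: it keeps working on the old files until it is closed.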

But in any case, don't do this! Just let IndexWriter clean up by explicitly
creating a new index.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Vijay B [mailto:vijay.nip...@gmail.com]
 Sent: Tuesday, February 10, 2015 8:38 PM
 To: java-user@lucene.apache.org
 Subject: Re: re-mapping lucene index
 
 Appreciate it, Mike. That answers it all.
 
 BTW, we use Solaris.
 





Re: re-mapping lucene index

2015-02-10 Thread Vijay B
Appreciate it, Mike. That answers it all.

BTW, we use Solaris.

On Tue, Feb 10, 2015 at 2:29 PM, Michael McCandless 
luc...@mikemccandless.com wrote:

 It's fine if writer and reader are in separate JVMs.

 You really should not rm -rf yourself.

 It's better to let Lucene do it, e.g. it's transactional at that
 point so that if your new IndexWriter (that deleted all docs) crashes
 before it could commit, the old index is still intact.  It also
 ensures file names won't be reused, which is important on Windows if
 you still have readers open on the index.

 Regardless of which approach you use, the old mappings will remain
 alive until you've closed all open readers against the old index.

 Mike McCandless

 http://blog.mikemccandless.com


 On Tue, Feb 10, 2015 at 2:09 PM, Vijay B vijay.nip...@gmail.com wrote:
  Searching and indexing apps run in different JVMs. We use Lucene 4.7 and
  the default OpenMode.
 
  For full indexing, we use java.io.File.delete() to recursively delete the
  index directory contents. Will remapping cause any issues in this case if I
  don't use the options you suggested?
 
  On Tue, Feb 10, 2015 at 1:56 PM, Michael McCandless 
  luc...@mikemccandless.com wrote:
 
  Just open a new IndexWriter with OpenMode.CREATE.  It will replace the
  index.
 
  Or if you already have an IW open, use deleteAll.
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
 
  On Tue, Feb 10, 2015 at 1:31 PM, Vijay B vijay.nip...@gmail.com wrote:
   We use the MMapDirectory impl. in our search application. Occasionally we
   need to do a full indexing by dropping the entire directory contents. How
   does re-mapping work with MMapDirectory when the directory contents are
   going to be replaced with new ones? Is this going to be seamless, or is an
   application restart required?
  
   Additional info: We use SearcherManager to acquire searchers and we
   periodically refresh searchers.
 




Re: Re: combine to MultiTermQuery with OR

2015-02-10 Thread Ian Lea
Yep, that looks good to me.


--
Ian.


On Tue, Feb 10, 2015 at 5:01 PM, Sascha Janz sascha.j...@gmx.net wrote:
 Hm, I already thought this could be the solution, but didn't know how to do the 
 OR operation.

 So I tried this:

 BooleanQuery bquery = new BooleanQuery();
 bquery.add(queryFieldA, BooleanClause.Occur.SHOULD);
 bquery.add(queryFieldB, BooleanClause.Occur.SHOULD);

 Is this the correct way?


 Sent: Tuesday, February 10, 2015 at 17:31
 From: Ian Lea ian@gmail.com
 To: java-user@lucene.apache.org
 Subject: Re: combine to MultiTermQuery with OR
 org.apache.lucene.search.BooleanQuery.


 --
 Ian.


 On Tue, Feb 10, 2015 at 3:28 PM, Sascha Janz sascha.j...@gmx.net wrote:

 Hi,

 I want to combine two MultiTermQueries.

 One searches over FieldA, one over FieldB. Both queries should be combined 
 with the OR operator.

 So in Lucene syntax I want to search

 FieldA:Term1 OR FieldB:Term1, FieldA:Term2 OR FieldB:Term2, FieldA:Term3 OR 
 FieldB:Term3...

 How can I do this?

 greetings
 sascha




Re: combine to MultiTermQuery with OR

2015-02-10 Thread Nitin Kothwal

Hi Sascha,

You can do this with a BooleanQuery: take your three queries and OR them 
together using the boolean clause Occur.SHOULD.


-Nitin
On Tuesday 10 February 2015 08:58 PM, Sascha Janz wrote:

Hi,
  
I want to combine two MultiTermQueries.
  
One searches over FieldA, one over FieldB.  Both queries should be combined with the OR operator.
  
So in Lucene syntax I want to search
  
FieldA:Term1 OR FieldB:Term1,   FieldA:Term2 OR FieldB:Term2, FieldA:Term3 OR FieldB:Term3...
  
How can I do this?
  
greetings

sascha




search on a field by a single word

2015-02-10 Thread wangdong

Hi folks,

I have a question:

Suppose there are 3 documents with the following values in the field "name":
1) a b c
2) a b
3) a

I want to retrieve only doc 3). I tried a query like this:
name:a
but I find that it does not give the correct result. Is there any way to solve this?

Please help me!
Thanks in advance!






Request to be added to the ContributorsGroup

2015-02-10 Thread Charlie Picorini
Dear Lucene Team,

Please add me to the ContributorsGroup so that I can add IntraCherche, which
is actually based on Lucene.

Kind regards,


BulkScorer and .explain() compute scores separately?

2015-02-10 Thread danield
I have subclassed BooleanQuery and changed the BooleanWeight constructor
to change the way the coord and idf components of the similarity
formula are computed, and my changes work as expected when calling
IndexSearcher.explain().

However, I now find that when just calling IndexSearcher.search(), the
scores reported for each document and resulting ranking are quite different
from what .explain() shows me.

What is going on? Clearly scores are computed somewhere else when done by
BulkScorer and not in BooleanQuery.BooleanWeight(). 

I have been looking at the code but it's mighty confusing and I still
haven't figured out how to make the same changes on this pipeline.

Please help!!
Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/BulkScorer-and-explain-compute-scores-separately-tp4185544.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
