Re: SolrJ : fieldcontent from (multiple) file(s)

2014-09-14 Thread Clemens Wyss DEV
Thanks for all your advice and thoughts.

The client in our case is/are the Tomcats; to be more precise, the webapps
running in the Tomcats. These should serve HTTP requests.

I'd also like to note that, in my opinion, it's the batch updates that cause
load (CPU and memory, depending on the PDF), which I would like to take off the
webapps, not the single document insertions/updates.

But if I don't get a clean/stable Solr-way-to-do-it solution to this problem,
I will do the extraction in the webapps, as is.


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, September 13, 2014 23:22
To: solr-user@lucene.apache.org
Subject: Re: SolrJ : fieldcontent from (multiple) file(s)

Alexandre:

Hmmm, if you're correct, that pretty much shoots SolrCell in the head too. You'd 
probably have to do something with a custom UpdateRequestProcessor in that 
case...
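To illustrate Erick's suggestion, here is a minimal sketch of the logic such a processor could run. In Solr it would live in a subclass of `UpdateRequestProcessor` that overrides `processAdd(AddUpdateCommand)`. To keep the example runnable without Solr on the classpath, a plain `Map` stands in for the `SolrInputDocument`. The field names `content_files` and `content` are hypothetical. The sketch only reads plain-text files from the server's file system; PDF extraction (e.g. via Tika, as SolrCell does) is deliberately left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class FileContentFiller {

    // Hypothetical field names: the client sends paths, Solr indexes text.
    static final String FILES_FIELD = "content_files";
    static final String CONTENT_FIELD = "content";

    /**
     * What a processAdd() override might do before passing the document on:
     * replace the file paths sent by the client with the files' text, read
     * on the server side where the shared drive is reachable.
     */
    static void fill(Map<String, Object> doc) throws IOException {
        Object raw = doc.remove(FILES_FIELD);
        if (raw == null) {
            return; // no file references: nothing to do
        }
        List<?> paths = raw instanceof List<?> ? (List<?>) raw : List.of(raw);
        List<String> contents = new ArrayList<>();
        for (Object p : paths) {
            Path path = Paths.get(p.toString());
            // Plain-text read only; PDFs and other binary formats need an extractor such as Tika.
            contents.add(new String(Files.readAllBytes(path), StandardCharsets.UTF_8));
        }
        doc.put(CONTENT_FIELD, contents); // multi-valued: one value per file
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("part", ".txt");
        Path b = Files.createTempFile("part", ".txt");
        Files.write(a, "first file".getBytes(StandardCharsets.UTF_8));
        Files.write(b, "second file".getBytes(StandardCharsets.UTF_8));

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "42");
        doc.put(FILES_FIELD, List.of(a.toString(), b.toString()));
        fill(doc);
        System.out.println(doc); // {id=42, content=[first file, second file]}

        Files.delete(a);
        Files.delete(b);
    }
}
```

Because the files are read wherever the processor runs, this approach only helps when the Solr server (not the client) can reach the shared drive. That is the situation Alexandre suspects.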

On Sat, Sep 13, 2014 at 2:06 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
 On 13 September 2014 17:03, Erick Erickson erickerick...@gmail.com wrote:
 Which probably just means I don't understand your problem space in 
 sufficient depth

 I suspect this means the clients do not have access to the shared 
 drive with the files, but the Solr server does. A firewall in between 
 or some such.

 If I am right, that would make invoking DataImportHandler a bit 
 complicated as well, due to the change from push to pull.

 Regards,
Alex.

 Personal: http://www.outerthoughts.com/ and @arafalov Solr resources 
 and newsletter: http://www.solr-start.com/ and @solrstart Solr 
 popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: Advice on highlighting

2014-09-14 Thread Ramkumar R. Aiyengar
https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-2878
provides a Lucene API for what you are trying to do; it's not in yet, though.
There's a fork which has the change in:
https://github.com/flaxsearch/lucene-solr-intervals
On 12 Sep 2014 21:24, Craig Longman clong...@iconect.com wrote:

 In order to take our Solr usage to the next step, we really need to
 improve its highlighting abilities. What I'm trying to do is to be able
 to write a new component that can return the fields that matched the
 search (including numeric fields) and the start/end positions for the
 alphanumeric matches.

 I see three different approaches to take; any of them will require making
 some modifications to the Lucene/Solr parts, as it just does not appear
 to be doable as a completely standalone component.

 1) At initial search time.

 This seemed like a good approach. I can follow IndexSearcher creating
 the TermContext that parses through AtomicReaderContexts to see if each
 contains a match, and then adds it to the contexts available for later.
 However, at this point, inside SegmentTermsEnum.seekExact(), it seems
 like Solr is not really looking for matching terms as such; it's just
 scanning what looks like the raw index. So I don't think I can easily
 extract term positions at this point.

 2) Write a modified HighlightComponent. We have managed to get phrases
 to highlight properly, but getting the full field matches seems more
 difficult in this module. Moreover, because it does its highlighting
 oblivious to any other criteria, we can't use it as is. For example,
 this search:

   (body:large+AND+user_id:7)+OR+user_id:346

 will highlight "large" in records that have user_id = 346, when
 technically (for our purposes at least) it should not be considered a
 hit, because "large" was accompanied by the user_id = 7 criterion.
 It's not immediately clear to me how difficult it would be to change
 this.
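The behaviour being asked for can be stated precisely: only terms from clauses that are actually satisfied for a given document should count as hits. The toy evaluator below is plain Java with hypothetical class names; it is not Lucene's Query/Weight API. It walks the boolean tree per document and collects terms only from satisfied branches. For a user_id 346 document, the `body:large` term inside the failed AND branch is therefore not reported.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public final class SatisfiedTerms {

    /** Minimal boolean query model: a term clause, AND, or OR. */
    interface Q {
        /** Returns the matched "field:value" terms, or null if the clause is not satisfied. */
        Set<String> match(Map<String, List<String>> doc);
    }

    static Q term(String field, String value) {
        return doc -> {
            List<String> values = doc.getOrDefault(field, List.of());
            if (!values.contains(value)) return null;
            Set<String> hit = new HashSet<>();
            hit.add(field + ":" + value);
            return hit;
        };
    }

    static Q and(Q... clauses) {
        return doc -> {
            Set<String> all = new HashSet<>();
            for (Q c : clauses) {
                Set<String> m = c.match(doc);
                if (m == null) return null; // one failed clause fails the whole AND branch
                all.addAll(m);
            }
            return all;
        };
    }

    static Q or(Q... clauses) {
        return doc -> {
            Set<String> all = null;
            for (Q c : clauses) {
                Set<String> m = c.match(doc);
                if (m != null) {
                    if (all == null) all = new HashSet<>();
                    all.addAll(m); // keep terms only from satisfied branches
                }
            }
            return all;
        };
    }

    public static void main(String[] args) {
        // (body:large AND user_id:7) OR user_id:346
        Q q = or(and(term("body", "large"), term("user_id", "7")), term("user_id", "346"));
        Map<String, List<String>> doc = Map.of(
                "body", List.of("a", "large", "box"),
                "user_id", List.of("346"));
        System.out.println(q.match(doc)); // [user_id:346]
    }
}
```

A real implementation would have to do this per document inside Lucene's scoring machinery. The sketch only pins down which terms should be eligible for highlighting.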



 3) Make a modified DebugComponent and enhance the existing explain()
 methods (in the query types we require, at least) to include more
 information, such as the start/end positions of the term that was hit.
 I'm exploring this now, but I don't easily see how I can figure out what
 those positions might be from the explain() information. Any pointers
 on how, at the point where TermQuery.explain() is called, I can figure
 out which indexed token was actually hit?
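For reference, the start/end positions in question are the character offsets of each token within the field value. Lucene can store them at index time when term vectors with offsets are enabled for the field. The tokenizer in this sketch is an assumption and not Solr's analysis chain: it splits on non-alphanumeric characters and compares case-insensitively. It only shows what a component would need to report for a hit term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public final class TermOffsets {

    /** [start, end) character offsets of every token equal to term (case-insensitive). */
    static List<int[]> find(String text, String term) {
        List<int[]> hits = new ArrayList<>();
        String wanted = term.toLowerCase(Locale.ROOT);
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) { i++; continue; }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
            // The token is text[start, i); compare it the way a lowercase filter would.
            if (text.substring(start, i).toLowerCase(Locale.ROOT).equals(wanted)) {
                hits.add(new int[] {start, i});
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Prints 2-7 and 27-32; "larger" is a different token and is not reported.
        for (int[] h : find("A large box, larger than a LARGE crate", "large")) {
            System.out.println(h[0] + "-" + h[1]);
        }
    }
}
```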





 Craig Longman
 C++ Developer
 iCONECT Development, LLC
 519-645-1663

Re: New tiny high-performance HTTP/Servlet server for Solr

2014-09-14 Thread Gopal Patwa
Thanks for sharing. Since Solr may move towards a standalone server in the
future, this (Undertow) could be one option.

On Sat, Sep 13, 2014 at 9:36 PM, William Bell billnb...@gmail.com wrote:

 Can we get some stats? Do you have any numbers on performance?

 On Sat, Sep 13, 2014 at 3:03 PM, Jayson Minard jay...@bremeld.com.invalid wrote:

  Instead of running within an application server such as Jetty, Tomcat or
  WildFly ... Solr can also now be run standalone on Undertow, without the
  overhead or complexity of a full application server. Open-sourced on
  https://github.com/bremeld/solr-undertow
 
  solr-undertow
 
  Solr running in a standalone server: high performance, tiny, fast, easy,
  standalone deployment. Requires JDK 1.7 or newer. Less than 4MB download,
  faster than Jetty, Tomcat and all the others. Written in the Kotlin
  language (http://kotlinlang.org/) for the JVM.
 
  Releases are available on GitHub:
  https://github.com/bremeld/solr-undertow/releases
 
  This application launches a Solr WAR file as a standalone server running a
  high-performance HTTP front-end based on undertow.io (the engine behind
  WildFly, the new JBoss). It has no features of an application server; it
  does nothing more than load the Solr servlets and serve the Admin UI. It is
  production-quality for a stand-alone Solr server.
 



 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076



Solr Dynamic Field Performance

2014-09-14 Thread Saumitra Srivastav
I have a collection with 200 fields and 300M docs running in cloud mode.
Each doc has around 20 fields. I now have a use case where I need to
replace these explicit fields with 6 dynamic fields. Each of these 200
fields will match one of the 6 dynamic fields.

I am evaluating the performance implications of switching to dynamicFields.
I have tested with a smaller dataset (5M docs) but didn't notice any indexing
or query performance degradation.

Queries on the dynamic fields will be faceting, range queries or full-text
search.

Are there any known performance issues with using dynamicFields instead of
explicit ones?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Dynamic-Field-Performance-tp4158737.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Dynamic Field Performance

2014-09-14 Thread Erick Erickson
Dynamic fields, once they are actually _in_ a document, aren't any
different than statically defined fields. Literally, there's no place
in the search code that I know of that _ever_ has to check
whether a field was dynamically or statically defined.

AFAIK, the only additional cost would be figuring out which pattern
matched at index time, which is such a tiny portion of the cost of
indexing that I doubt you could measure it.
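The pattern lookup Erick mentions is a cheap string comparison. Dynamic field names may carry a single `*` only at the start or the end of the name. According to the comments in Solr's example schema.xml, longer patterns are matched first, and ties go to the pattern that appears first in the schema. The following is a plain-Java sketch of that lookup. It is not Solr's `IndexSchema` code, and the patterns are made-up examples.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class DynamicFieldMatcher {

    private final List<String> patterns;

    DynamicFieldMatcher(List<String> schemaOrder) {
        // Longer patterns are tried first; List.sort is stable, so ties keep schema order.
        patterns = new ArrayList<>(schemaOrder);
        patterns.sort(Comparator.comparingInt(String::length).reversed());
    }

    /** Returns the first matching pattern, or null if no dynamic field applies. */
    String match(String fieldName) {
        for (String p : patterns) {
            if (p.startsWith("*") && fieldName.endsWith(p.substring(1))) return p;
            if (p.endsWith("*") && fieldName.startsWith(p.substring(0, p.length() - 1))) return p;
        }
        return null;
    }

    public static void main(String[] args) {
        DynamicFieldMatcher m = new DynamicFieldMatcher(List.of("*_s", "*_t", "*_dt", "attr_*"));
        System.out.println(m.match("color_s"));   // *_s
        System.out.println(m.match("attr_s"));    // attr_* (longer pattern wins over *_s)
        System.out.println(m.match("title"));     // null
    }
}
```

The lookup touches only field names, not field values. That fits Erick's estimate that its cost is negligible next to analysis and indexing.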

Best,
Erick



Re: Solr: Tricky exact match, unwanted search results

2014-09-14 Thread FiMka
*Erick*, thank you for the help!
For exact match I still want:
- to use stemming (e.g. for "sleep" I want the word forms "slept", "sleeping",
  "sleeps" also to be used in searching)
- to disregard case
- to disregard prepositions, conjunctions and other function words
- to match only docs having all of the query words, in the given order
  (except function words)
- to match only docs if there are no other words in the doc field besides the
  words in the query
- to use synonyms (e.g. GB == gigabyte, Television == TV)

Erick Erickson wrote
 The easiest way to make your examples work would be to use a copyField to
 an exact match field that uses the KeywordTokenizer

The KeywordTokenizer treats the entire field as a single token, regardless
of its content, so this does not fit my requirements.

Erick Erickson wrote
 You'll have to be a little careful to escape spaces for multi-term bits,
 like exact_field:pussy\ cat. 

Hmm... I don't care about quoting right now at all. But should I?
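On whether escaping matters: for a field that indexes the whole value as one token, an unescaped space makes the query parser split the input into separate clauses, so it does. SolrJ provides `ClientUtils.escapeQueryChars` for this. The stdlib-only sketch below mirrors the set of characters that method is assumed to escape; check it against the SolrJ version in use.

```java
public final class QueryEscaper {

    // Characters assumed to need escaping, modeled on SolrJ's ClientUtils.escapeQueryChars.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\'); // backslash-escape so the parser keeps c inside the term
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("exact_field:" + escape("pussy cat"));    // exact_field:pussy\ cat
        System.out.println("exact_field:" + escape("case-by-case")); // exact_field:case\-by\-case
    }
}
```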
Erick Erickson wrote
 As far as your question about "if" and "in", what you're probably getting
 here is stopword removal, but that's a guess.

I have the following document:

After I disabled solr.StopFilterFactory for the query analyzer (analyzer
type="query"), Solr stopped returning this document for the query:
http://localhost:8983/solr/lexikos/select?q=phraseExact%3A%22on+a+case-by-case%22

Can I somehow implement the desired exact match behavior?
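One way to read the requirements listed earlier is: the normalized token sequence of the field must equal that of the query. The toy below checks exactly that in plain Java. A hard-coded stopword list, synonym map and word-form map stand in for StopFilter, a synonym filter and a stemmer; all three lists are invented for the example. In Solr this would roughly correspond to storing such a normalized form in a separate string field (computed by the client or an update processor) and querying it for equality. That is a suggested approach, not a built-in Solr feature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public final class NormalizedExactMatch {

    // Toy stand-ins for a stopword list, a synonym map and a stemmer/lemmatizer.
    static final Set<String> STOPWORDS = Set.of("a", "an", "the", "on", "in", "if", "and", "or", "of");
    static final Map<String, String> SYNONYMS = Map.of("gigabyte", "gb", "television", "tv");
    static final Map<String, String> WORD_FORMS = Map.of("slept", "sleep", "sleeping", "sleep", "sleeps", "sleep");

    /** Lowercase, split on non-alphanumerics, drop stopwords, map synonyms and word forms. */
    static List<String> normalize(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (tok.isEmpty() || STOPWORDS.contains(tok)) continue;
            String t = SYNONYMS.getOrDefault(tok, tok);
            out.add(WORD_FORMS.getOrDefault(t, t));
        }
        return out;
    }

    /** Same words, same order, nothing extra, after normalization. */
    static boolean exactMatch(String fieldValue, String query) {
        return normalize(fieldValue).equals(normalize(query));
    }

    public static void main(String[] args) {
        System.out.println(exactMatch("Sleeping in the TV room", "sleep tv room"));      // true
        System.out.println(exactMatch("Sleeping in the TV room", "sleep room"));         // false
        System.out.println(exactMatch("on a case-by-case basis", "case by case basis")); // true
    }
}
```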



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Tricky-exact-match-unwanted-search-results-tp4158652p4158745.html

Re: Solr: Tricky exact match, unwanted search results

2014-09-14 Thread FiMka
FiMka wrote
 After I disabled solr.StopFilterFactory for analyzer type=query Solr
 stopped returning this document for the query:
 http://localhost:8983/solr/lexikos/select?q=phraseExact%3A%22on+a+case-by-case%22.

Forgot to say, I have also disabled solr.StopFilterFactory for the index
analyzer (analyzer type="index"), removed all the documents and then re-added them.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Tricky-exact-match-unwanted-search-results-tp4158652p4158748.html


Re: Tricky exact match, unwanted search results

2014-09-14 Thread Jack Krupansky
I keep asking people this eternal question: what training or doc are you 
reading that is using this term "exact match"? Clearly the term is being 
used by a lot of people in a lot of ambiguous ways, when "exact" should 
be... exact.

I think we need to start using the term "exact match" ONLY for string field 
queries that don't use wildcard, fuzzy, or range queries. And maybe 
also for keyword tokenizer text fields that don't have any filters, which might 
as well be string fields.


-- Jack Krupansky




Alternative preview for specific fields

2014-09-14 Thread SolrUser1543
Suppose I have the following fields:

text, author, title

Users perform a query on all of those fields:

...?q=(text:XX OR author:XX OR title:XX)

If this query has a match in the 'text' field, the highlighter will generate a
hit preview based on that field, which is fine.

But suppose the query matched the 'author' field; then the preview will not be
very interesting. In this case I would like to show something else, e.g. the
first 3 lines of the 'text' field.

What would be the best way to achieve this?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Altenative-preview-for-specific-fields-tp4158771.html


Re: Alternative preview for specific fields

2014-09-14 Thread Ahmet Arslan
Hi,

hl.alternateField and hl.maxAlternateFieldLength would be useful.

http://wiki.apache.org/solr/HighlightingParameters

Ahmet
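Here is a sketch of Ahmet's suggestion as request parameters, assuming `text` is a stored field (highlighting works on stored values). `text` has to be in `hl.fl`. When the highlighter produces no snippet for it, for example because only `author` matched, Solr returns the value of `hl.alternateField` instead, truncated to at most `hl.maxAlternateFieldLength` characters. The value 300 and the plain-Java URL building are illustrative choices.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public final class AlternatePreviewParams {

    /** Builds a query string; parameter values are URL-encoded, names are used as-is. */
    static String queryString(Map<String, String> params) {
        StringJoiner sj = new StringJoiner("&");
        params.forEach((k, v) -> sj.add(k + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return sj.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "text:XX OR author:XX OR title:XX");
        p.put("hl", "true");
        p.put("hl.fl", "text");                      // snippets are generated from 'text'
        p.put("hl.alternateField", "text");          // fallback when 'text' yields no snippet
        p.put("hl.maxAlternateFieldLength", "300");  // max characters of the fallback value
        System.out.println("/select?" + queryString(p));
    }
}
```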



Re: Solr Dynamic Field Performance

2014-09-14 Thread Bill Bell
How about perf if you dynamically create 5000 fields?

Bill Bell


Re: Alternative preview for specific fields

2014-09-14 Thread SolrUser1543
Hi, thanks for the answer.

I tried this technique, but did not get the desired result.

Can you please provide an example document to index and a sample query?
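This is a rough model of the expected result, not a call into Solr. A document whose `text` contains the query word gets a highlighted snippet. A document that matched only on `author` falls back to the first characters of `text`. The `<em>` tags match Solr's default highlight markup. The rest is simplified or invented for illustration: the whole field is treated as one "snippet", matching is by substring instead of analyzed tokens, and the sample text is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class PreviewModel {

    /** Highlights every hit of word in text; without a hit, falls back to a truncated text. */
    static String preview(String text, String word, int maxAlternateFieldLength) {
        Matcher m = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE).matcher(text);
        if (m.find()) {
            // Hit in 'text': wrap each occurrence in <em>, Solr's default markup.
            return m.replaceAll("<em>$0</em>");
        }
        // No hit in 'text' (e.g. only 'author' matched): alternate-field style fallback.
        return text.substring(0, Math.min(maxAlternateFieldLength, text.length()));
    }

    public static void main(String[] args) {
        String body = "Solr highlighting explained. Second line. Third line.";
        System.out.println(preview(body, "highlighting", 20)); // Solr <em>highlighting</em> explained. Second line. Third line.
        System.out.println(preview(body, "Smith", 20));         // Solr highlighting ex
    }
}
```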



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Altenative-preview-for-specific-fields-tp4158771p4158807.html