Query multiple collections together

2015-05-11 Thread Zheng Lin Edwin Yeo
Hi,

Would like to check, is there a way to query multiple collections together
in a single query and return the results in one result set?

For example, I have 2 collections and I want to search for records with the
word 'solr' in both of the collections. Is there a query to do that, or
must I query both collections separately, and get two different result sets?

Regards,
Edwin


Re: Query multiple collections together

2015-05-11 Thread Anshum Gupta
You can query multiple collections by specifying the list of collections
e.g.:

http://hostname:port/solr/gettingstarted/select?q=test&collection=collection1,collection2,collection3
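For completeness, a minimal SolrJ sketch of the same request (a hedged
illustration only: the host, port and collection names are placeholders, and
the SolrJ 5.x-style HttpSolrClient constructor is assumed):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class MultiCollectionQuery {
      public static void main(String[] args) throws Exception {
          // Any existing collection can serve as the base URL; the "collection"
          // parameter decides which collections are actually searched.
          HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/gettingstarted");
          SolrQuery q = new SolrQuery("solr");
          q.set("collection", "collection1,collection2,collection3");
          QueryResponse rsp = client.query(q);
          System.out.println("numFound: " + rsp.getResults().getNumFound());
          client.close();
      }
  }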

On Sun, May 10, 2015 at 11:49 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com
wrote:

 Hi,

 Would like to check, is there a way to query multiple collections together
 in a single query and return the results in one result set?

 For example, I have 2 collections and I want to search for records with the
 word 'solr' in both of the collections. Is there a query to do that, or
 must I query both collections separately, and get two different result
 sets?

 Regards,
 Edwin




-- 
Anshum Gupta


Re: Upgraded to 4.10.3, highlighting performance unusably slow

2015-05-11 Thread William Bell
Has anyone looked at it?

On Sun, May 3, 2015 at 10:18 AM, jaime spicciati jaime.spicci...@gmail.com
wrote:

 We ran into this as well on 4.10.3 (not related to an upgrade). It was
 identified during load testing when a small percentage of queries would
 take more than 20 seconds to return. We were able to isolate it by
 rerunning the same query multiple times and regardless of cache hits the
 queries would still take a long time to return. We used this method to
 narrow down the performance problem to a small number of very large records
 (many many fields in a single record).

 We fixed it by turning on hl.requireFieldMatch on the query so that only
 fields that have an actual hit are passed through the highlighter.
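 A hedged SolrJ sketch of that fix (the field name, collection name and host
 below are illustrative, not taken from the original report):

   import org.apache.solr.client.solrj.SolrQuery;
   import org.apache.solr.client.solrj.impl.HttpSolrClient;
   import org.apache.solr.client.solrj.response.QueryResponse;

   public class RequireFieldMatchExample {
       public static void main(String[] args) throws Exception {
           HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
           SolrQuery q = new SolrQuery("body:solr");
           q.setHighlight(true);
           q.set("hl.fl", "*");                    // highlight over all stored fields
           q.set("hl.requireFieldMatch", "true");  // only fields with an actual hit reach the highlighter
           QueryResponse rsp = client.query(q);
           System.out.println(rsp.getHighlighting());
           client.close();
       }
   }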

 Hopefully this helps,
 Jaime Spicciati

 On Sat, May 2, 2015 at 8:20 PM, Joel Bernstein joels...@gmail.com wrote:

  Hi,
 
  Can you also include the details of your research that narrowed the issue
  to the highlighter?
 
  Joel Bernstein
  http://joelsolr.blogspot.com/
 
  On Sat, May 2, 2015 at 5:27 PM, Ryan, Michael F. (LNG-DAY) 
  michael.r...@lexisnexis.com wrote:
 
   Are you able to identify if there is a particular part of the code that
  is
   slow?
  
   A simple way to do this is to use the jstack command (assuming your
  server
   has the full JDK installed). You can run it like this:
   /path/to/java/bin/jstack PID
  
   If you run that a bunch of times while your highlight query is running,
   you might be able to spot the hotspot. Usually I'll do something like
  this
   to see the stacktrace for the thread running the query:
   /path/to/java/bin/jstack PID | grep SearchHandler -B30
  
   A few more questions:
    - What are the response times you are seeing before and after the upgrade?
      Is "unusably slow" 1 second, 10 seconds...?
   - If you run the exact same query multiple times, is it consistently
  slow?
   Or is it only slow on the first run?
   - While the query is running, do you see high user CPU on your server,
 or
   high IO wait, or both? (You can check this with the top command or
 vmstat
   command in Linux.)
  
   -Michael
  
   -Original Message-
   From: Cheng, Sophia Kuen [mailto:sophia_ch...@hms.harvard.edu]
   Sent: Saturday, May 02, 2015 4:13 PM
   To: solr-user@lucene.apache.org
   Subject: Upgraded to 4.10.3, highlighting performance unusably slow
  
   Hello,
  
    We recently upgraded Solr from 3.8.0 to 4.10.3. We saw that this upgrade
    caused an incredible slowdown in our searches. We were able to narrow it
    down to the highlighting. The slowdown is extreme enough that we are
    holding back our release until we can resolve this. Our research indicated
    using TermVectors & FastHighlighter were the way to go, however this
    still does nothing for the performance. I think we may be overlooking a
    crucial configuration, but cannot figure it out. I was hoping for some
    guidance and help. Sorry for the long email, I wanted to provide enough
    information.
  
    Our documents are largely dynamic fields, and so we have been using '*'
    as the field for highlighting. This is the same setting we used in prior
    versions of Solr. The dynamic fields are of type 'text' and we added
    customizations to the schema.xml for the type 'text':
  
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
               storeOffsetsWithPositions="true" termVectors="true"
               termPositions="true" termOffsets="true">
      <analyzer type="index">
        <!-- this charFilter removes all xml-tagging from the text: -->
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- Case insensitive stop word removal.
             add enablePositionIncrements="true" in both the index and query
             analyzers to leave a 'gap' for more accurate phrase queries. -->
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="1" catenateNumbers="1"
                catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"
                protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <!-- this charFilter removes all xml-tagging from the text. Needed
             also in query due to autosuggest -->
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0"
                catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"
                protected="protwords.txt"/>
      </analyzer>
    </fieldType>
  
   

Re: Unable to identify why faceting is taking so much time

2015-05-11 Thread Toke Eskildsen
On Mon, 2015-05-11 at 05:48 +, Abhishek Gupta wrote:
 According to this there are 137 records. Now I am faceting over these 137
 records with facet.method=fc. Ideally it should just iterate over these 137
 records and sum up the facets.

That is only the ideal method if you are not planning on issuing
subsequent calls: facet.method=fc does more work up front to ensure that
later calls are fast.

 http://localhost:9020/search/p1-umShard-1/select?q=*:*&
   fq=(msgType:38+AND+snCreatedTime:[2015-04-15T00:00:00Z%20TO%20*])&
   facet.field=conversationId&facet=true&indent=on&wt=json&rows=0&
   facet.method=fc&debug=timing

 {
   "responseHeader": {
     "status": 0,
     "QTime": 395103
   },

[...]

 According to this, faceting is taking 395036 ms. Why is it taking *395
 seconds* to just calculate facets of 137 records?

6½ minutes is a long time, even for a first call. Do you have tens to
hundreds of millions of documents in your index? Or do you have a
similar number of unique values in your facet?

Either way, subsequent faceting calls should be much faster and a switch
to DocValues should lower your first-call time significantly.
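The DocValues switch Toke mentions is a schema change (adding docValues="true"
to the conversationId field definition and reindexing). For reference, a hedged
SolrJ sketch of the facet request itself; the host and field names echo the
message above, everything else is illustrative:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.FacetField;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class FacetTiming {
      public static void main(String[] args) throws Exception {
          HttpSolrClient client = new HttpSolrClient("http://localhost:9020/search/p1-umShard-1");
          SolrQuery q = new SolrQuery("*:*");
          q.addFilterQuery("msgType:38 AND snCreatedTime:[2015-04-15T00:00:00Z TO *]");
          q.setRows(0);                       // only interested in the facet counts
          q.setFacet(true);
          q.addFacetField("conversationId");
          q.set("facet.method", "fc");
          QueryResponse rsp = client.query(q);
          FacetField ff = rsp.getFacetField("conversationId");
          System.out.println("QTime=" + rsp.getQTime() + " facet values=" + ff.getValueCount());
          client.close();
      }
  }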

Toke Eskildsen, State and University Library, Denmark




Re: Query multiple collections together

2015-05-11 Thread Zheng Lin Edwin Yeo
Thank you for the query.

Just to confirm, for the 'gettingstarted' in the query, does it matter
which collection name I put?

Regards,
Edwin
 On 11 May 2015 15:51, Anshum Gupta ans...@anshumgupta.net wrote:

 You can query multiple collections by specifying the list of collections
 e.g.:

 http://hostname:port/solr/gettingstarted/select?q=test&collection=collection1,collection2,collection3

 On Sun, May 10, 2015 at 11:49 PM, Zheng Lin Edwin Yeo 
 edwinye...@gmail.com
 wrote:

  Hi,
 
  Would like to check, is there a way to query multiple collections
 together
  in a single query and return the results in one result set?
 
  For example, I have 2 collections and I want to search for records with
 the
  word 'solr' in both of the collections. Is there a query to do that, or
  must I query both collections separately, and get two different result
  sets?
 
  Regards,
  Edwin
 



 --
 Anshum Gupta



Re: Re: How to get the docs id after commit

2015-05-11 Thread 李文
You are right. I get last commit time and current commit time in the 
newsearcher listener, then query from last commit time to current commit time 
that I can get the newest committed docs.Thanks.

Best,
WenLi

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: May 11, 2015 9:47
To: solr-user@lucene.apache.org
Subject: Re: Re: How to get the docs id after commit

Not something really built into Solr. It's easy enough, at least
conceptually, to build in a batch_id. The idea here would be that
every doc in each batch would have a unique id (really, something you
changed after each commit). That pretty much requires, though, that
you control the indexing carefully (we're probably talking SolrJ
here). There's no good way that I know to get this info after an
autocommit for instance. I suppose you could use a
TimestampUpdateProcessorFactory and keep high water marks so a query
like q=timestamp:[last_timestamp_I_checked TO most_recent_timestamp]
would do it. Even that, though, has some issues in SolrCloud because
each server's time may be slightly off. You can get around this by
placing the TimestampUpdateProcessorFactory in _front_ of the
distributed update processor in your update chain, but then you'd
really require that all updates be sent to the _same_ machine, or that
the commit intervals were guaranteed to be outside the clock skew on
your machines.

Bottom line is that you'd have to build it yourself; there's no OOB
functionality here. Even "all the docs that last committed" is
ambiguous. What about autocommits? Does "last committed" mean _just_
the ones between the last two autocommits? It seems like you really
want "all the docs committed since the last time I asked". And for that,
you really need to control the mechanism yourself. Not only does Solr
not provide this OOB, I'm not even sure how it could be implemented
in a general case unless Solr became transactional.
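A hedged sketch of the "high water mark" query described above, assuming a
TimestampUpdateProcessorFactory has populated a field named timestamp (the
field, host and collection names are illustrative):

  import java.util.List;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrDocument;

  public class CommittedSinceLastCheck {
      public static void main(String[] args) throws Exception {
          HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
          String lastChecked = "2015-05-10T00:00:00Z";   // high water mark kept by the client
          SolrQuery q = new SolrQuery("timestamp:[" + lastChecked + " TO NOW]");
          q.setFields("id", "timestamp");
          q.addSort("timestamp", SolrQuery.ORDER.asc);
          List<SolrDocument> docs = client.query(q).getResults();
          for (SolrDocument doc : docs) {
              System.out.println(doc.getFieldValue("id"));   // ids committed since the last check
          }
          client.close();
      }
  }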

Best,
Erick

On Sun, May 10, 2015 at 5:38 PM, liwen(李文).apabi l@founder.com.cn wrote:
 Sorry. The newest means all the docs that last committed, I need to get ids 
 of these docs to trigger another server to do something.

 -----Original Message-----
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: May 10, 2015 23:22
 To: solr-user@lucene.apache.org
 Subject: Re: How to get the docs id after commit

 Not really. It's an ambiguous thing though: what's a "newest" document
 when a whole batch is committed at once? And in distributed mode, you
 can fire docs to any node in the cloud and they'll get to the right
 shard, but order is not guaranteed, so "newest" is a fuzzy concept.

 I'd put a counter in my docs that I guaranteed was increasing and just
 q=*:*&rows=1&sort=timestamp desc. That should give you the most recent
 doc. Beware using a timestamp though if you're not absolutely sure
 that the clock times you use are comparable!

 Best,
 Erick

 On Sun, May 10, 2015 at 12:57 AM, liwen(李文).apabi l@founder.com.cn 
 wrote:
 Hi, Solr Developers



  I want to get the newest committed docs in the postCommit event, then
  notify the other server which data can be used, but I cannot find any way
  to get the newest docs after commit, so is there any way to do this?



  Thank you.

  Wen Li







Re: Queries on SynonymFilterFactory

2015-05-11 Thread Alessandro Benedetti
2015-05-11 4:44 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

 I've managed to run the synonyms with 10 different synonym files. Each
 synonym file is about 1MB in size and consists of about 1000 tokens, where
 each token has about 40-50 words. These lists of files are more extreme than
 what I'll probably use in the real environment; they are just for testing
 purposes now.

 The QTime is about 100-200, as compared to about 50 for a collection without
 synonyms configured.

 Is this timing considered fast or slow? Although the synonym files are big,
 there's not that much indexed in my collection yet. I'm just afraid the
 performance will be affected when more documents are indexed.


Whether it's fast or slow depends on your requirements :)
For a human waiting for the response, I would say 100ms is quite fast.
To understand what happens when the index scales up, you should prototype!
Anyway, there are a lot of solutions in Solr to scale up your system!

Cheers


 Regards,
 Edwin
  On 9 May 2015 00:14, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

  Thank you for your suggestions.
 
  I can't do proper testing on that yet as I'm currently using a normal PC
  with 4GB RAM, and all of this probably requires more RAM than what I have.
  I've tried running the setup with 20 synonym files, and the system went
  out of memory before I could test anything.
 
  For your option 2), do you mean that I'll need to download a synonym
  database (like the one with over 20MB in size which I have), and index
 them
  into an Ad Hoc Solr Core to manage them?
 
  I probably can only try them out properly when I can get the server
  machine with more RAM.
 
  Regards,
  Edwin
 
 
  On 8 May 2015 at 22:16, Alessandro Benedetti benedetti.ale...@gmail.com
 
  wrote:
 
  This is quite a big synonym corpus!
  If it's not feasible to have only 1 big synonym file (I haven't checked,
  so I assume the 1 MB limit is true, even if strange)
  I would do an experiment:
  1) test query time with a classic Solr config
  2) use an ad hoc Solr core to manage synonyms (this way we can keep it
  updated and use it with a custom version of the synonym filter that will
  get the synonyms directly from another Solr instance)
  2b) develop a Solr plugin to provide this approach

  If the synonym thesaurus is really big, I guess managing it through
  another Solr core (or something similar) locally will be better than
  managing it with an external web service.
 
  Cheers
 
  2015-05-08 12:16 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
 
   So it means like having more than 10 or 20 synonym files locally will
  still
   be faster than accessing external service?
  
   As I found out that zookeeper only allows the synonym.txt file to be a
   maximum of 1MB, and as my potential synonym file is more than 20MB,
 I'll
   need to split the file to more than 20 of them.
  
   Regards,
   Edwin
  
 
 
 
  --
  --
 
  Benedetti Alessandro
  Visiting card : http://about.me/alessandro_benedetti
 
  Tyger, tyger burning bright
  In the forests of the night,
  What immortal hand or eye
  Could frame thy fearful symmetry?
 
  William Blake - Songs of Experience -1794 England
 
 
 




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Query multiple collections together

2015-05-11 Thread Anshum Gupta
FWIR, you just need to make sure that it's a valid collection. It doesn't
have to be one from the list of collections that you want to query, but the
collection name you use in the URL should exist.
e.g., assuming you have 2 collections, foo (10 docs) and bar (5 docs):

/solr/foo/select?q=*:*&collection=bar    #results: 5

/solr/xyz/select?q=*:*&collection=bar    will lead to an HTTP 404 response

/solr/foo/select?q=*:*                   #results: 10


On Mon, May 11, 2015 at 12:59 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com
wrote:

 Thank you for the query.

 Just to confirm, for the 'gettingstarted' in the query, does it matter
 which collection name I put?

 Regards,
 Edwin
  On 11 May 2015 15:51, Anshum Gupta ans...@anshumgupta.net wrote:

  You can query multiple collections by specifying the list of collections
  e.g.:
 
  http://hostname:port
 
 
 /solr/gettingstarted/select?q=testcollection=collection1,collection2,collection3
 
  On Sun, May 10, 2015 at 11:49 PM, Zheng Lin Edwin Yeo 
  edwinye...@gmail.com
  wrote:
 
   Hi,
  
   Would like to check, is there a way to query multiple collections
  together
   in a single query and return the results in one result set?
  
   For example, I have 2 collections and I want to search for records with
  the
   word 'solr' in both of the collections. Is there a query to do that, or
   must I query both collections separately, and get two different result
   sets?
  
   Regards,
   Edwin
  
 
 
 
  --
  Anshum Gupta
 




-- 
Anshum Gupta


Re: Query multiple collections together

2015-05-11 Thread Zheng Lin Edwin Yeo
Ok, thank you so much.

Regards,
Edwin
On 11 May 2015 16:15, Anshum Gupta ans...@anshumgupta.net wrote:

 FWIR, you just need to make sure that it's a valid collection. It doesn't
 have to be one from the list of collections that you want to query, but the
 collection name you use in the URL should exist.
 e.g, assuming you have 2 collections foo (10 docs) and bar (5 docs):

  /solr/foo/select?q=*:*&collection=bar    #results: 5

  /solr/xyz/select?q=*:*&collection=bar    will lead to an HTTP 404 response

  /solr/foo/select?q=*:*                   #results: 10


 On Mon, May 11, 2015 at 12:59 AM, Zheng Lin Edwin Yeo 
 edwinye...@gmail.com
 wrote:

  Thank you for the query.
 
  Just to confirm, for the 'gettingstarted' in the query, does it matter
  which collection name I put?
 
  Regards,
  Edwin
   On 11 May 2015 15:51, Anshum Gupta ans...@anshumgupta.net wrote:
 
   You can query multiple collections by specifying the list of
 collections
   e.g.:
  
   http://hostname:port
  
  
 
 /solr/gettingstarted/select?q=testcollection=collection1,collection2,collection3
  
   On Sun, May 10, 2015 at 11:49 PM, Zheng Lin Edwin Yeo 
   edwinye...@gmail.com
   wrote:
  
Hi,
   
Would like to check, is there a way to query multiple collections
   together
in a single query and return the results in one result set?
   
For example, I have 2 collections and I want to search for records
 with
   the
word 'solr' in both of the collections. Is there a query to do that,
 or
must I query both collections separately, and get two different
 result
sets?
   
Regards,
Edwin
   
  
  
  
   --
   Anshum Gupta
  
 



 --
 Anshum Gupta



Re: Solr custom component issue

2015-05-11 Thread nutchsolruser
Thanks Upayavira,

I tried it by changing it to first-component in solrconfig.xml but no luck.

Am I missing something here? Here I want to add my own qf fields with boosts
to the query.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-component-issue-tp4204799p4204810.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: indexing java byte code in classes / jars

2015-05-11 Thread Tomasz Borek
There's also Perl-backed ACK. http://beyondgrep.com/

Which does the job of searching code really well.

And I think at least once I came across something that stemmed from ACK and
claimed it was faster/better... googling... aah! The Silver Searcher it
was. :-)
http://betterthanack.com/

pozdrawiam,
LAFK

2015-05-09 12:40 GMT+02:00 Mark javam...@gmail.com:

 Hi  Alexandre,

 Solr & ASM is the exact problem I'm looking to hack about with, so I'm keen
 to consider any code, no matter how ugly or broken.

 Regards

 Mark

 On 9 May 2015 at 10:21, Alexandre Rafalovitch arafa...@gmail.com wrote:

  If you only have classes/jars, use ASM. I have done this before, have
 some
  ugly code to share if you want.
 
  If you have sources, javadoc 8 is a good way too. I am doing that now for
  solr-start.com, code on Github.
 
  Regards,
  Alex
  On 9 May 2015 7:09 am, Mark javam...@gmail.com wrote:
 
   To answer why bytecode - because mostly the use case I have is looking
 to
   index as much detail from jars/classes.
  
   extract class names,
   method names
   signatures
   packages / imports
  
   I am considering using ASM in order to generate an analysis view of the
   class
  
   The sort of usecases I have would be method / signature searches.
  
   For example;
  
   1) show any classes with a method named parse*
  
   2) show any classes with a method named parse that passes in a type
  *json*
  
   ...etc
  
   In the past I have written something to reverse out javadocs from just
   java bytecode; using Solr would make this idea considerably more
   powerful.
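   As a rough illustration of the ASM route mentioned above, a minimal sketch
   that reads one compiled class and collects the class and method names that
   could be fed to Solr (assumes the ASM 5.x library on the classpath; how the
   names are then indexed is left out):

     import java.io.FileInputStream;
     import java.io.InputStream;
     import org.objectweb.asm.ClassReader;
     import org.objectweb.asm.ClassVisitor;
     import org.objectweb.asm.MethodVisitor;
     import org.objectweb.asm.Opcodes;

     public class ClassIndexer {
         public static void main(String[] args) throws Exception {
             try (InputStream in = new FileInputStream(args[0])) {   // path to a .class file
                 new ClassReader(in).accept(new ClassVisitor(Opcodes.ASM5) {
                     @Override
                     public void visit(int version, int access, String name, String signature,
                                       String superName, String[] interfaces) {
                         // class name arrives in internal form, e.g. org/apache/solr/SolrLogFormatter
                         System.out.println("class: " + name.replace('/', '.'));
                     }
                     @Override
                     public MethodVisitor visitMethod(int access, String name, String desc,
                                                      String signature, String[] exceptions) {
                         // method name plus its JVM descriptor, e.g. parse (Ljava/lang/String;)V
                         System.out.println("method: " + name + " " + desc);
                         return null;
                     }
                 }, 0);
             }
         }
     }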
  
   Thanks for the suggestions so far
  
  
  
  
  
  
  
   On 8 May 2015 at 21:19, Erik Hatcher erik.hatc...@gmail.com wrote:
  
Oh, and sorry, I omitted a couple of details:
   
# creating the “java” core/collection
bin/solr create -c java
   
# I ran this from my Solr source code checkout, so that
SolrLogFormatter.class just happened to be handy
   
Erik
   
   
   
   
 On May 8, 2015, at 4:11 PM, Erik Hatcher erik.hatc...@gmail.com
   wrote:

 What kinds of searches do you want to run?  Are you trying to
 extract
class names, method names, and such and make those searchable?   If
   that’s
the case, you need some kind of “parser” to reverse engineer that
information from .class and .jar files before feeding it to Solr,
 which
would happen before analysis.   Java itself comes with a javap
 command
   that
can do this; whether this is the “best” way to go for your scenario I
   don’t
know, but here’s an interesting example pasted below (using Solr
 5.x).

 —
 Erik Hatcher, Senior Solutions Architect
 http://www.lucidworks.com


 javap build/solr-core/classes/java/org/apache/solr/SolrLogFormatter.class > test.txt
 bin/post -c java test.txt

 now search for coreInfoMap
http://localhost:8983/solr/java/browse?q=coreInfoMap

 I tried to be cleverer and use the stdin option of bin/post, like
  this:
 javap
build/solr-core/classes/java/org/apache/solr/SolrLogFormatter.class |
bin/post -c java -url http://localhost:8983/solr/java/update/extract
-type text/plain -params literal.id=SolrLogFormatter -out yes -d
 but something isn’t working right with the stdin detection like
 that
   (it
does work to `cat test.txt | bin/post…` though, hmmm)

 test.txt looks like this, `cat test.txt`:
 Compiled from "SolrLogFormatter.java"
 public class org.apache.solr.SolrLogFormatter extends java.util.logging.Formatter {
   long startTime;
   long lastTime;
   java.util.Map<org.apache.solr.SolrLogFormatter$Method, java.lang.String> methodAlias;
   public boolean shorterFormat;
   java.util.Map<org.apache.solr.core.SolrCore, org.apache.solr.SolrLogFormatter$CoreInfo> coreInfoMap;
   public java.util.Map<java.lang.String, java.lang.String> classAliases;
   static java.lang.ThreadLocal<java.lang.String> threadLocal;
   public org.apache.solr.SolrLogFormatter();
   public void setShorterFormat();
   public java.lang.String format(java.util.logging.LogRecord);
   public void appendThread(java.lang.StringBuilder, java.util.logging.LogRecord);
   public java.lang.String _format(java.util.logging.LogRecord);
   public java.lang.String getHead(java.util.logging.Handler);
   public java.lang.String getTail(java.util.logging.Handler);
   public java.lang.String formatMessage(java.util.logging.LogRecord);
   public static void main(java.lang.String[]) throws java.lang.Exception;
   public static void go() throws java.lang.Exception;
   static {};
 }

 On May 8, 2015, at 3:31 PM, Mark javam...@gmail.com wrote:

 I looking to use Solr search over the byte code in Classes and
 Jars.

 Does anyone know or have experience of Analyzers, Tokenizers, and
   Token
 Filters for such a task?

 Regards

 Mark
   

Re: Slow highlighting on Solr 5.0.0

2015-05-11 Thread Ere Maijala
Thanks for the pointers. Using hl.usePhraseHighlighter=false does indeed
make it a lot faster. Obviously it's not really a solution, though,
since in 4.10 it wasn't a problem and turning it off has consequences.
I'm looking forward to the improvements in the next releases.


--Ere

8.5.2015, 19.06, Matt Hilt kirjoitti:

I've been looking into this again. The phrase highlighter is much slower
than the default highlighter, so you might be able to add
hl.usePhraseHighlighter=false to your query to make it faster. Note that
the web interface will NOT help here, because that param is true by default,
and the checkbox is basically broken in that respect. Also, the default
highlighter doesn't seem to work in all cases the phrase highlighter does,
though.
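For reference, a sketch of that workaround as a SolrJ parameter (the parameter
defaults to true, so it has to be disabled explicitly; host and collection are
placeholders):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;

  public class DisablePhraseHighlighter {
      public static void main(String[] args) throws Exception {
          HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
          SolrQuery q = new SolrQuery("\"apache solr\"");
          q.setHighlight(true);
          // true by default; disabling it trades phrase-accurate snippets for speed
          q.set("hl.usePhraseHighlighter", "false");
          System.out.println(client.query(q).getHighlighting());
          client.close();
      }
  }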

Also, the current development branch of 5x is much better than 5.1, but
not as good as 4.10. This ticket seems to be hitting on some of the issues
at hand:
https://issues.apache.org/jira/browse/SOLR-5855


I think this means they are getting there, but the performance is really
still much worse than 4.10, and it's not obvious why.


On 5/5/15, 2:06 AM, Ere Maijala ere.maij...@helsinki.fi wrote:


I'm seeing the same with Solr 5.1.0 after upgrading from 4.10.2. Here
are my timings:

4.10.2:
process: 1432.0
highlight: 723.0

5.1.0:
process: 9570.0
highlight: 8790.0

schema.xml and solrconfig.xml are available at
https://github.com/NatLibFi/NDL-VuFind-Solr/tree/master/vufind/biblio/conf
.

A couple of jstack outputs taken when the query was executing are
available at http://pastebin.com/eJrEy2Wb

Any suggestions would be appreciated. Or would it make sense to just
file a JIRA issue?

--Ere

3.3.2015, 0.48, Matt Hilt kirjoitti:

Short form:
While testing Solr 5.0.0 within our staging environment, I noticed that
highlight enabled queries are much slower than I saw with 4.10. Are
there any obvious reasons why this might be the case? As far as I can
tell, nothing has changed with the default highlight search component or
its parameters.


A little more detail:
The bulk of the collection config set was stolen from the basic 4.X
example config set. I changed my schema.xml and solrconfig.xml just
enough to get 5.0 to create a new collection (removed non-trie fields,
some other deprecated response handler definitions, etc). I can provide
my version of the solr.HighlightComponent config, but it is identical to
the sample_techproducts_configs example in 5.0.  Are there any other
config files I could provide that might be useful?


Numbers on "much slower":
I indexed a very small subset of my data into the new collection and
used the /select interface to do a simple debug query. Solr 4.10 gives
the following pertinent info:
"response": { "numFound": 72628,
...
"debug": {
  "timing": { "time": 95, "process": { "time": 94, "query": { "time": 6 },
  "highlight": { "time": 84 }, "debug": { "time": 4 } }
---
Whereas Solr 5.0 is:
"response": { "numFound": 1093,
...
"debug": {
  "timing": { "time": 6551, "process": { "time": 6549, "query": { "time": 0 },
  "highlight": { "time": 6524 }, "debug": { "time": 25 }






--
Ere Maijala
Kansalliskirjasto / The National Library of Finland



--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


Re: Solr custom component issue

2015-05-11 Thread Upayavira


On Mon, May 11, 2015, at 10:30 AM, nutchsolruser wrote:
 I can not set qf in solrconfig.xml file because my qf and boost values
 will
 be changing frequently . I am reading those values from external source. 
 
 Can we not set qf value from searchComponent? Or is there any other way
 to
 do this?

Changing frequently, but you can't pass them in via the search request?
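If the values can be passed in with each request, a hedged sketch of what that
looks like from a SolrJ client (the field names and boosts stand in for
whatever the external source provides):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;

  public class PerRequestQf {
      public static void main(String[] args) throws Exception {
          HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
          // qf and boosts fetched from the external source at request time
          String qf = "journal^2.0 title^1.5 description";
          SolrQuery q = new SolrQuery("solr");
          q.set("defType", "edismax");   // qf only has an effect with (e)dismax
          q.set("qf", qf);
          System.out.println(client.query(q).getResults().getNumFound());
          client.close();
      }
  }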



Re: Solr custom component issue

2015-05-11 Thread Upayavira
You are adding a search component, and adding it as a
last-component, meaning, it will come after the Query component which
actually does the work.

Given the parameters you have set, you will be using the default Lucene
query parser which doesn't honour the qf parameter, so it isn't
surprising that the QueryComponent is ignoring qf.

What is it that you are trying to do?

Upayavira

On Mon, May 11, 2015, at 09:33 AM, nutchsolruser wrote:
 Hi ,
 
 I am trying to add my own query parameters in Solr query using solr
 component . In below example I am trying to add qf parameter in the
 query.
 Below is my prepare method of component. But Solr is not considering qf
 parameter while searching It is using df parameter that I have added in
 schema.xml file as default search field.
 
 
 
  @Override
public void prepare(ResponseBuilder rb) throws IOException {
  LOG.info(called Prepare );
  SolrQueryRequest req = rb.req;
  SolrQueryResponse rsp = rb.rsp;
  SolrParams params = req.getParams();
  ModifiableSolrParams modifiableSolrParams=new
 ModifiableSolrParams(params);
  modifiableSolrParams.set(qf, journal);
  rb.req.setParams(modifiableSolrParams);
  QParser parser;
  try {
 parser = QParser.getParser(rb.getQueryString(),
 edismax, req);
 rb.setQparser(parser) ;
  } catch (SyntaxError e) {
 e.printStackTrace();
  }
  LOG.info(Solr Request +rb.req.toString());
}
 
 relevanct request handler in solrconfig.xml file : 
   requestHandler name=/custom-api class=solr.SearchHandler 
  lst name=defaults
 str name=q.alt*:*/str
str name=dfdescription/str
 /lst
   arr name=last-components
   strcustom-component/str
 /arr
 /requestHandler
 
 
 How I can add qf param correctly in query so that solr can use this while
 searching ?
 
 
 
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-custom-component-issue-tp4204799.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr custom component issue

2015-05-11 Thread Upayavira
If all you want to do is to hardwire a qf, you can do that in your
requestHandler config in solrconfig.xml.

If you want to extend how the edismax query parser works, you may well
be better off subclassing the edismax query parser, and passing in
modified request parameters, but I'd explore getting your problem solved
without coding first.

Can you not set qf= in the request handler configuration? Make sure you
set defType=edismax if you want qf to have any effect at all.
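If the values cannot travel with the request, one option along the lines
sketched above is a thin subclass of the edismax parser plugin that injects qf
before parsing. A hedged sketch only; the external lookup is a stub, and the
plugin would still need to be registered as a queryParser in solrconfig.xml and
selected with defType:

  import org.apache.solr.common.params.ModifiableSolrParams;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.ExtendedDismaxQParserPlugin;
  import org.apache.solr.search.QParser;

  public class ExternalQfQParserPlugin extends ExtendedDismaxQParserPlugin {
      @Override
      public QParser createParser(String qstr, SolrParams localParams,
                                  SolrParams params, SolrQueryRequest req) {
          ModifiableSolrParams modified = new ModifiableSolrParams(params);
          modified.set("qf", lookupQfFromExternalSource());   // stub for the external lookup
          return super.createParser(qstr, localParams, modified, req);
      }

      private String lookupQfFromExternalSource() {
          // placeholder: read field/boost pairs from wherever they are maintained
          return "journal^2.0 title^1.5";
      }
  }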

Upayavira

On Mon, May 11, 2015, at 10:09 AM, nutchsolruser wrote:
 Thanks Upayavira,
 
 I tried it by changing it to first-component in solrconfig.xml but no
 luck . 
 
 Am I missing something here ? Here I want to add my own qf fields with
 boost
 in query. 
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-custom-component-issue-tp4204799p4204810.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Solr custom component issue

2015-05-11 Thread nutchsolruser
Hi,

I am trying to add my own query parameters to a Solr query using a Solr
search component. In the example below I am trying to add the qf parameter
to the query. Below is the prepare method of my component. But Solr is not
considering the qf parameter while searching; it is using the df parameter
that I have added in the schema.xml file as the default search field.



  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    LOG.info("called Prepare");
    SolrQueryRequest req = rb.req;
    SolrQueryResponse rsp = rb.rsp;
    SolrParams params = req.getParams();
    ModifiableSolrParams modifiableSolrParams = new ModifiableSolrParams(params);
    modifiableSolrParams.set("qf", "journal");
    rb.req.setParams(modifiableSolrParams);
    QParser parser;
    try {
      parser = QParser.getParser(rb.getQueryString(), "edismax", req);
      rb.setQparser(parser);
    } catch (SyntaxError e) {
      e.printStackTrace();
    }
    LOG.info("Solr Request " + rb.req.toString());
  }

Relevant request handler in solrconfig.xml file:

  <requestHandler name="/custom-api" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="q.alt">*:*</str>
      <str name="df">description</str>
    </lst>
    <arr name="last-components">
      <str>custom-component</str>
    </arr>
  </requestHandler>


How can I add the qf param correctly to the query so that Solr can use it while
searching?






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-component-issue-tp4204799.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr custom component issue

2015-05-11 Thread nutchsolruser
I cannot set qf in the solrconfig.xml file because my qf and boost values will
be changing frequently. I am reading those values from an external source.

Can we not set qf value from searchComponent? Or is there any other way to
do this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-component-issue-tp4204799p4204815.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLR 4.10.4 - error creating document

2015-05-11 Thread Bernd Fehling
I'm getting the following error with 4.10.4

WARN  org.apache.solr.handler.dataimport.SolrWriter - Error creating document :
SolrInputDocument(fields: [dcautoclasscode=310, dclang=unknown,

..., dcdocid=dd05ad427a58b49150a4ca36148187028562257a77643062382a1366250112ac])
org.apache.solr.common.SolrException: Exception writing document
id ftumdeepblue:oai:deepblue.lib.umich.edu:2027.42/79437 to the index; possible 
analysis error.
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
...
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.IllegalArgumentException: Document contains at least one 
immense term
in field=f_dcperson (whose UTF8 encoding is longer than the max length 
32766), all of which were skipped.
Please correct the analyzer to not produce such terms.  The prefix of the first 
immense
term is: '[102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 
101, 101, 32, 66, 114,
111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48]...', original message:
bytes can be at most 32766 in length; got 38177
at 
org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
...


My huge field is dcdescription, with the following schema:

   <field name="dccreator" type="string" indexed="true" stored="true" multiValued="true" />
   <field name="dcdescription" type="string" indexed="false" stored="true" />
   <field name="f_dcperson" type="string" indexed="true" stored="true" multiValued="true" />
...
  <copyField source="dccreator" dest="f_dcperson" />
  <copyField source="dccontributor" dest="f_dcperson" />


I guess I have to make dcdescription also multiValued="true"?

But why is it complaining about f_dcperson, which is already multivalued?

Second guess: dcdescription is not multivalued, but filled to the max (32766).
Then it is UTF8 encoded and goes beyond 32766, which is larger than a single
subfield of a multivalued field, and therefore the error?

Any real explanation of this and how to prevent it?

Regards
Bernd


Re: SolrJ vs. plain old HTTP post

2015-05-11 Thread Emir Arnautovic

Hi Steve,
The main advantage is that it uses a binary format, so XML/JSON overhead is
avoided.


You should also check whether Solr's Data Import Handler is a good fit for you.

Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr  Elasticsearch Support * http://sematext.com/


On 11.05.2015 14:21, Steven White wrote:

Hi Everyone,

If all that I need to do is send data to Solr to add / delete a Solr
document, which tool is better for the job: SolrJ or plain old HTTP post?

In other word, what are the advantages of using SolrJ when the need is to
push data to Solr for indexing?

Thanks,

Steve





SolrJ vs. plain old HTTP post

2015-05-11 Thread Steven White
Hi Everyone,

If all that I need to do is send data to Solr to add / delete a Solr
document, which tool is better for the job: SolrJ or plain old HTTP post?

In other word, what are the advantages of using SolrJ when the need is to
push data to Solr for indexing?

Thanks,

Steve


Re: SOLR 4.10.4 - error creating document

2015-05-11 Thread Emir Arnautovic

Hi Bernd,
The issue is with f_dcperson and what ends up in that field. It is
configured to be "string", which means it is not tokenized, so if some huge
value is in either dccreator or dccontributor it will end up as a single
term. The names suggest that it should not contain such values, but double
check in your import code whether you are reading the wrong column or
concatenating contributors or something else causing the value to be too big.
Also check if you have some copyField that should not be there.


Thanks,
Emir
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr  Elasticsearch Support * http://sematext.com/


On 11.05.2015 14:13, Bernd Fehling wrote:

I'm getting the following error with 4.10.4

WARN  org.apache.solr.handler.dataimport.SolrWriter  – Error creating document :
SolrInputDocument(fields: [dcautoclasscode=310, dclang=unknown,

..., dcdocid=dd05ad427a58b49150a4ca36148187028562257a77643062382a1366250112ac])
org.apache.solr.common.SolrException: Exception writing document
id ftumdeepblue:oai:deepblue.lib.umich.edu:2027.42/79437 to the index; possible 
analysis error.
 at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
 at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
 at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
...
 at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
 at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.IllegalArgumentException: Document contains at least one 
immense term
in field=f_dcperson (whose UTF8 encoding is longer than the max length 
32766), all of which were skipped.
Please correct the analyzer to not produce such terms.  The prefix of the first 
immense
term is: '[102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 
101, 101, 32, 66, 114,
111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48]...', original message:
bytes can be at most 32766 in length; got 38177
 at 
org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
...


My huge field is dcdescription, with the following schema:

field name=dccreator type=string indexed=true stored=true 
multiValued=true /
field name=dcdescription type=string indexed=false stored=true /
field name=f_dcperson type=string indexed=true stored=true 
multiValued=true /
...
   copyField source=dccreator dest=f_dcperson /
   copyField source=dccontributor dest=f_dcperson /


I guess I have to make dcdescription also multivalue=true?

But why is it complaining about f_dcperson which is already multivalue?

Second guess, dcdescription is not multivalue, but filled to max (32766).
Then it is UTF8 encoded and going beyond 32766 which is larger than a single 
subfield
of a multivaled field and therefore the error?

Any really explanation on this and how to prevent it?

Regards
Bernd




Solr query which return only those docs whose all tokens are from given list

2015-05-11 Thread Naresh Yadav
Hi all,

Also asked this here : http://stackoverflow.com/questions/30166116

For example i have SOLR docs in which tags field is indexed :

Doc1 - tags:T1 T2

Doc2 - tags:T1 T3

Doc3 - tags:T1 T4

Doc4 - tags:T1 T2 T3

Query1: get all docs with tags:T1 AND tags:T3 -- this works and will
give Doc2 and Doc4

Query2: get all docs whose tags must all be from [T1, T2, T3]. Expected
result: Doc1, Doc2, Doc4

How to model Query2 in Solr ?? Please help me on this ?


Re: Solr custom component issue

2015-05-11 Thread nutchsolruser
These boosting parameters will be configured outside Solr, and there is a
separate module from which these values get populated. I am reading those
values from an external datasource and I want to attach them to each request.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-component-issue-tp4204799p4204832.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ vs. plain old HTTP post

2015-05-11 Thread Erik Hatcher
Another advantage to SolrJ is with SolrCloud (ZK) awareness, and taking 
advantage of some routing optimizations client-side so the cluster has less 
hops to make.
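To make the comparison concrete, a minimal SolrJ add/delete sketch (5.x-style
API; the ZooKeeper address and collection name are placeholders). The same
operations over plain HTTP would mean hand-building the XML/JSON bodies and
choosing a node yourself:

  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class IndexWithSolrJ {
      public static void main(String[] args) throws Exception {
          // ZooKeeper-aware client: routes updates to the right shard leader itself
          CloudSolrClient client = new CloudSolrClient("localhost:9983");
          client.setDefaultCollection("mycollection");

          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "doc-1");
          doc.addField("title", "hello solrj");
          client.add(doc);

          client.deleteById("doc-2");   // delete by unique key
          client.commit();
          client.close();
      }
  }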

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com http://www.lucidworks.com/




 On May 11, 2015, at 8:21 AM, Steven White swhite4...@gmail.com wrote:
 
 Hi Everyone,
 
 If all that I need to do is send data to Solr to add / delete a Solr
 document, which tool is better for the job: SolrJ or plain old HTTP post?
 
 In other word, what are the advantages of using SolrJ when the need is to
 push data to Solr for indexing?
 
 Thanks,
 
 Steve



storeOffsetsWithPositions does not reflect in the index

2015-05-11 Thread Dmitry Kan
Hi,

Using solr 4.10.2. Looks like storeOffsetsWithPositions has no effect, i.e.
it does not store offsets in addition to positions.

If we use termVectors=true termPositions=true termOffsets=true, then
offsets and positions are available fine.

Any ideas how to make storeOffsetsWithPositions work?

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: SOLR 4.10.4 - error creating document

2015-05-11 Thread Bernd Fehling
After reading https://issues.apache.org/jira/browse/LUCENE-5472
one question still remains.

Why is it complaining about f_dcperson, which is a copyField, when the
original problem field is dcdescription, which is definitely much larger
than 32766?

I would assume it complains about the dcdescription field. Or not?

Bernd


Am 11.05.2015 um 14:13 schrieb Bernd Fehling:
 I'm getting the following error with 4.10.4
 
 WARN  org.apache.solr.handler.dataimport.SolrWriter  – Error creating 
 document :
 SolrInputDocument(fields: [dcautoclasscode=310, dclang=unknown,
 
 ..., 
 dcdocid=dd05ad427a58b49150a4ca36148187028562257a77643062382a1366250112ac])
 org.apache.solr.common.SolrException: Exception writing document
 id ftumdeepblue:oai:deepblue.lib.umich.edu:2027.42/79437 to the index; 
 possible analysis error.
 at 
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
 at 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
 at 
 org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
 ...
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
 Caused by: java.lang.IllegalArgumentException: Document contains at least one 
 immense term
 in field=f_dcperson (whose UTF8 encoding is longer than the max length 
 32766), all of which were skipped.
 Please correct the analyzer to not produce such terms.  The prefix of the 
 first immense
 term is: '[102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 
 101, 101, 32, 66, 114,
 111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48]...', original message:
 bytes can be at most 32766 in length; got 38177
 at 
 org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
 ...
 
 
 My huge field is dcdescription, with the following schema:
 
field name=dccreator type=string indexed=true stored=true 
 multiValued=true /
field name=dcdescription type=string indexed=false stored=true /
field name=f_dcperson type=string indexed=true stored=true 
 multiValued=true /
 ...
   copyField source=dccreator dest=f_dcperson /
   copyField source=dccontributor dest=f_dcperson /
 
 
 I guess I have to make dcdescription also multivalue=true?
 
 But why is it complaining about f_dcperson which is already multivalue?
 
 Second guess, dcdescription is not multivalue, but filled to max (32766).
 Then it is UTF8 encoded and going beyond 32766 which is larger than a single 
 subfield
 of a multivaled field and therefore the error?
 
 Any really explanation on this and how to prevent it?
 
 Regards
 Bernd
 


Re: SOLR 4.10.4 - error creating document

2015-05-11 Thread Bernd Fehling
Hi Emir,

the dcdescription field is definitely too big.
But why is it complaining about f_dcperson and not dcdescription?

Regards
Bernd


Am 11.05.2015 um 15:12 schrieb Emir Arnautovic:
 Hi Bernd,
 Issue is with f_dcperson and what ends up in that field. It is configured to 
 be string, which means it is not tokenized so if some huge value is
 in either dccreator or dccontributor it will end up as single term. Nemes 
 suggest that it should not contain such values, but double check in
 your import code if you are reading wrong column or concatenating 
 contributors or something else causing value to be to big. Also check if you
 have some copyField that should not be there.
 
 Thanks,
 Emir
 -- 
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr  Elasticsearch Support * http://sematext.com/
 
 
 On 11.05.2015 14:13, Bernd Fehling wrote:
 I'm getting the following error with 4.10.4

 WARN  org.apache.solr.handler.dataimport.SolrWriter  – Error creating 
 document :
 SolrInputDocument(fields: [dcautoclasscode=310, dclang=unknown,
 
 ..., 
 dcdocid=dd05ad427a58b49150a4ca36148187028562257a77643062382a1366250112ac])
 org.apache.solr.common.SolrException: Exception writing document
 id ftumdeepblue:oai:deepblue.lib.umich.edu:2027.42/79437 to the index; 
 possible analysis error.
  at 
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
  at 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
  at 
 org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
 ...
  at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
  at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
 Caused by: java.lang.IllegalArgumentException: Document contains at least 
 one immense term
 in field=f_dcperson (whose UTF8 encoding is longer than the max length 
 32766), all of which were skipped.
 Please correct the analyzer to not produce such terms.  The prefix of the 
 first immense
 term is: '[102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 
 101, 101, 32, 66, 114,
 111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48]...', original message:
 bytes can be at most 32766 in length; got 38177
  at 
 org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
 ...


 My huge field is dcdescription, with the following schema:

 field name=dccreator type=string indexed=true stored=true 
 multiValued=true /
 field name=dcdescription type=string indexed=false stored=true 
 /
 field name=f_dcperson type=string indexed=true stored=true 
 multiValued=true /
 ...
copyField source=dccreator dest=f_dcperson /
copyField source=dccontributor dest=f_dcperson /


 I guess I have to make dcdescription also multivalue=true?

 But why is it complaining about f_dcperson which is already multivalue?

 Second guess, dcdescription is not multivalue, but filled to max (32766).
 Then it is UTF8 encoded and going beyond 32766 which is larger than a single 
 subfield
 of a multivaled field and therefore the error?

 Any really explanation on this and how to prevent it?

 Regards
 Bernd
 

-- 
*
Bernd FehlingBielefeld University Library
Dipl.-Inform. (FH)LibTec - Library Technology
Universitätsstr. 25  and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060   bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: SOLR 4.10.4 - error creating document

2015-05-11 Thread Emir Arnautovic

Hi Bernd,
dcdescription field is not indexed.

Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr  Elasticsearch Support * http://sematext.com/


On 11.05.2015 15:22, Bernd Fehling wrote:

Hi Emir,

the dcdescription field is definately to big.
But why is it complaining about f_dcperson and not dcdescription?

Regards
Bernd


Am 11.05.2015 um 15:12 schrieb Emir Arnautovic:

Hi Bernd,
Issue is with f_dcperson and what ends up in that field. It is configured to be 
string, which means it is not tokenized so if some huge value is
in either dccreator or dccontributor it will end up as single term. Nemes 
suggest that it should not contain such values, but double check in
your import code if you are reading wrong column or concatenating contributors 
or something else causing value to be to big. Also check if you
have some copyField that should not be there.

Thanks,
Emir
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr  Elasticsearch Support * http://sematext.com/


On 11.05.2015 14:13, Bernd Fehling wrote:

I'm getting the following error with 4.10.4

WARN  org.apache.solr.handler.dataimport.SolrWriter  – Error creating document :
SolrInputDocument(fields: [dcautoclasscode=310, dclang=unknown,

..., dcdocid=dd05ad427a58b49150a4ca36148187028562257a77643062382a1366250112ac])
org.apache.solr.common.SolrException: Exception writing document
id ftumdeepblue:oai:deepblue.lib.umich.edu:2027.42/79437 to the index; possible 
analysis error.
  at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
  at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
  at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
...
  at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
  at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.IllegalArgumentException: Document contains at least one 
immense term
in field=f_dcperson (whose UTF8 encoding is longer than the max length 
32766), all of which were skipped.
Please correct the analyzer to not produce such terms.  The prefix of the first 
immense
term is: '[102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 
101, 101, 32, 66, 114,
111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48]...', original message:
bytes can be at most 32766 in length; got 38177
  at 
org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
...


My huge field is dcdescription, with the following schema:

 field name=dccreator type=string indexed=true stored=true 
multiValued=true /
 field name=dcdescription type=string indexed=false stored=true /
 field name=f_dcperson type=string indexed=true stored=true 
multiValued=true /
...
copyField source=dccreator dest=f_dcperson /
copyField source=dccontributor dest=f_dcperson /


I guess I have to make dcdescription also multivalue=true?

But why is it complaining about f_dcperson which is already multivalue?

Second guess, dcdescription is not multivalue, but filled to max (32766).
Then it is UTF8 encoded and going beyond 32766 which is larger than a single 
subfield
of a multivaled field and therefore the error?

Any really explanation on this and how to prevent it?

Regards
Bernd




Re: SOLR 4.10.4 - error creating document

2015-05-11 Thread Bernd Fehling
Hi Shawn,

that means if I set a length limit on dcdescription or make dcdescription
multivalued, then the problem is solved because f_dcperson is already
multivalued?

Regards
Bernd


Am 11.05.2015 um 15:17 schrieb Shawn Heisey:
 On 5/11/2015 6:13 AM, Bernd Fehling wrote:
 Caused by: java.lang.IllegalArgumentException: Document contains at least 
 one immense term
 in field=f_dcperson (whose UTF8 encoding is longer than the max length 
 32766), all of which were skipped.
 Please correct the analyzer to not produce such terms.  The prefix of the 
 first immense
 term is: '[102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 
 101, 101, 32, 66, 114,
 111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48]...', original message:
 bytes can be at most 32766 in length; got 38177
 at 
 org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
 ...
 
 The field in question is f_dcperson, which according to your schema is a
 string type.  If your schema follows the example fieldType
 definitions, then string is a solr.StrField, where the entire input is
 treated as one term.  The field is multiValued and a copyField
 destination, so each value that is sent is one term.
 
 I went looking for this message in the code.   It is logged when a
 MaxBytesLengthExceededException is thrown.
 
 This error is complaining that the size of the *term* (since it's a
 string type, likely the contents of an individual copyField source
 field) you are sending to the f_dcperson field has exceeded 32766, which
 is apparently the largest size for that field type.  You'll either need
 to fix your source data or pick a field type that can handle more data.
 
 Thanks,
 Shawn
 


Re: SOLR 4.10.4 - error creating document

2015-05-11 Thread Bernd Fehling
It turned out that I didn't recognize that dcdescription is not indexed,
only stored. So the next in the chain is f_dcperson, where dccreator and
dcdescription are combined and indexed. And this is why the error
shows up on f_dcperson (delay of error).

Thanks for your help, regards.
Bernd


Am 11.05.2015 um 15:35 schrieb Shawn Heisey:
 On 5/11/2015 7:19 AM, Bernd Fehling wrote:
 After reading https://issues.apache.org/jira/browse/LUCENE-5472
 one question still remains.

 Why is it complaining about f_dcperson which is a copyField when the
 origin problem field is dcdescription which definately is much larger
 than 32766?

 I would assume it complains about dcdescription field. Or not?
 
 If the value resulting in the error does come from a copyField source
 that also uses a string type, then my guess here is that Solr has some
 prioritization that causes the copyField destination to be indexed
 before the sources.  This ordering might make things go a little faster,
 because if it happens right after copying, all or most of the data for
 the destination field would already be sitting in one or more of the CPU
 caches.  Cache hits are wonderful things for performance.
 
 Thanks,
 Shawn
 


Re: SOLR 4.10.4 - error creating document

2015-05-11 Thread Bernd Fehling
Hi Emir,

ahhh, yes you're right. I missed that. Now I understand why it is not
complaining about dcdescription and the error shows up on f_dcperson.
delay of error ;-)

Thanks
Bernd



Am 11.05.2015 um 15:25 schrieb Emir Arnautovic:
 Hi Bernrd,
 dcdescription field is not indexed.
 
 Thanks,
 Emir
 
 -- 
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr  Elasticsearch Support * http://sematext.com/
 
 
 On 11.05.2015 15:22, Bernd Fehling wrote:
 Hi Emir,

 the dcdescription field is definately to big.
 But why is it complaining about f_dcperson and not dcdescription?

 Regards
 Bernd


 Am 11.05.2015 um 15:12 schrieb Emir Arnautovic:
 Hi Bernd,
 Issue is with f_dcperson and what ends up in that field. It is configured 
 to be string, which means it is not tokenized so if some huge value is
 in either dccreator or dccontributor it will end up as single term. Nemes 
 suggest that it should not contain such values, but double check in
 your import code if you are reading wrong column or concatenating 
 contributors or something else causing value to be to big. Also check if you
 have some copyField that should not be there.

 Thanks,
 Emir
 -- 
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr  Elasticsearch Support * http://sematext.com/


 On 11.05.2015 14:13, Bernd Fehling wrote:
 I'm getting the following error with 4.10.4

 WARN  org.apache.solr.handler.dataimport.SolrWriter  – Error creating 
 document :
 SolrInputDocument(fields: [dcautoclasscode=310, dclang=unknown,
 
 ..., 
 dcdocid=dd05ad427a58b49150a4ca36148187028562257a77643062382a1366250112ac])
 org.apache.solr.common.SolrException: Exception writing document
 id ftumdeepblue:oai:deepblue.lib.umich.edu:2027.42/79437 to the index; 
 possible analysis error.
   at 
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
   at 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
   at 
 org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
 ...
   at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
   at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
 Caused by: java.lang.IllegalArgumentException: Document contains at least 
 one immense term
 in field=f_dcperson (whose UTF8 encoding is longer than the max length 
 32766), all of which were skipped.
 Please correct the analyzer to not produce such terms.  The prefix of the 
 first immense
 term is: '[102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 
 115, 101, 101, 32, 66, 114,
 111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48]...', original message:
 bytes can be at most 32766 in length; got 38177
   at 
 org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
 ...


 My huge field is dcdescription, with the following schema:

   <field name="dccreator" type="string" indexed="true" stored="true" multiValued="true" />
   <field name="dcdescription" type="string" indexed="false" stored="true" />
   <field name="f_dcperson" type="string" indexed="true" stored="true" multiValued="true" />
  ...
  <copyField source="dccreator" dest="f_dcperson" />
  <copyField source="dccontributor" dest="f_dcperson" />


 I guess I have to make dcdescription also multivalue=true?

 But why is it complaining about f_dcperson which is already multivalue?

  Second guess: dcdescription is not multivalued, but filled to the max (32766).
  Then it is UTF8 encoded and goes beyond 32766, which is larger than a
  single subfield
  of a multivalued field, and therefore the error?

  Is there a real explanation for this, and how can I prevent it?

 Regards
 Bernd
 


Re: SOLR 4.10.4 - error creating document

2015-05-11 Thread Shawn Heisey
On 5/11/2015 6:13 AM, Bernd Fehling wrote:
 Caused by: java.lang.IllegalArgumentException: Document contains at least one 
 immense term
 in field=f_dcperson (whose UTF8 encoding is longer than the max length 
 32766), all of which were skipped.
 Please correct the analyzer to not produce such terms.  The prefix of the 
 first immense
 term is: '[102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 
 101, 101, 32, 66, 114,
 111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48]...', original message:
 bytes can be at most 32766 in length; got 38177
 at 
 org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
 ...

The field in question is f_dcperson, which according to your schema is a
string type.  If your schema follows the example fieldType
definitions, then string is a solr.StrField, where the entire input is
treated as one term.  The field is multiValued and a copyField
destination, so each value that is sent is one term.

I went looking for this message in the code.   It is logged when a
MaxBytesLengthExceededException is thrown.

This error is complaining that the size of the *term* (since it's a
string type, likely the contents of an individual copyField source
field) you are sending to the f_dcperson field has exceeded 32766, which
is apparently the largest size for that field type.  You'll either need
to fix your source data or pick a field type that can handle more data.
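
If fixing the data at the source isn't practical, one rough client-side
workaround (a sketch only, not something Solr does for you; the URL, id and
helper below are made up, and it assumes SolrJ) is to truncate an oversized
value before it is ever sent:

import java.nio.charset.StandardCharsets;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class TruncatingIndexer {
  // Lucene rejects a single term whose UTF-8 encoding exceeds 32766 bytes.
  private static final int MAX_TERM_BYTES = 32766;

  // Crude cut on the byte length; a real implementation should avoid
  // splitting a multi-byte character in the middle.
  static String truncateUtf8(String value, int maxBytes) {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    return bytes.length <= maxBytes
        ? value
        : new String(bytes, 0, maxBytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc1");
    doc.addField("f_dcperson", truncateUtf8(args[0], MAX_TERM_BYTES));
    client.add(doc);
    client.commit();
    client.close();
  }
}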

Thanks,
Shawn



Re: SOLR 4.10.4 - error creating document

2015-05-11 Thread Shawn Heisey
On 5/11/2015 7:19 AM, Bernd Fehling wrote:
 After reading https://issues.apache.org/jira/browse/LUCENE-5472
 one question still remains.

 Why is it complaining about f_dcperson which is a copyField when the
 origin problem field is dcdescription which definitely is much larger
 than 32766?

 I would assume it complains about dcdescription field. Or not?

If the value resulting in the error does come from a copyField source
that also uses a string type, then my guess here is that Solr has some
prioritization that causes the copyField destination to be indexed
before the sources.  This ordering might make things go a little faster,
because if it happens right after copying, all or most of the data for
the destination field would already be sitting in one or more of the CPU
caches.  Cache hits are wonderful things for performance.

Thanks,
Shawn



Re: Solr query which return only those docs whose all tokens are from given list

2015-05-11 Thread Sujit Pal
Hi Naresh,

Couldn't you just model this as an OR query, since your requirement is
at least one (but can be more than one), i.e.:

tags:T1 tags:T2 tags:T3

-sujit


On Mon, May 11, 2015 at 4:14 AM, Naresh Yadav nyadav@gmail.com wrote:

 Hi all,

 Also asked this here : http://stackoverflow.com/questions/30166116

 For example i have SOLR docs in which tags field is indexed :

 Doc1 - tags:T1 T2

 Doc2 - tags:T1 T3

 Doc3 - tags:T1 T4

 Doc4 - tags:T1 T2 T3

 Query1 : get all docs with tags:T1 AND tags:T3 then it works and will
 give Doc2 and Doc4

 Query2 : get all docs whose tags must be one of these [T1, T2, T3] Expected
 is : Doc1, Doc2, Doc4

 How to model Query2 in Solr ?? Please help me on this ?



Re: SolrJ vs. plain old HTTP post

2015-05-11 Thread Steven White
Thanks Erik and Emir.

Erik: The fact that SolrJ is aware of SolrCloud is enough to put it over
plain old HTTP post.

Emir: I looked into Solr's data import handler, unfortunately, it won't
work for my need.

To close the loop on this question, I will need to enable Jetty's SSL (the
jetty that comes with Solr 5.1).  If I do so, will SolrJ still work, can I
assume that SolrJ supports SSL?

I Google'ed but cannot find the answer.

Thanks again.

Steve

On Mon, May 11, 2015 at 8:39 AM, Erik Hatcher erik.hatc...@gmail.com
wrote:

 Another advantage to SolrJ is with SolrCloud (ZK) awareness, and taking
 advantage of some routing optimizations client-side so the cluster has less
 hops to make.

 —
 Erik Hatcher, Senior Solutions Architect
 http://www.lucidworks.com http://www.lucidworks.com/




  On May 11, 2015, at 8:21 AM, Steven White swhite4...@gmail.com wrote:
 
  Hi Everyone,
 
  If all that I need to do is send data to Solr to add / delete a Solr
  document, which tool is better for the job: SolrJ or plain old HTTP post?
 
  In other word, what are the advantages of using SolrJ when the need is to
  push data to Solr for indexing?
 
  Thanks,
 
  Steve




Re: indexing java byte code in classes / jars

2015-05-11 Thread Walter Underwood
How about Krugle?

http://opensearch.krugle.org/

Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On May 11, 2015, at 3:18 AM, Tomasz Borek tomasz.bo...@gmail.com wrote:

 There's also Perl-backed ACK. http://beyondgrep.com/
 
 Which does the job of searching code really well.
 
 And I think at least once I came across something that stemmed from ACK and
 claimed it was faster/better... googling... aah! The Silver Searcher it
 was. :-)
 http://betterthanack.com/
 
 pozdrawiam,
 LAFK
 
 2015-05-09 12:40 GMT+02:00 Mark javam...@gmail.com:
 
 Hi Alexandre,
 
 Solr & ASM is the exact problem I'm looking to hack about with, so I'm keen
 to consider any code, no matter how ugly or broken.
 
 Regards
 
 Mark
 
 On 9 May 2015 at 10:21, Alexandre Rafalovitch arafa...@gmail.com wrote:
 
 If you only have classes/jars, use ASM. I have done this before, have
 some
 ugly code to share if you want.
 
 If you have sources, javadoc 8 is a good way too. I am doing that now for
 solr-start.com, code on Github.
 
 Regards,
Alex
 On 9 May 2015 7:09 am, Mark javam...@gmail.com wrote:
 
 To answer why bytecode: because mostly the use case I have is looking to
 index as much detail as possible from jars/classes.
 
 extract class names,
 method names
 signatures
 packages / imports
 
 I am considering using ASM in order to generate an analysis view of the
 class
 
 The sort of usecases I have would be method / signature searches.
 
 For example;
 
 1) show any classes with a method named parse*
 
 2) show any classes with a method named parse that passes in a type
 *json*
 
 ...etc
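 
 Something along these lines with ASM's visitor API is roughly what I have in
 mind (a rough sketch only, assuming the ASM 5.x core API; untested, and the
 class and field names are made up):
 
 import java.io.FileInputStream;
 import java.util.ArrayList;
 import java.util.List;
 import org.objectweb.asm.ClassReader;
 import org.objectweb.asm.ClassVisitor;
 import org.objectweb.asm.MethodVisitor;
 import org.objectweb.asm.Opcodes;
 
 // Collects class and method names from a .class file so they can be fed
 // into Solr fields such as className / methodName / methodDesc.
 public class ClassIndexVisitor extends ClassVisitor {
   String className;
   final List<String> methodNames = new ArrayList<>();
   final List<String> methodDescriptors = new ArrayList<>();
 
   public ClassIndexVisitor() { super(Opcodes.ASM5); }
 
   @Override
   public void visit(int version, int access, String name, String signature,
                     String superName, String[] interfaces) {
     className = name.replace('/', '.');   // e.g. org.apache.solr.SolrLogFormatter
   }
 
   @Override
   public MethodVisitor visitMethod(int access, String name, String desc,
                                    String signature, String[] exceptions) {
     methodNames.add(name);                // e.g. parse
     methodDescriptors.add(desc);          // e.g. (Ljava/lang/String;)V
     return null;                          // no need to walk the method body
   }
 
   public static void main(String[] args) throws Exception {
     ClassIndexVisitor v = new ClassIndexVisitor();
     new ClassReader(new FileInputStream(args[0])).accept(v, ClassReader.SKIP_CODE);
     System.out.println(v.className + " " + v.methodNames);
   }
 }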
 
 In the past I have written something to reverse out javadocs from just java
 bytecode; using Solr would make this idea considerably more powerful.
 
 Thanks for the suggestions so far
 
 
 
 
 
 
 
 On 8 May 2015 at 21:19, Erik Hatcher erik.hatc...@gmail.com wrote:
 
 Oh, and sorry, I omitted a couple of details:
 
 # creating the “java” core/collection
 bin/solr create -c java
 
 # I ran this from my Solr source code checkout, so that
 SolrLogFormatter.class just happened to be handy
 
Erik
 
 
 
 
 On May 8, 2015, at 4:11 PM, Erik Hatcher erik.hatc...@gmail.com
 wrote:
 
 What kinds of searches do you want to run?  Are you trying to
 extract
 class names, method names, and such and make those searchable?   If
 that’s
 the case, you need some kind of “parser” to reverse engineer that
 information from .class and .jar files before feeding it to Solr,
 which
 would happen before analysis.   Java itself comes with a javap
 command
 that
 can do this; whether this is the “best” way to go for your scenario I
 don’t
 know, but here’s an interesting example pasted below (using Solr
 5.x).
 
 —
 Erik Hatcher, Senior Solutions Architect
 http://www.lucidworks.com
 
 
  javap build/solr-core/classes/java/org/apache/solr/SolrLogFormatter.class > test.txt
  bin/post -c java test.txt
 
 now search for coreInfoMap
 http://localhost:8983/solr/java/browse?q=coreInfoMap
 
 I tried to be cleverer and use the stdin option of bin/post, like
 this:
 javap
 build/solr-core/classes/java/org/apache/solr/SolrLogFormatter.class |
 bin/post -c java -url http://localhost:8983/solr/java/update/extract
 -type text/plain -params literal.id=SolrLogFormatter -out yes -d
 but something isn’t working right with the stdin detection like
 that
 (it
 does work to `cat test.txt | bin/post…` though, hmmm)
 
 test.txt looks like this, `cat test.txt`:
  Compiled from "SolrLogFormatter.java"
  public class org.apache.solr.SolrLogFormatter extends java.util.logging.Formatter {
    long startTime;
    long lastTime;
    java.util.Map<org.apache.solr.SolrLogFormatter$Method, java.lang.String> methodAlias;
    public boolean shorterFormat;
    java.util.Map<org.apache.solr.core.SolrCore, org.apache.solr.SolrLogFormatter$CoreInfo> coreInfoMap;
    public java.util.Map<java.lang.String, java.lang.String> classAliases;
    static java.lang.ThreadLocal<java.lang.String> threadLocal;
    public org.apache.solr.SolrLogFormatter();
    public void setShorterFormat();
    public java.lang.String format(java.util.logging.LogRecord);
    public void appendThread(java.lang.StringBuilder, java.util.logging.LogRecord);
    public java.lang.String _format(java.util.logging.LogRecord);
    public java.lang.String getHead(java.util.logging.Handler);
    public java.lang.String getTail(java.util.logging.Handler);
    public java.lang.String formatMessage(java.util.logging.LogRecord);
    public static void main(java.lang.String[]) throws java.lang.Exception;
    public static void go() throws java.lang.Exception;
    static {};
  }
 
 On May 8, 2015, at 3:31 PM, Mark javam...@gmail.com wrote:
 
  I'm looking to use Solr to search over the byte code in Classes and
  Jars.
 
 Does anyone know or have experience of Analyzers, Tokenizers, and
 Token
 Filters for such a task?
 
 Regards
 
 Mark
 
 
 
 
 
 



PatternReplaceCharFilter + solr.WhitespaceTokenizerFactory behaviour

2015-05-11 Thread Mihran Shahinian
I must be missing something obvious. I have a simple regex that removes
a space-hyphen-space pattern.

The unit test below works fine, but when I plug it into the schema and query,
the regex does not match, since the input already gets split by space (further
below). My understanding is that the charFilter would operate on the raw input
string and then pass it to the whitespace tokenizer, which seems to be the case,
but I am not sure why I get an already-split token stream.

Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName,
                                                     Reader reader) {
        Tokenizer tokenizer = new MockTokenizer(reader,
                                                MockTokenizer.WHITESPACE,
                                                false);
        return new TokenStreamComponents(tokenizer, tokenizer);
    }

    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        return new PatternReplaceCharFilter(
            pattern("\\s+[\u002d,\u2011,\u2012,\u2013,\u2014,\u2212]\\s+"),
            " ",
            reader);
    }
};

final TokenStream tokens = analyzer.tokenStream("", new StringReader("a - b"));
tokens.reset();
final CharTermAttribute termAtt = tokens.addAttribute(CharTermAttribute.class);
while (tokens.incrementToken()) {
    System.out.println("=== " +
        new String(Arrays.copyOf(termAtt.buffer(), termAtt.length())));
}

I end up with:
=== a
=== b


Now I define the same in my schema:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           multiValued="true" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\s+[\u002d,\u2011,\u2012,\u2013,\u2014,\u2212]\s+"
                replacement=" " />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
  </analyzer>
</fieldType>

<field name="myfield" type="text" indexed="true" stored="false"
       multiValued="true" />

When I query, the input already arrives split (e.g. a, -, b) in
PatternReplaceCharFilter's processPattern method, so the regex does not match:
CharSequence processPattern(CharSequence input) ...
even though the charFilter is defined before the tokenizer.




Here is the query
SolrQuery solrQuery = new SolrQuery("a - b");
solrQuery.setRequestHandler("/select");
solrQuery.set("defType", "edismax");
solrQuery.set("qf", "myfield");
solrQuery.set(CommonParams.ROWS, 0);
solrQuery.set(CommonParams.DEBUG, true);
solrQuery.set(CommonParams.DEBUG_QUERY, true);
QueryResponse response = solrSvr.query(solrQuery);

System.out.println("parsedQtoString " +
                   response.getDebugMap().get("parsedquery_toString"));
System.out.println("parsedQ " +
                   response.getDebugMap().get("parsedquery"));

Output is
parsedQtoString +((myfield:a) (myfield:-) (myfield:b))
parsedQ (+(DisjunctionMaxQuery((myfield:a))
DisjunctionMaxQuery((myfield:-)) DisjunctionMaxQuery((myfield:b/no_coord


Re: Completion Suggester in Solr

2015-05-11 Thread Pradeep Bhattiprolu
Bumping this thread again in the group; I haven't received any responses to it.
I have been stuck on this problem since last week, and any help is highly
appreciated.

Thanks
Pradeep

On Wed, May 6, 2015 at 5:00 PM, Pradeep Bhattiprolu pbhatt...@gmail.com
wrote:

 Hi

 Is there a equivalent of Completion suggester of ElasticSearch in Solr ?

 I am a user who uses both Solr and ES, in different projects.

 I am not able to find a solution in Solr, where i can use :

 1) FSA Structure
 2) multiple terms as synonyms
 3) assign a weight to each document based on certain heuristics, e.g.:
 popularity score, user search history etc.


 Any kind of help , pointers to relevant examples and documentation is
 highly appreciated.

 thanks in advance.

 Pradeep



Re: Solr custom component issue

2015-05-11 Thread j 90
unsubscribe

On Mon, May 11, 2015 at 6:58 PM, Upayavira u...@odoko.co.uk wrote:

 attaching them to each request, then just add qf= as a param to the URL,
 easy.

 On Mon, May 11, 2015, at 12:17 PM, nutchsolruser wrote:
  These boosting parameters will be configured outside Solr and there is
  seperate module from which these values get populated , I am reading
  those
  values from external datasource and I want to attach them to each request
  .
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Solr-custom-component-issue-tp4204799p4204832.html
  Sent from the Solr - User mailing list archive at Nabble.com.



SOLR plugin: Retrieve all values of multivalued field

2015-05-11 Thread Costi Muraru
Hi folks,

I'm playing with a custom SOLR plugin and I'm trying to retrieve the value
for a multivalued field, using the code below.

==
schema.xml:
<field name="my_field_name" type="string" indexed="true" stored="false"
multiValued="true" />
==
input data:

<add>

<doc>
  <field name="id">83127</field>
  <field name="my_field_name">somevalue</field>
  <field name="my_field_name">some other value</field>
  <field name="my_field_name">some other value 3</field>
  <field name="my_field_name">some other value 4</field>
</doc>

</add>
==

plugin:

SortedDocValues termsIndex = FieldCache.DEFAULT.getTermsIndex(atomicReader,
    "my_field_name");
...
int document = 12;
BytesRef spare = termsIndex.get(document);
String value = new String(spare.bytes, spare.offset, spare.length);

--

*This only returns the value some other value 3*. Is there any way to
obtain the other values as well (eg. somevalue, some other value)?
Any help is gladly appreciated.

Thanks,
Costi


Help to index nested document

2015-05-11 Thread Vishal Swaroop
Need your valuable inputs...

I am indexing data from database (one table) which is in this example
format :
id name value
1 Joe 102724904
2 Joe 100996643

- id is primary/ unique key
- there can be same name but different value
- If I try name as unique key then SOLR removes duplicate and indexes 1
document

- I am getting the result in the format below... Is there a way I can index
the data so that value can be a child of name?
response: {
numFound: 2,
start: 0,
docs: [
  {
id: 1,
name: Joe,
value: [
  102724904
]
  },
  {
id: 2,
name: Joe,
value: [
  100996643
]
  }...

Expected format :
docs: [
  {
name: Joe,
value: [
  102724904,
  100996643
]
  }


Re: PatternReplaceCharFilter + solr.WhitespaceTokenizerFactory behaviour

2015-05-11 Thread Erick Erickson
This trips up _everybody_ at one point or other. The problem is that
the input goes through the query _parsing_ prior to getting to the
field analysis, and the parser is sensitive to spaces.

Consider the input (without quotes) of my dog. That gets broken up into
default_field:my default_field:dog
and only _then_ does the analysis chain, including your
PatternReplaceCharFilterFactory get applied to the individual tokens.

So, your query input needs to escape the spaces, as in whatever\ -\
somethingelse, or perhaps quote the input, although this latter has
other implications.
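
For example, with SolrJ (just a sketch; the only point is the escaping, the
field and handler names are the ones from your mail):

import org.apache.solr.client.solrj.SolrQuery;

// Escaping the spaces sends a\ -\ b, so the parser keeps it as one chunk
// and your charFilter sees the full " - " pattern.
SolrQuery solrQuery = new SolrQuery("a\\ -\\ b");
solrQuery.setRequestHandler("/select");
solrQuery.set("defType", "edismax");
solrQuery.set("qf", "myfield");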

Best,
Erick

On Mon, May 11, 2015 at 2:00 PM, Mihran Shahinian slowmih...@gmail.com wrote:
 I must be missing something obvious.I have a simple regex that removes
 spacehyphenspace pattern.

 The unit test below works fine, but when I plug it into schema and query,
 regex does not match, since input already gets split by space (further
 below). My understanding that charFilter would operate on raw input string
 and than pass it to the whitespace tokenizer which seems to be the case,
 but I am not sure why I get already split token stream.

 Analyzer analyzer = new Analyzer() {
 @Override
 protected TokenStreamComponents createComponents(String
 fieldName,
  Reader reader)
 {
 Tokenizer tokenizer = new MockTokenizer(reader,

 MockTokenizer.WHITESPACE,
 false);
 return new TokenStreamComponents(tokenizer,
  tokenizer);
 }

 @Override
 protected Reader initReader(String fieldName,
 Reader reader) {
 return new
 PatternReplaceCharFilter(pattern(\\s+[\u002d,\u2011,\u2012,\u2013,\u2014,\u2212]\\s+),
  ,
 reader);
 }
 };

 final TokenStream tokens = analyzer.tokenStream(,  new
 StringReader(a - b));
 tokens.reset();
 final CharTermAttribute termAtt =
 tokens.addAttribute(CharTermAttribute.class);
 while (tokens.incrementToken()) {
 System.out.println(===  +
new String(Arrays.copyOf(termAtt.buffer(),
 termAtt.length(;
 }

 I end up with:
 === a
 === b


 Now I define the same in my schema:

 fieldType name=text class=solr.TextField positionIncrementGap=100
  multiValued=true autoGeneratePhraseQueries=false
 analyzer  type=index
  tokenizer class=solr.WhitespaceTokenizerFactory
 /
 /analyzer
 analyzer  type=query
 charFilter
 class=solr.PatternReplaceCharFilterFactory
 pattern=\s+[\u002d,\u2011,\u2012,\u2013,\u2014,\u2212]\s+ replacement= ;
  /
 tokenizer class=solr.WhitespaceTokenizerFactory /
 /analyzer
 /fieldType

 field name=myfield type=text indexed=true stored=false
 multiValued=true/

 When I query the input already comes in split into (e.g. a,-,b)
 PatternReplaceCharFilter's processPattern method so regex would not match.
 CharSequence processPattern(CharSequence input) ...
 even though charFilter is defined before tokenizer.




 Here is the query
 SolrQuery solrQuery = new SolrQuery(a - b);
 solrQuery.setRequestHandler(/select);
 solrQuery.set(defType,
   edismax);
 solrQuery.set(qf,
   myfield);
 solrQuery.set(CommonParams.ROWS,
   0);
 solrQuery.set(CommonParams.DEBUG,
   true);
 solrQuery.set(CommonParams.DEBUG_QUERY,
   true);
 QueryResponse response = solrSvr.query(solrQuery);

 System.out.println(parsedQtoString  +
response.getDebugMap()
.get(parsedquery_toString));
 System.out.println(parsedQ  +
response.getDebugMap()
.get(parsedquery));

 Output is
 parsedQtoString +((myfield:a) (myfield:-) (myfield:b))
 parsedQ (+(DisjunctionMaxQuery((myfield:a))
 DisjunctionMaxQuery((myfield:-)) DisjunctionMaxQuery((myfield:b/no_coord


Re: SOLR 4.10.4 - error creating document

2015-05-11 Thread Erick Erickson
I've got to ask _how_ are you intending to search this field? On the
surface, this feels like an XY problem.
It's a string type. Therefore, if this is the input:

102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 101,
101, 32, 66, 114

you'll only ever get a match if you search exactly:
102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 101,
101, 32, 66, 114

None of these will match
102
102,
32
32,
119, 32, 115

etc.

The idea of doing a match on a single _token_ that's over 32K long is
pretty far out there, thus
the check.

The entire multiValued discussion is _probably_ a red herring and
won't help you. multiValued
has nothing to do with multiple terms, that's all up to your field type.

So back up and tell us _how_ you intend to search this field. I'm
guessing you really want
to make it a text-based type instead. But that's just a guess.

Best,
Erick.

On Mon, May 11, 2015 at 8:43 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:
 It turned out that I hadn't noticed that dcdescription is not indexed,
 only stored. So the next field in the chain is f_dcperson, where dccreator and
 dcdescription are combined and indexed. And this is why the error
 shows up on f_dcperson (the error is delayed).

 Thanks for your help, regards.
 Bernd


 On 11.05.2015 at 15:35, Shawn Heisey wrote:
 On 5/11/2015 7:19 AM, Bernd Fehling wrote:
 After reading https://issues.apache.org/jira/browse/LUCENE-5472
 one question still remains.

 Why is it complaining about f_dcperson which is a copyField when the
 origin problem field is dcdescription which definitely is much larger
 than 32766?

 I would assume it complains about dcdescription field. Or not?

 If the value resulting in the error does come from a copyField source
 that also uses a string type, then my guess here is that Solr has some
 prioritization that causes the copyField destination to be indexed
 before the sources.  This ordering might make things go a little faster,
 because if it happens right after copying, all or most of the data for
 the destination field would already be sitting in one or more of the CPU
 caches.  Cache hits are wonderful things for performance.

 Thanks,
 Shawn



Re: SolrJ vs. plain old HTTP post

2015-05-11 Thread Shalin Shekhar Mangar
On Mon, May 11, 2015 at 8:20 PM, Steven White swhite4...@gmail.com wrote:

 Thanks Erik and Emir.

 snip/


 To close the loop on this question, I will need to enable Jetty's SSL (the
 jetty that comes with Solr 5.1).  If I do so, will SolrJ still work, can I
 assume that SolrJ supports SSL?


Yes, SolrJ can work with SSL enabled on the server as long as you pass the
same JVM parameters on the client side to enable SSL e.g.

-Djavax.net.ssl.keyStore=
-Djavax.net.ssl.keyStorePassword=
-Djavax.net.ssl.trustStore=
-Djavax.net.ssl.trustStorePassword=

See
https://cwiki.apache.org/confluence/display/solr/Enabling+SSL#EnablingSSL-IndexadocumentusingCloudSolrClient
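
For example, a minimal SolrJ sketch (the keystore paths, passwords, ZooKeeper
address and collection name below are all placeholders) that sets the same
properties from code; they just need to be set before the client opens its
first connection:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SslIndexExample {
  public static void main(String[] args) throws Exception {
    // Equivalent of the -Djavax.net.ssl.* JVM flags, set programmatically.
    System.setProperty("javax.net.ssl.keyStore", "/path/to/solr-ssl.keystore.jks");
    System.setProperty("javax.net.ssl.keyStorePassword", "secret");
    System.setProperty("javax.net.ssl.trustStore", "/path/to/solr-ssl.keystore.jks");
    System.setProperty("javax.net.ssl.trustStorePassword", "secret");

    CloudSolrClient client = new CloudSolrClient("localhost:2181");
    client.setDefaultCollection("mycollection");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    client.add(doc);
    client.commit();
    client.close();
  }
}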


 I Google'ed but cannot find the answer.

 Thanks again.

 Steve

 On Mon, May 11, 2015 at 8:39 AM, Erik Hatcher erik.hatc...@gmail.com
 wrote:

  Another advantage to SolrJ is with SolrCloud (ZK) awareness, and taking
  advantage of some routing optimizations client-side so the cluster has
 less
  hops to make.
 
  —
  Erik Hatcher, Senior Solutions Architect
  http://www.lucidworks.com http://www.lucidworks.com/
 
 
 
 
   On May 11, 2015, at 8:21 AM, Steven White swhite4...@gmail.com
 wrote:
  
   Hi Everyone,
  
   If all that I need to do is send data to Solr to add / delete a Solr
   document, which tool is better for the job: SolrJ or plain old HTTP
 post?
  
   In other word, what are the advantages of using SolrJ when the need is
 to
   push data to Solr for indexing?
  
   Thanks,
  
   Steve
 
 




-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr query which return only those docs whose all tokens are from given list

2015-05-11 Thread Naresh Yadav
Thanks Andrew, you got my problem precisely, but the solutions you suggested may
not work for me.

In my API I only get the list of authorized tags, i.e. [T1, T2, T3], and based on
that alone I need to construct my Solr query.
So the first solution with NOT (T4 OR T5) will not work.

In the real case the tag ids T1, T2 are UUIDs, so the range query also will not
work, as I have no control over the ordering of these ids.

Looking for more suggestions ??

Thanks
Naresh

On Mon, May 11, 2015 at 10:05 PM, Andrew Chillrud achill...@opentext.com
wrote:

 Based on his example, it sounds like Naresh not only wants the tags field
 to contain at least one of the values [T1, T2, T3] but also wants to
 exclude documents that contain a tag other than T1, T2, or T3 (Doc3 should
 not be retrieved).

 If the set of possible values in the tags field is limited and known, you
 could use a NOT (or '-') clause to accomplish this. If there were 5
 possible tag values:

 tags:(( T1 OR T2 OR T3) NOT (T4 OR T5))

 However this doesn't seem practical if the number of possible values is
 large or unlimited. Perhaps something could be done with range queries:

 tags:(( T1 OR T2 OR T3) NOT ([* TO T1} OR {T1 TO T2} OR {T3 to * ]))

 however this would require whatever is constructing the query to be aware
 of the lexical ordering of the terms in the index. Maybe there are more
 elegant solutions, but I am not aware of them.

 - Andy -

 -Original Message-
 From: sujitatgt...@gmail.com [mailto:sujitatgt...@gmail.com] On Behalf Of
 Sujit Pal
 Sent: Monday, May 11, 2015 10:40 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr query which return only those docs whose all tokens are
 from given list

 Hi Naresh,

 Couldn't you could just model this as an OR query since your requirement
 is at least one (but can be more than one), ie:

 tags:T1 tags:T2 tags:T3

 -sujit


 On Mon, May 11, 2015 at 4:14 AM, Naresh Yadav nyadav@gmail.com
 wrote:

  Hi all,
 
  Also asked this here : http://stackoverflow.com/questions/30166116
 
  For example i have SOLR docs in which tags field is indexed :
 
  Doc1 - tags:T1 T2
 
  Doc2 - tags:T1 T3
 
  Doc3 - tags:T1 T4
 
  Doc4 - tags:T1 T2 T3
 
  Query1 : get all docs with tags:T1 AND tags:T3 then it works and
  will give Doc2 and Doc4
 
  Query2 : get all docs whose tags must be one of these [T1, T2, T3]
  Expected is : Doc1, Doc2, Doc4
 
  How to model Query2 in Solr ?? Please help me on this ?
 



Re: Best way to backup and restore an index for a cloud setup in 4.6.1?

2015-05-11 Thread Shalin Shekhar Mangar
Hi John,

There are a few HTTP APIs for replication, one of which can let you take a
backup of the index. Restoring can be as simple as just copying over the
index in the right location on the disk. A new restore API will be released
with the next version of Solr which will make some of these tasks easier.

See
https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler
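
For example, a backup can be triggered against a core's ReplicationHandler like
this (a sketch only; the core URL, snapshot name and location are placeholders,
and the same command can also be issued as a plain HTTP GET on /replication):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class BackupExample {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/collection1");

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("command", "backup");          // ReplicationHandler backup command
    params.set("name", "20150511");           // snapshot name, e.g. snapshot.20150511
    params.set("location", "/backups/solr");  // where to write the snapshot

    QueryRequest request = new QueryRequest(params);
    request.setPath("/replication");          // hit the core's replication handler
    System.out.println(client.request(request));

    client.close();
  }
}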

On Fri, May 8, 2015 at 10:26 PM, John Smith g10vstmo...@gmail.com wrote:

 All,

 With a cloud setup for a collection in 4.6.1, what is the most elegant way
 to backup and restore an index?

 We are specifically looking into the application of when doing a full
 reindex, with the idea of building an index on one set of servers, backing
 up the index, and then restoring that backup on another set of servers. Is
 there a better way to rebuild indexes on another set of servers?

 We are not sharding if that makes any difference.

 Thanks,
 g10vstmoney




-- 
Regards,
Shalin Shekhar Mangar.


Re: Queries on SynonymFilterFactory

2015-05-11 Thread Zheng Lin Edwin Yeo
Yes sure, thanks for your advice.

I'm still waiting for my server to come before I can scale up my system and
do the testing. Now the Solr running on my 4GB RAM system will crash if I
try to scale up my system as there's not enough memory to support it.

Regards,
Edwin


On 11 May 2015 at 19:11, Alessandro Benedetti benedetti.ale...@gmail.com
wrote:

 2015-05-11 4:44 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

  I've managed to run the synonyms with 10 different synonyms file. Each of
  the synonym file size is 1MB, which consist of about 1000 tokens, and
 each
  token has about 40-50 words. These lists of files are more extreme,
 which I
  probably won't use for the real environment, except now for the testing
  purpose.
 
  The QTime is about 100-200, as compared to about 50 for collection
 without
  synonyms configured.
 
  Is this timing consider fast or slow? Although the synonyms files are
 big,
  there's not that many index in my collection yet. Just afraid the
  performance will be affected when more index comes in.
 

 Whether it's fast or slow depends on your requirements :)
 For a human waiting for the response, I would say 100ms is quite fast.
 To understand what happens when the index scales up, you should prototype!
 Anyway, there are a lot of solutions in Solr to scale up your system!

 Cheers

 
  Regards,
  Edwin
   On 9 May 2015 00:14, Zheng Lin Edwin Yeo edwinye...@gmail.com
 wrote:
 
   Thank you for your suggestions.
  
   I can't do a proper testing on that yet as I'm currently using a 4GB
 RAM
   normal PC machine, and all these probably requires more RAM that what I
   have.
   I've tried running the setup with 20 synonyms file, and the system went
   Out of Memory before I could test anything.
  
   For your option 2), do you mean that I'll need to download a synonym
   database (like the one with over 20MB in size which I have), and index
  them
   into an Ad Hoc Solr Core to manage them?
  
   I probably can only try them out properly when I can get the server
   machine with more RAM.
  
   Regards,
   Edwin
  
  
   On 8 May 2015 at 22:16, Alessandro Benedetti 
 benedetti.ale...@gmail.com
  
   wrote:
  
   This is a quite big Sinonym corpus !
   If it's not feasible to have only 1 big synonym file ( I haven't
  checked,
   so I assume the 1 Mb limit is true, even if strange)
   I would do an experiment :
   1) testing query time with a Solr Classic config
   2) Use an Ad Hoc Solr Core to manage Synonyms ( in this way we can
 keep
  it
   updated and use it with a custom version of the Sysnonym filter that
  will
   get the Synonyms directly from another Solr instance).
   2b) develop a Solr plugin to provide this approach
  
   If the synonym thesaurus is really big, I guess managing them through
   another Solr Core ( or something similar) locally , will be better
 than
   managing it with an external web service.
  
   Cheers
  
   2015-05-08 12:16 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com
 :
  
So it means like having more than 10 or 20 synonym files locally
 will
   still
be faster than accessing external service?
   
As I found out that zookeeper only allows the synonym.txt file to
 be a
maximum of 1MB, and as my potential synonym file is more than 20MB,
  I'll
need to split the file to more than 20 of them.
   
Regards,
Edwin
   
  
  
  
   --
   --
  
   Benedetti Alessandro
   Visiting card : http://about.me/alessandro_benedetti
  
   Tyger, tyger burning bright
   In the forests of the night,
   What immortal hand or eye
   Could frame thy fearful symmetry?
  
   William Blake - Songs of Experience -1794 England
  
  
  
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England



Solr Multiword Synonym Problem

2015-05-11 Thread solrnovice
Hi all, 

I am trying to solve the Solr multiword synonym issue at our installation; I
am currently using SOLR 4.9.x. I used
com.lucidworks.analysis.AutoPhrasingTokenFilterFactory from the Lucidworks git
repo in my schema.xml, and also used their
com.lucidworks.analysis.AutoPhrasingQParserPlugin in solrconfig.xml.
To make testing easier for the Solr community, I used the autophrases.txt
below.
big apple
new york city
city of new york
new york new york
new york ny
ny city
ny ny
new york

When i run a query for big+apple my parsedQuery converts perfectly.

parsedquery:(+DisjunctionMaxQuery((searchField:big_apple)))/no_coord,
parsedquery_toString:+(searchField:big_apple),
..

but when i search for new+york+city, it converts to 

   parsedquery:(+(DisjunctionMaxQuery((searchField:new_york_city))
DisjunctionMaxQuery((searchField:city/no_coord,
parsedquery_toString:+((searchField:new_york_city)
(searchField:city)),
explain:{},

Why is it trying to parse the word city separately? I thought that when it
finds an exact match for new york city in autophrases.txt it should just
replace the whitespace with an underscore (which is what I chose) in my
solrconfig.
But if I comment out the following in my autophrases.txt

#city of new york

it works fine; it doesn't perform a DisjunctionMaxQuery on city.

Same with New york Ny: since there is an entry in autophrases.txt
beginning with Ny, it is searching for NY as well.

It's like an overlap is causing this problem.

Did anybody face this problem? If so, could you please shed some light on
how you solved it? I used the branch from the Lucidworks git repo, which was
10 months old.

Any help is highly appreciated.

this is my solrconfig.xml

--
<queryParser name="autophrasingParser"
             class="com.lucidworks.analysis.AutoPhrasingQParserPlugin">
  <str name="phrases">autophrases.txt</str>
  <str name="replaceWhitespaceWith">_</str>
  <str name="defType">edismax</str>
</queryParser>
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
    <str name="defType">autophrasingParser</str>
  </lst>
</requestHandler>
--
This is my setting from schema.xml
--

<fieldType name="text_autophrase" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory"
            phrases="autophrases.txt" includeTokens="true" replaceWhitespaceWith="_" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true" />
  </analyzer>
</fieldType>
--





thanks
SolrUser








--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Multiword-Synonym-Problem-tp4204979.html
Sent from the Solr - User mailing list archive at Nabble.com.


boolean operators OR/NOT get highlighted by solr

2015-05-11 Thread Tang, Rebecca
Hi,

We have a SOLR query like this

q=ddmdate%3A2012-05-01T00%3A00%3A00Z+NOT+dddate%3A2010-06-11T00%3A00%3A00Z&wt=json&indent=true&hl=true&hl.simple.pre=%3Ch1%3E&hl.simple.post=%3C%2Fh1%3E&hl.requireFieldMatch=true&hl.preserveMulti=true&hl.fl=ot&f.ot.hl.fragsize=300&f.ot.hl.alternateField=ot&f.ot.hl.maxAlternateFieldLength=300&fl=id

And the response looks like this; notice that the word "not" is highlighted by Solr:

{
  responseHeader:{
status:0,
QTime:5,
params:{
  f.ot.hl.maxAlternateFieldLength:300,
  hl.requireFieldMatch:true,
  fl:id,
  f.ot.hl.alternateField:ot,
  indent:true,
  q:ddmdate:2012-05-01T00:00:00Z NOT dddate:2010-06-11T00:00:00Z,
  f.ot.hl.fragsize:300,
  hl.preserveMulti:true,
  hl.simple.pre:h1,
  hl.simple.post:/h1,
  hl.fl:ot,
  wt:json,
  hl:true}},
  response:{numFound:1,start:0,docs:[
  {
id:xrbw0180}]
  },
  highlighting:{
xrbw0180:{
  ot:[ of this info getting out to consumers and others, therefore, 
please do <h1>not</h1> forward or provide copies to others. Hope this 
helps...\n\nKevin\n\nl\n\n5027717374#12;pgNbr=1\n]}}}

This happens with OR as well:
q=ddmdate%3A2012-05-01T00%3A00%3A00Z+OR+dddate%3A2010-06-11T00%3A00%3A00Z&wt=json&indent=true&hl=true&hl.simple.pre=%3Ch1%3E&hl.simple.post=%3C%2Fh1%3E&hl.requireFieldMatch=true&hl.preserveMulti=true&hl.fl=ot&f.ot.hl.fragsize=300&f.ot.hl.alternateField=ot&f.ot.hl.maxAlternateFieldLength=300&fl=id


{
  responseHeader:{
status:0,
QTime:4,
params:{
  f.ot.hl.maxAlternateFieldLength:300,
  hl.requireFieldMatch:true,
  fl:id,
  f.ot.hl.alternateField:ot,
  indent:true,
  q:ddmdate:2012-05-01T00:00:00Z OR dddate:2010-06-11T00:00:00Z,
  f.ot.hl.fragsize:300,
  hl.preserveMulti:true,
  hl.simple.pre:h1,
  hl.simple.post:/h1,
  hl.fl:ot,
  wt:json,
  hl:true}},
  response:{numFound:1,start:0,docs:[
  {
id:xrbw0180}]
  },
  highlighting:{
xrbw0180:{
  ot:[ of this info getting out to consumers and others, therefore, 
please do not forward <h1>or</h1> provide copies to others. Hope this 
helps...\n\nKevin\n\nl\n\n5027717374#12;pgNbr=1\n]}}}

This does not happen with the AND operator.

Is this a bug in solr?  Or is it a feature that I can turn off?



Rebecca Tang
Applications Developer, UCSF CKM
Industry Documents Digital Libraries
E: rebecca.t...@ucsf.edu



RE: Solr query which return only those docs whose all tokens are from given list

2015-05-11 Thread Andrew Chillrud
Based on his example, it sounds like Naresh not only wants the tags field to 
contain at least one of the values [T1, T2, T3] but also wants to exclude 
documents that contain a tag other than T1, T2, or T3 (Doc3 should not be 
retrieved).

If the set of possible values in the tags field is limited and known, you could 
use a NOT (or '-') clause to accomplish this. If there were 5 possible tag 
values:

tags:(( T1 OR T2 OR T3) NOT (T4 OR T5))

However this doesn't seem practical if the number of possible values is large 
or unlimited. Perhaps something could be done with range queries:

tags:(( T1 OR T2 OR T3) NOT ([* TO T1} OR {T1 TO T2} OR {T3 to * ]))

however this would require whatever is constructing the query to be aware of 
the lexical ordering of the terms in the index. Maybe there are more elegant 
solutions, but I am not aware of them.
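
For what it's worth, if the full tag vocabulary is available somewhere, the
clause could be built mechanically from the two lists, along these lines
(an illustrative sketch only; the class and method names are made up):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TagsQueryBuilder {
  // Builds tags:((T1 OR T2 OR T3) NOT (T4 OR T5)) from the authorized tags
  // and the complete set of known tags.
  static String buildTagsQuery(List<String> authorized, List<String> allKnown) {
    List<String> excluded = new ArrayList<>(allKnown);
    excluded.removeAll(authorized);
    String include = String.join(" OR ", authorized);
    String exclude = String.join(" OR ", excluded);
    return excluded.isEmpty()
        ? "tags:(" + include + ")"
        : "tags:((" + include + ") NOT (" + exclude + "))";
  }

  public static void main(String[] args) {
    System.out.println(buildTagsQuery(
        Arrays.asList("T1", "T2", "T3"),
        Arrays.asList("T1", "T2", "T3", "T4", "T5")));
    // prints: tags:((T1 OR T2 OR T3) NOT (T4 OR T5))
  }
}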

- Andy -

-Original Message-
From: sujitatgt...@gmail.com [mailto:sujitatgt...@gmail.com] On Behalf Of Sujit 
Pal
Sent: Monday, May 11, 2015 10:40 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr query which return only those docs whose all tokens are from 
given list

Hi Naresh,

Couldn't you could just model this as an OR query since your requirement is at 
least one (but can be more than one), ie:

tags:T1 tags:T2 tags:T3

-sujit


On Mon, May 11, 2015 at 4:14 AM, Naresh Yadav nyadav@gmail.com wrote:

 Hi all,

 Also asked this here : http://stackoverflow.com/questions/30166116

 For example i have SOLR docs in which tags field is indexed :

 Doc1 - tags:T1 T2

 Doc2 - tags:T1 T3

 Doc3 - tags:T1 T4

 Doc4 - tags:T1 T2 T3

 Query1 : get all docs with tags:T1 AND tags:T3 then it works and 
 will give Doc2 and Doc4

 Query2 : get all docs whose tags must be one of these [T1, T2, T3] 
 Expected is : Doc1, Doc2, Doc4

 How to model Query2 in Solr ?? Please help me on this ?



Re: Solr query which return only those docs whose all tokens are from given list

2015-05-11 Thread Alessandro Benedetti
A simple OR query should be fine :

tags:(T1 T2 T3)

Cheers

2015-05-11 15:39 GMT+01:00 Sujit Pal sujit@comcast.net:

 Hi Naresh,

 Couldn't you could just model this as an OR query since your requirement is
 at least one (but can be more than one), ie:

 tags:T1 tags:T2 tags:T3

 -sujit


 On Mon, May 11, 2015 at 4:14 AM, Naresh Yadav nyadav@gmail.com
 wrote:

  Hi all,
 
  Also asked this here : http://stackoverflow.com/questions/30166116
 
  For example i have SOLR docs in which tags field is indexed :
 
  Doc1 - tags:T1 T2
 
  Doc2 - tags:T1 T3
 
  Doc3 - tags:T1 T4
 
  Doc4 - tags:T1 T2 T3
 
  Query1 : get all docs with tags:T1 AND tags:T3 then it works and will
  give Doc2 and Doc4
 
  Query2 : get all docs whose tags must be one of these [T1, T2, T3]
 Expected
  is : Doc1, Doc2, Doc4
 
  How to model Query2 in Solr ?? Please help me on this ?
 




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: schema modification issue

2015-05-11 Thread Steve Rowe
Hi,

Thanks for reporting. I’m working on a test to reproduce.

Can you please create a Solr JIRA issue for this?:  
https://issues.apache.org/jira/browse/SOLR/

Thanks,
Steve

 On May 7, 2015, at 5:40 AM, User Zolr zolr.u...@gmail.com wrote:
 
 Hi there,
 
 I have come across a problem that when using managed schema in SolrCloud,
 adding fields into schema would SOMETIMES end up prompting Can't find
 resource 'schema.xml' in classpath or '/configs/collectionName',
 cwd=/export/solr/solr-5.1.0/server, there is of course no schema.xml in
 configs, but 'schema.xml.bak' and 'managed-schema'
 
 i use solrj to create a collection:
 
Path tempPath = getConfigPath();
 client.uploadConfig(tempPath, name); //customized configs with
 solrconfig.xml using ManagedIndexSchemaFactory
 if(numShards==0){
 numShards = getNumNodes(client);
 }
 Create request = new CollectionAdminRequest.Create();
 request.setCollectionName(name);
 request.setNumShards(numShards);
 replicationFactor =
 (replicationFactor==0?DEFAULT_REPLICA_FACTOR:replicationFactor);
 request.setReplicationFactor(replicationFactor);
 request.setMaxShardsPerNode(maxShardsPerNode==0?replicationFactor:maxShardsPerNode);
 CollectionAdminResponse response = request.process(client);
 
 
 and adding fields to schema, either by curl or by httpclient,  would
 sometimes yield the following error, but the error can be fixed by
 RELOADING the newly created collection once or several times:
 
 INFO  - [{  responseHeader:{status:500,QTime:5},
 errors:[Error reading input String Can't find resource 'schema.xml' in
 classpath or '/configs/collectionName',
 cwd=/export/solr/solr-5.1.0/server],  error:{msg:Can't find
 resource 'schema.xml' in classpath or '/configs/collectionName',
 cwd=/export/solr/solr-5.1.0/server,trace:java.io.IOException: Can't
 find resource 'schema.xml' in classpath or '/configs/collectionName',
 cwd=/export/solr/solr-5.1.0/server
 
 at
 org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:98)
 at
 org.apache.solr.schema.SchemaManager.getFreshManagedSchema(SchemaManager.java:421)
 at org.apache.solr.schema.SchemaManager.doOperations(SchemaManager.java:104)
 at
 org.apache.solr.schema.SchemaManager.performOperations(SchemaManager.java:94)
 at
 org.apache.solr.handler.SchemaHandler.handleRequestBody(SchemaHandler.java:57)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
 at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
 at
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Thread.java:745)\n,code:500}}]



Re: Solr custom component issue

2015-05-11 Thread Upayavira
If you're attaching them to each request, then just add qf= as a param to the
URL; easy.
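
For example, with SolrJ it is just another parameter on the request (a sketch
only; the field names and boosts are made up):

import org.apache.solr.client.solrj.SolrQuery;

// Boosts read from your external datasource, attached per request as qf.
SolrQuery query = new SolrQuery("some text");
query.set("defType", "edismax");
query.set("qf", "title^10 description^4 body^1");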

On Mon, May 11, 2015, at 12:17 PM, nutchsolruser wrote:
 These boosting parameters will be configured outside Solr and there is
 seperate module from which these values get populated , I am reading
 those
 values from external datasource and I want to attach them to each request
 .
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-custom-component-issue-tp4204799p4204832.html
 Sent from the Solr - User mailing list archive at Nabble.com.