Replication

2012-06-05 Thread William Bell
We are using SOLR 1.4, and we are experiencing full index replication
every 15 minutes.

I have checked the solrconfig and it has maxSegments set to 20. It
appears that it is indexing only a segment, but replicating the whole
index.

How can I verify it and possibly fix the issue?
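One way to check what is actually being transferred (assuming the standard
Solr 1.4 ReplicationHandler is in use) is the slave's details command, e.g.:

  http://slave-host:8983/solr/replication?command=details

which should report the index version/generation on the master and slave and
the files fetched during the last replication cycle.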

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Hiring multiple Lucene/Solr Search Engineers

2012-06-05 Thread SV
Hi,

We are hiring multiple Lucene/Solr engineers, tech leads, and architects based
in Minneapolis - both full-time and consulting - to develop a new search
platform.

Please reach out to me - svamb...@gmail.com

Thanks,
Venkat Ambati
Sr. Manager, Best Buy


Re: Solr, I have a performance problem with indexing.

2012-06-05 Thread Lance Norskog
Which Solr do you run?

On Tue, Jun 5, 2012 at 8:02 PM, Jack Krupansky  wrote:
> You wrote "3,5000", but is that 35 hundred (3,500) or 35 thousand (35,000)??
>
> Your numbers seem far worse than what many people typically see with Solr
> and DIH.
>
> Is the database running on the same machine?
>
> Check the Solr log file to see if some errors (or warnings) might be
> occurring frequently.
>
> Check the log for the first table from when it starts to when it ends. How
> often is it committing (according to the log)? Does there seem to be any odd
> activity during that period?
>
> -- Jack Krupansky
>
> -Original Message- From: Jihyun Suh
> Sent: Tuesday, June 05, 2012 9:25 PM
> To: solr-user-h...@lucene.apache.org ; solr-user@lucene.apache.org
> Subject: Solr, I have a performance problem with indexing.
>
>
> I have 128 tables in MySQL 5.x and each table has 3,5000 rows.
> When I start a dataimport (indexing) in Solr, it takes 5 minutes for one
> table.
> But when Solr indexes the 20th table, it takes around 10 minutes per table.
> And then when it indexes the 40th table, it takes around 20 minutes per
> table.
>
> Does Solr have some performance problem with too many documents?
> Should I change some configuration?



-- 
Lance Norskog
goks...@gmail.com


Re: Solr, I have a performance problem with indexing.

2012-06-05 Thread Jack Krupansky

You wrote "3,5000", but is that 35 hundred (3,500) or 35 thousand (35,000)??

Your numbers seem far worse than what many people typically see with Solr 
and DIH.


Is the database running on the same machine?

Check the Solr log file to see if some errors (or warnings) might be 
occurring frequently.


Check the log for the first table from when it starts to when it ends. How 
often is it committing (according to the log)? Does there seem to be any odd 
activity during that period?
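If frequent commits turn out to be the issue, the usual knob is the autoCommit
section of solrconfig.xml; a minimal sketch (the thresholds are illustrative,
not recommendations):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- commit automatically after N added docs or N milliseconds, whichever comes first -->
    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>60000</maxTime>
    </autoCommit>
  </updateHandler>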


-- Jack Krupansky

-Original Message- 
From: Jihyun Suh

Sent: Tuesday, June 05, 2012 9:25 PM
To: solr-user-h...@lucene.apache.org ; solr-user@lucene.apache.org
Subject: Solr, I have a performance problem with indexing.

I have 128 tables in MySQL 5.x and each table has 3,5000 rows.
When I start a dataimport (indexing) in Solr, it takes 5 minutes for one
table.
But when Solr indexes the 20th table, it takes around 10 minutes per table.
And then when it indexes the 40th table, it takes around 20 minutes per
table.

Does Solr have some performance problem with too many documents?
Should I change some configuration?



Re: index special characters solr

2012-06-05 Thread Jack Krupansky

Thanks. I'm sure someone else will have the same issue at some point.

-- Jack Krupansky

-Original Message- 
From: KPK

Sent: Tuesday, June 05, 2012 9:51 PM
To: solr-user@lucene.apache.org
Subject: Re: index special characters solr

Thanks Jack for your help!
I found my mistake: rather than classifying those special characters as
ALPHA, I had classified them as DIGIT. I had also missed the same entry for
the search analyzer. That was probably the reason for not getting relevant
results.

I spent a lot of time figuring this out, so I'll paste the snippet of
schema.xml that I changed, so that newbies don't waste as much time on this.
I classified the field in which I wanted to search for keywords (including
special characters) as text. In the fieldType definition, modify the filter
class="solr.WordDelimiterFilterFactory" to add a types attribute pointing at
the character-mapping file below, in BOTH the index and query analyzers.

Then make a new characters.txt in the same folder as schema.xml with this
content:

$ => ALPHA
% => ALPHA


(I wanted $ and % to behave as letters so that they could be searched.)

Then restart Jetty/Tomcat.

This is how I solved the problem.
Hope this helps someone :)


--
View this message in context: 
http://lucene.472066.n3.nabble.com/index-special-characters-solr-tp3987157p3987891.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: index special characters solr

2012-06-05 Thread KPK
Thanks Jack for your help!
I found my mistake: rather than classifying those special characters as
ALPHA, I had classified them as DIGIT. I had also missed the same entry for
the search analyzer. That was probably the reason for not getting relevant
results.

I spent a lot of time figuring this out, so I'll paste the snippet of
schema.xml that I changed, so that newbies don't waste as much time on this.
I classified the field in which I wanted to search for keywords (including
special characters) as text. In the fieldType definition, modify the filter
class="solr.WordDelimiterFilterFactory" to add a types attribute pointing at
the character-mapping file below, in BOTH the index and query analyzers.

Then make a new characters.txt in the same folder as schema.xml with this
content:

 $ => ALPHA
 % => ALPHA


(I wanted $ and % to behave as letters so that they could be searched.)
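A minimal sketch of what the relevant analyzer sections might look like (the
attributes other than types are illustrative defaults, not necessarily the
exact values used here):

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- types="characters.txt" remaps $ and % to ALPHA so they survive word-splitting -->
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" catenateWords="1" splitOnCaseChange="1"
              types="characters.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" catenateWords="0" splitOnCaseChange="1"
              types="characters.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>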


Then restart Jetty/Tomcat.

This is how I solved the problem.
Hope this helps someone :)


--
View this message in context: 
http://lucene.472066.n3.nabble.com/index-special-characters-solr-tp3987157p3987891.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr, I have a performance problem with indexing.

2012-06-05 Thread Jihyun Suh
I have 128 tables in MySQL 5.x and each table has 3,5000 rows.
When I start a dataimport (indexing) in Solr, it takes 5 minutes for one
table.
But when Solr indexes the 20th table, it takes around 10 minutes per table.
And then when it indexes the 40th table, it takes around 20 minutes per
table.

Does Solr have some performance problem with too many documents?
Should I change some configuration?


Re: I got ERROR, Unable to execute query

2012-06-05 Thread Jihyun Suh
I used MySQL 3.x.
After I migrated to MySQL 5.x, I no longer get the same error ('Unable to
execute query').
Maybe the older MySQL version and Solr have some problems together; I don't
know exactly.


2012/6/5 Jihyun Suh 

> That's why I made a new DB for dataimport test. So my tables have no
> access or activity.
> Those are just dormant ones.
>
>
> --
>
> My current suspicion is that there is activity in that table that is
> preventing DIH access. I mean, like maybe the table is being updated when
> DIH is failing. Maybe somebody is emptying the table and then regenerating
> it and your DIH run is catching the table when it is being emptied. Or
> something like that.
>
> -- Jack Krupansky
>
>
> 2012/6/4 Jihyun Suh 
>
>> I read your answer. Thank you.
>>
>> But I don't get that error from same table. This time I get error from
>> test_5. but when I try to dataimport again, I can index test_5, but from
>> test_7 I get that error.
>>
>> I don't know the reason. Could you help me?
>>
>>
>> --
>>
>> Is test_5 created by a stored procedure? If so, is there a possibility
>> that
>> the stored procedure may have done an update and not returned data - but
>> just sometimes?
>>
>> -- Jack Krupansky
>>
>>
>> 2012/6/2 Jihyun Suh 
>>
>>> I use many tables for indexing.
>>>
>>> During dataimport, I get errors for some tables like "Unable to execute
>>> query". But next time, when I try to dataimport for that table, I can do
>>> successfully without any error.
>>>
>>> [Thread-17] ERROR o.a.s.h.d.EntityProcessorWrapper - Exception in
>>> entity :
>>> test_5:org.apache.solr.handler.dataimport.DataImportHandlerException:
>>> Unable to execute query:
>>> SELECT Title, url, synonym, description FROM test_5 WHERE status in
>>> ('1','s') Processing Document # 11046
>>>
>>> at
>>> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
>>> at
>>> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
>>> at
>>> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
>>> at
>>> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
>>> at
>>> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
>>> at
>>> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>>> at
>>> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
>>> at
>>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
>>> at
>>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
>>> at
>>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
>>> at
>>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
>>> at
>>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
>>> at
>>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
>>>

Re: index special characters solr

2012-06-05 Thread KPK
Thanks for your reply!

I tried using the types attribute of WordDelimiterFilterFactory, passing a
text file which mapped % and $ to ALPHA. But even then they didn't get
indexed, and they didn't show up in search results either.
Am I missing something?

Thanks,
Kushal

--
View this message in context: 
http://lucene.472066.n3.nabble.com/index-special-characters-solr-tp3987157p3987888.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr instances: many singles vs multi-core

2012-06-05 Thread Jack Krupansky
It probably can work out reasonably well in both scenarios, but you do get 
some additional flexibility with multiple Tomcat instances:


1. Any "per-instance" Tomcat limits become per-core rather than for all 
cores on that machine.

2. If you have to restart Tomcat, only a single shard is impacted.
3. There are probably a fair number of little details that work better and 
with more parallelism if each Solr core is a separate JVM. E.g. 
BooleanQuery.maxTerms is across the whole JVM; PDFBox for Tika in SolrCell 
can have threads blocked due to a shared resource that is shared across 
cores in the JVM (was an issue - not sure if still an issue). But of course 
your usage may not run into any of them. It will depend a lot as well on how 
many CPU "cores" you have.


-- Jack Krupansky

-Original Message- 
From: Christian von Wendt-Jensen

Sent: Tuesday, June 05, 2012 7:22 AM
To: solr-user@lucene.apache.org
Subject: Solr instances: many singles vs multi-core

Hi,

I'm running a cluster of Solr servers for an index split into a lot of
shards. Each shard is replicated. The current setup is one Tomcat instance
per shard, even when the Tomcats are running on the same machine.


My question is this:

Would it be more advisable to run one Tomcat per machine with all the shards
as cores, or is the current setup, where each shard runs in its own Tomcat,
the best?

As I see it, one Tomcat running multiple cores is better, as it reduces the
overhead of having many Tomcat instances, and the cores can share all
available memory according to how much they actually need. In the one
shard/one Tomcat scenario, each instance must have its own predefined memory
settings, whether it needs more or less.


Any opinions on the matter?



Med venlig hilsen / Best Regards

Christian von Wendt-Jensen



Re: using Tika (ExtractingRequestHandler)

2012-06-05 Thread Jack Krupansky

Hoss,

In your edit, I noticed that the wiki makes "SolrPlugin" a link, but to a 
nonexistent page, although the page "SolrPlugins" does exist.


See: "it is provided as a SolrPlugin,"
http://wiki.apache.org/solr/ExtractingRequestHandler

I also noticed a few other things:

1. Reference to the "/site" directory that does not exist. So, the statement 
"Note, the /site directory in the solr download contains some nice example 
docs to try" is not terribly useful.

2. The path to tutorial.html should be "../../docs/api/doc-files"
3. There is no tutorial.pdf file as referenced in the curl examples.
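For anyone following along, a typical SolrCell request from the wiki looks
roughly like this (the id and file path are illustrative, and the path needs
to point at a file that actually exists in your checkout):

  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
       -F "myfile=@../../docs/api/doc-files/tutorial.html"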

-- Jack Krupansky

-Original Message- 
From: Chris Hostetter

Sent: Tuesday, June 05, 2012 6:47 PM
To: solr-user@lucene.apache.org
Subject: Re: using Tika (ExtractingRequestHandler)


I've updated the wiki to try and fill in some of these holes...

http://wiki.apache.org/solr/ExtractingRequestHandler

: i'm looking at using Tika to index a bunch of documents. the wiki page 
seems to be a little bit out of date ("// TODO: this is out of date as of 
Solr 1.4 - dist/apache-solr-cell-1.4.jar and all of contrib/extraction/lib 
are needed") and it also looks a little incomplete.

:
: is there an actual list of all the required jar files? i'm not sure they 
are in the same place in the 3.6.0 distribution as they were in 1.4, and 
having an actual list would be very helpful in figuring out where they are.

:
: as for "Sending Documents to Solr", is there any plan to address this 
todo: "// TODO: describe the different ways to send the documents to solr 
(POST body, form encoded, remoteStreaming)". this is really just a nice to 
have, i can see how to accomplish my goals using a method that is currently 
documented.

:
: thanks,
:richard
:

-Hoss 



Re: TermComponent and Optimize

2012-06-05 Thread Chris Hostetter

: It seems that TermsComponent is looking at all versions of documents in the
: index.
:
: Is this the expected behavior for TermsComponent? Any suggestion about
: how to solve this?

Yes...

http://wiki.apache.org/solr/TermsComponent
"The doc frequencies returned are the number of documents that match the 
term, including any documents that have been marked for deletion but not 
yet removed from the index."

If you delete/replace a document in the index, it still contributes to 
the doc freq for that term until the "deletion" is expunged (either 
because of a natural segment merge, or forced merging due to optimize)

The reason TermsComponent is so fast is that it only looks at the raw
terms. If you want to "fix" the counts to represent visible documents, you
have to use something like faceting, which will be slower because it
checks the actual (live) document counts.
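As an illustration (the field name is hypothetical, and this assumes the
example /terms handler is configured), the two kinds of request look like:

  # raw term counts, including deleted-but-not-yet-merged documents
  http://localhost:8983/solr/terms?terms.fl=myfield&terms.prefix=foo

  # live-document counts via faceting (slower)
  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=myfield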


-Hoss


Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-05 Thread Chris Hostetter

: The real issue here is that the docs are created externally, and the
: producer won't (yet) guarantee that fields that should appear once will
: actually appear once. Because of this, I don't want to declare the field as
: multiValued="false" as I don't want to cause indexing errors. It would be
: great for me (and apparently many others after searching) if there were an
: option as simple as forceSingleValued="true" - where some deterministic
: behavior such as "use first field encountered, ignore all others", would
: occur.

This will be trivial in Solr 4.0, using one of the new 
"FieldValueSubsetUpdateProcessorFactory" classes that are now available -- 
just pick your rule... 

https://builds.apache.org/view/G-L/view/Lucene/job/Solr-trunk/javadoc/org/apache/solr/update/processor/FieldValueSubsetUpdateProcessorFactory.html
Direct Known Subclasses:
FirstFieldValueUpdateProcessorFactory, 
LastFieldValueUpdateProcessorFactory, 
MaxFieldValueUpdateProcessorFactory, 
MinFieldValueUpdateProcessorFactory 
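A minimal sketch of how one of these might be wired up in solrconfig.xml on
the 4.x branch (the chain and field names are illustrative):

  <updateRequestProcessorChain name="keep-first-value">
    <!-- keep only the first value seen for this field; drop any extras -->
    <processor class="solr.FirstFieldValueUpdateProcessorFactory">
      <str name="fieldName">sort_field</str>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>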

-Hoss


Re: using Tika (ExtractingRequestHandler)

2012-06-05 Thread Chris Hostetter

I've updated the wiki to try and fill in some of these holes...

http://wiki.apache.org/solr/ExtractingRequestHandler

: i'm looking at using Tika to index a bunch of documents. the wiki page seems 
to be a little bit out of date ("// TODO: this is out of date as of Solr 1.4 - 
dist/apache-solr-cell-1.4.jar and all of contrib/extraction/lib are needed") 
and it also looks a little incomplete.
: 
: is there an actual list of all the required jar files? i'm not sure they are 
in the same place in the 3.6.0 distribution as they were in 1.4, and having an 
actual list would be very helpful in figuring out where they are.
: 
: as for "Sending Documents to Solr", is there any plan to address this todo: 
"// TODO: describe the different ways to send the documents to solr (POST body, 
form encoded, remoteStreaming)". this is really just a nice to have, i can see 
how to accomplish my goals using a method that is currently documented.
: 
: thanks,
:richard
: 

-Hoss


Re: Solr 4.0 Clean Commit for production use

2012-06-05 Thread Chris Hostetter

: The Nightly Build wiki still says it is "4.x" even though it is now 5.x.
: See:
: https://wiki.apache.org/solr/NightlyBuilds
: 
: AFAIK, there isn't a 4.x nightly build running. (Is that going to happen
: soon??)

Yes...

http://mail-archives.apache.org/mod_mbox/lucene-dev/201205.mbox/%3c3fd307e7-7cd2-4042-8ba7-8a4561dbf...@email.android.com%3E


-Hoss


Re: Solr 4.0 Clean Commit for production use

2012-06-05 Thread Jack Krupansky

The Nightly Build wiki still says it is "4.x" even though it is now 5.x.
See:
https://wiki.apache.org/solr/NightlyBuilds

AFAIK, there isn't a 4.x nightly build running. (Is that going to happen 
soon??)


You can checkout the repo for the 4x branch:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x

My (limited) understanding is that 4.x can read and write 3.x indexes, but
any new/modified indexes will be incompatible with 3.x. And you have to be
careful upgrading master/slave configurations, as noted in CHANGES.txt.


-- Jack Krupansky

-Original Message- 
From: Chris Hostetter

Sent: Tuesday, June 05, 2012 5:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.0 Clean Commit for production use


: Hey guys, I am trying to upgrade to Solr 4.0. Do you know where I get a 
clean


Clarification: 4.0 does not exist yet.  What does exist is the 4x branch,
from which you can build snapshots that should be very similar to what
will eventually be released as 4.0.

: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/
: and it looks like they have migrated to 5.0. From the link below it looks

Correct, a 4x branch has been created off of trunk in anticipation of the
4.0 release process, so that more aggressive experimental work beyond the
scope of 4.0 can continue on trunk.

I've updated the wiki to try and outline this based on the discussion from
previous dev@lucene threads...

https://wiki.apache.org/solr/Solr4.0

: My second question would be: Are there any known compatibility
: issues/restrictions with previous versions of Lucene? (I just want to make
: sure I can still use my data indexed with previous Solr/Lucene versions).

The best thing to do is review the "Upgrade" instructions in CHANGES.txt.
However, those instructions should not be considered "final" until the
final release is voted on -- there may be mistakes/omissions, but the
best way to help find those mistakes/omissions is for users to help try
out nightly builds and point them out when you notice them.



-Hoss 



Re: Solr 4.0 Clean Commit for production use

2012-06-05 Thread Chris Hostetter

: Hey guys, I am trying to upgrade to Solr 4.0. Do you know where I get a clean

Clarification: 4.0 does not exist yet.  What does exist is the 4x branch, 
from which you can build snapshots that should be very similar to what 
will eventually be released as 4.0.

: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/ 
: and it looks like they have migrated to 5.0. From the link below it looks

Correct, a 4x branch has been created off of trunk in anticipation of the
4.0 release process, so that more aggressive experimental work beyond the
scope of 4.0 can continue on trunk.

I've updated the wiki to try and outline this based on the discussion from
previous dev@lucene threads...

https://wiki.apache.org/solr/Solr4.0

: My second question would be: Are there any known compatibility
: issues/restrictions with previous versions of Lucene? (I just want to make
: sure I can still use my data indexed with previous Solr/Lucene versions). 

The best thing to do is review the "Upgrade" instructions in CHANGES.txt.
However, those instructions should not be considered "final" until the
final release is voted on -- there may be mistakes/omissions, but the
best way to help find those mistakes/omissions is for users to help try
out nightly builds and point them out when you notice them.



-Hoss


Solr 4.0 Clean Commit for production use

2012-06-05 Thread TheNova
Hey guys, I am trying to upgrade to Solr 4.0. Do you know where I can get a
clean 4.0 commit for production use?
I did an SVN checkout from
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/ 
and it looks like they have migrated to 5.0. From the link below it looks
like that happened by the end of May.
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/

My second question would be: Are there any known compatibility
issues/restrictions with previous versions of Lucene? (I just want to make
sure I can still use my data indexed with previous Solr/Lucene versions). 

Thanks.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-Clean-Commit-for-production-use-tp3987852.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search timeout for Solrcloud

2012-06-05 Thread arin_g
For example, when we set the start parameter to 1000, 2000 or higher (page
100, 200, ...), it takes very long (20-30 seconds, sometimes even 100
seconds).
This usually happens when there is a big gap between pages, mostly hit by
web crawlers (when they crawl the last-page link on our website).


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-timeout-for-Solrcloud-tp3987716p3987834.html
Sent from the Solr - User mailing list archive at Nabble.com.


Is FileFloatSource's WeakHashMap cache only cleaned by GC?

2012-06-05 Thread Gregg Donovan
We've encountered GC spikes at Etsy after adding new
ExternalFileFields a decent number of times. I was always a little
confused by this behavior -- isn't it just one big float[]? why does
that cause problems for the GC? -- but looking at the FileFloatSource
code a little more carefully, I wonder if this is due to using a
WeakHashMap that is only cleaned by GC or manual invocation of a
request handler.

FileFloatSource stores its cache in a WeakHashMap. In the code[1], it
mentions that the implementation is modeled after the FieldCache
implementation. However, FieldCacheImpl adds listeners for IndexReader close
events and uses those to purge its caches. [2] Should we be doing the
same in FileFloatSource?

Here's a mostly untested patch[3] with a possible implementation.
There are probably better ways to do it (e.g. I don't love using
another WeakHashMap), but I found it tough to hook into the
IndexReader lifecycle without a) relying on classes other than
FileFloatSource b) changing the public API of FIleFloatSource or c)
changing the implementation too much.

There is a RequestHandler inside of FileFloatSource
(ReloadCacheRequestHandler) that can be used to clear the cache
entirely[4], but this is sub-optimal for us for a few reasons:

--It clears the entire cache. ExternalFileFields often take some
non-trivial time to load, and we prefer to do so during SolrCore
warmups. Clearing the entire cache while serving traffic would likely
cause user-facing requests to time out.
--It forces an extra commit, with its consequent cache cycling, etc.

I'm thinking of ways to monitor the size of FileFloatSource's cache to
track its size against GC pause times, but it seems tricky because
even calling WeakHashMap#size() has side-effects. Any ideas?

Overall, what do you think? Does relying on GC to clean this cache
make sense as a possible cause of GC spikiness? If so, does the patch
[3] look like a decent approach?

Thanks!

--Gregg

[1] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L135
[2] https://github.com/apache/lucene-solr/blob/1c0eee5c5cdfddcc715369dad9d35c81027bddca/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java#L166
[3] https://gist.github.com/2876371
[4] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L310


Boost by Nested Query / Join Needed?

2012-06-05 Thread naleiden
Hi,

First off, I'm about a week into all things Solr, and still trying to figure
out how to fit my relational-shaped peg through a denormalized hole. Please
forgive my ignorance below :-D

I need to store a one-to-N type relationship, and boost by a related field.

Let's say I want to index a number of different types of candy, and also a
customer's preference for each type of candy (which I index/update when a
customer makes a purchase), and then boost by that preference at search time.

Here is my pared-down attempt at a denormalized schema:
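(The schema snippet was stripped by the mail archive; a rough, purely
illustrative reconstruction of the kind of fields described, based on the SQL
below, might be:)

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="type" type="string" indexed="true" stored="true" required="true"/>
  <!-- populated only on 'candy' documents -->
  <field name="name" type="text" indexed="true" stored="true"/>
  <field name="description" type="text" indexed="true" stored="true"/>
  <!-- populated only on 'preference' documents -->
  <field name="candy" type="string" indexed="true" stored="true"/>
  <field name="customer" type="string" indexed="true" stored="true"/>
  <field name="weight" type="float" indexed="true" stored="true"/>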

I am indexing 'candy' and 'preferences' separately, and when indexing one, I
leave the fields of the other empty (with the exception of the required 'id'
and 'type').

Ignoring the query score, this is effectively what I'm looking to do in SQL:

SELECT candy.id, candy.name, candy.description FROM candy
LEFT JOIN preference ON (preference.candy = candy.id AND preference.customer
= 'someCustomerID')
// Where some match is made on query against candy.name or candy.description
ORDER BY preference.weight DESC

My questions are:

1.) Am I making any assumptions with respect to what are effectively
different document types in the schema that will not scale well? I don't
think I want to be duplicating each 'candy' entry for every customer, or
maybe that wouldn't be such a big deal in Solr.

2.) Can someone point me in the right direction on how to perform this type
of boost in a Solr query?

Thanks in advance,
Nick


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boost-by-Nested-Query-Join-Needed-tp3987818.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search timeout for Solrcloud

2012-06-05 Thread Jack Krupansky
I'm curious... how deep is it that is becoming problematic? Tens of pages, 
hundreds, thousands, millions?


And when you say deep paging, are you incrementing through all pages down to 
the depth or "gapping" to some very large depth outright? If the former, I 
am wondering if the Solr cache is building up with all those previous 
results.


And is it that the time is simply moderately beyond expectations (e.g. 10 or 
30 seconds or a minute compared to 1 second), or... are we talking about a 
situation where a core is terminally "thrashing" with garbage collection/OOM 
issues?


-- Jack Krupansky

-Original Message- 
From: arin_g

Sent: Tuesday, June 05, 2012 1:34 AM
To: solr-user@lucene.apache.org
Subject: Search timeout for Solrcloud

Hi,
We use SolrCloud in production, and we are facing some issues with queries
that take very long, especially deep-paging queries; these queries keep our
servers very busy. I am looking for a way to stop (kill) queries taking
longer than a specific amount of time (say 5 seconds). I checked timeAllowed,
but it doesn't work (the query still runs to completion). I also noticed that
there are connTimeout and socketTimeout for distributed searches, but I am
not sure whether they kill the thread (I want to save resources by killing
the query, not just returning a timeout). Also, if I could get partial
results, that would be ideal. Any suggestions?

Thanks,
arin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-timeout-for-Solrcloud-tp3987716.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Is it faster to search over many different fields or one field that combines the values of all those other fields?

2012-06-05 Thread Gora Mohanty
On 5 June 2012 22:05, Mikhail Khludnev  wrote:
> IIRC, the Lucene in Action book circles back to this point in almost every
> chapter: a multi-field query is faster.
[...]

Surely this is dependent on the type, and volume of one's
data? As with many issues, isn't the answer that "it depends",
i.e., one should prototype, and have objective measures on
one's own data-sets.

Would love to be educated otherwise.

Regards,
Gora

P.S. Have to get that book.


Re: Is it faster to search over many different fields or one field that combines the values of all those other fields?

2012-06-05 Thread Mikhail Khludnev
IIRC, the Lucene in Action book circles back to this point in almost every
chapter: a multi-field query is faster.

On Tue, Jun 5, 2012 at 7:04 PM, Jack Krupansky wrote:

> There may be a raw performance advantage to having all values in a single
> combined field, but then you lose the opportunity to boost title and tag
> field hits.
>
> With the extended dismax query parser you have the ability to specify the
> field list in the "qf" request parameter so that the query can simply be
> the keywords and operators without all of the extra "OR" operators. qf also
> lets you specify the boost for each field.
>
> -- Jack Krupansky
>
> -Original Message- From: santamaria2
> Sent: Tuesday, June 05, 2012 8:50 AM
> To: solr-user@lucene.apache.org
> Subject: Is it faster to search over many different fields or one field
> that combines the values of all those other fields?
>
>
> Say I have various categories of 'tags'. I want a keyword search to search
> through my index of articles. So I search over:
> 1) the title.
> 2) the body
> 3) about 10 of these tag-categories. Each tag category is multivalued with
> a
> few words per value.
>
> Without considering the effect on 'relevance', and using the standard
> Lucene
> query parser, would it be faster to specify each of these 10 fields in q (q
> = cat1:keyword OR cat2:keyword OR ... ), or to copyfield the stuff in those
> 10 fields into one combined field?
>
> Or is it such that I should be slapped in the face for even thinking about
> performance in this scenario?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Is-it-faster-to-search-over-many-different-fields-or-one-field-that-combines-the-values-of-all-those-tp3987766.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-05 Thread Aaron Daubman
Thanks for the responses,

By saying "dirty data" you imply that only one of the values is "good" or
> "clean" and that the others can be safely discarded/ignored, as opposed to
> true multi-valued data where each value is there for good reason and needs
> to be preserved. In any case, how do you know/decide which value should be
> used for sorting - and did you just get lucky that Solr happened to use the
> right one?
>

I haven't gone back and checked the old version's docs where this was
"working", however, I suspect that either the field never ended up
appearing in docs more than once, or if it did, it had the same value
repeated...

The real issue here is that the docs are created externally, and the
producer won't (yet) guarantee that fields that should appear once will
actually appear once. Because of this, I don't want to declare the field as
multiValued="false" as I don't want to cause indexing errors. It would be
great for me (and apparently many others after searching) if there were an
option as simple as forceSingleValued="true" - where some deterministic
behavior such as "use first field encountered, ignore all others", would
occur.


The preferred technique would be the preprocess and "clean" the data before
> it is handed to Solr or SolrJ, even if the source must remain "dirty".
> Baring that a preprocessor or a custom update processor certainly.
>

I could write preprocessors (this is really what will eventually happen
when the producer cleans their data),  custom processors, etc... however,
for something this simple it would be great not to be producing more code
that would have to be maintained.



> Please clarify exactly how the data is being fed into Solr.
>

 I am using "generic" code to read from a key/value store and compose
documents. This is another reason fixing the data at this point would not
be desirable, the currently generic code would need to be made specific to
look for these particular fields and then coerce them to single values...

Thanks again,
  Aaron


Re: London OSS search social - meetup 6th June

2012-06-05 Thread Richard Marr
Quick reminder, we're meeting at The Plough in Bloomsbury tomorrow night. 
Details and RSVP on the meetup page:

http://www.meetup.com/london-search-social/events/65873032/

--
Richard Marr

On 3 Jun 2012, at 00:29, Richard Marr  wrote:

> 
> Apologies for the short notice guys, we're meeting up at The Plough in 
> Bloomsbury on Wednesday 6th June.
> 
> As usual the format is open and there's a healthy mix of experience and 
> backgrounds. Please come and share wisdom, ask questions, geek out, etc. in 
> the presence of beverages.
> 
> -- 
> Richard Marr


Re: Is it faster to search over many different fields or one field that combines the values of all those other fields?

2012-06-05 Thread Jack Krupansky
There may be a raw performance advantage to having all values in a single
combined field, but then you lose the opportunity to boost title and tag
field hits.


With the extended dismax query parser you have the ability to specify the 
field list in the "qf" request parameter so that the query can simply be the 
keywords and operators without all of the extra "OR" operators. qf also lets 
you specify the boost for each field.
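For example (field names and boosts are illustrative), an edismax request
might look like:

  q=some keywords
  &defType=edismax
  &qf=title^5 body cat1^2 cat2^2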


-- Jack Krupansky

-Original Message- 
From: santamaria2

Sent: Tuesday, June 05, 2012 8:50 AM
To: solr-user@lucene.apache.org
Subject: Is it faster to search over many different fields or one field that 
combines the values of all those other fields?


Say I have various categories of 'tags'. I want a keyword search to search
through my index of articles. So I search over:
1) the title.
2) the body
3) about 10 of these tag-categories. Each tag category is multivalued with a
few words per value.

Without considering the effect on 'relevance', and using the standard Lucene
query parser, would it be faster to specify each of these 10 fields in q (q
= cat1:keyword OR cat2:keyword OR ... ), or to copyfield the stuff in those
10 fields into one combined field?

Or is it such that I should be slapped in the face for even thinking about
performance in this scenario?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-faster-to-search-over-many-different-fields-or-one-field-that-combines-the-values-of-all-those-tp3987766.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Can't index sub-entitties in DIH

2012-06-05 Thread Gora Mohanty
On 5 June 2012 20:08, Rafael Taboada  wrote:
> Hi Gora,
>
> Yes, I restart Solr for each change I do.
>
> Thanks for your help...
>
> A small question: does DIH work well with an Oracle database, using all
> the features it can do?

Unfortunately, I have never used DIH with Oracle. However,
this should be a simple enough use case that it should just
work. I think that we must be missing something obvious.

For the sub-entity with Oracle case, what message do you
get when the data-import concludes? Is the number of
indexed documents correct? Are there any relevant
messages in the Solr log files?

Regards,
Gora


Re: Can't index sub-entitties in DIH

2012-06-05 Thread Gora Mohanty
On 5 June 2012 20:05, Rafael Taboada  wrote:
> Hi James.
>
> Thanks for your advice.
>
> As I said, alias works for me. I use joins instead of sub-entities...
> Heavily...
> These config files work for me...
[...]

How about NULL values in the column that
you are doing a left outer join on? Cannot
test this right now, but I believe that a left
outer join behaves differently from a DIH
entity/sub-entity when it comes to NULLs.

Regards,
Gora


Re: Can't index sub-entitties in DIH

2012-06-05 Thread Rafael Taboada
Hi Gora,

Yes, I restart Solr for each change I do.

Thanks for your help...

A small question: does DIH work well with an Oracle database, using all the
features it can do?


On Tue, Jun 5, 2012 at 9:32 AM, Gora Mohanty  wrote:

> Hi,
>
> Sorry, I am stumped, and cannot help further without
> access to Oracle. Please disregard the bit about the
> quotes: I was reading a single quote followed by a
> double quote as three single quotes. There was no
> issue there.
>
> Since your configurations for Oracle, and mysql are
> different, are you using different Solr cores/instances,
> or making sure to restart Solr in between configuration
> changes?
>
> Regards,
> Gora
>



-- 
Rafael Taboada

/*
 * Phone >> 992 741 026
 */


Re: filtering number and repeated contents

2012-06-05 Thread Jack Krupansky
My (very limited) understanding of "boilerpipe" in Tika is that it strips 
out "short text", which is great for all the menu and navigation text, but 
the typical disclaimer at the bottom of an email is not very short and 
frequently can be longer than the email message body itself. You may have to 
resort to a custom update processor that is programmed with some disclaimer 
signature text strings to be removed from field values.


-- Jack Krupansky

-Original Message- 
From: Mark , N

Sent: Tuesday, June 05, 2012 8:28 AM
To: solr-user@lucene.apache.org
Subject: filtering number and repeated contents

Is it possible to filter out numbers and disclaimers (repeated content)
while indexing to Solr?
These are all surplus information and I do not want to index them.

I have tried using the boilerpipe algorithm as well, to remove surplus
information from web pages such as navigational elements, templates, and
advertisements. I think it works well, but I am looking to see if I
could filter out "disclaimer" text too, mainly in email texts.
--
Thanks,

*Nipen Mark * 



Re: Can't index sub-entitties in DIH

2012-06-05 Thread Rafael Taboada
Hi James.

Thanks for your advice.

As I said, aliases work for me. I use joins instead of sub-entities,
heavily.
These config files work for me:

db-data-config.xml

[XML content stripped by the mailing list archive]

schema.xml

[XML content stripped by the archive; only the values "iddocumento" and
"nroexpediente" survive, most likely the uniqueKey and default search field]

solrconfig.xml

[XML content stripped by the archive; only the fragments "LUCENE_36",
"db-data-config.xml", and "solr" survive, most likely the luceneMatchVersion
and the DataImportHandler request handler configuration]

On Tue, Jun 5, 2012 at 9:22 AM, Dyer, James wrote:

> I successfully use Oracle with DIH, although none of my imports have
> sub-entities. (Slight difference: I'm on ojdbc5.jar w/10g...) It may be
> you have a driver that doesn't play well with DIH in some cases. You might
> want to try these possible workarounds:
>
> - rename the columns in SELECT with "AS" clauses.
> - in cases where the columns in SELECT are the same as what you have in
> schema.xml, omit the <field> tags (see
> http://wiki.apache.org/solr/DataImportHandler#A_shorter_data-config)
>
> These are shot-in-the-dark guesses.  I wouldn't expect this to matter but
> you might as well try it.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Rafael Taboada [mailto:kaliman.fore...@gmail.com]
> Sent: Tuesday, June 05, 2012 8:58 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Can't index sub-entitties in DIH
>
> Hi Gora,
>
>
> > Your configuration files look fine. It would seem that something
> > is going wrong with the SELECT in Oracle, or with the JDBC
> > driver used to access Oracle. Could you try:
>
> * Manually doing the SELECT for the entity, and sub-entity
> >  to ensure that things are working.
> >
>
> The SELECTs are working OK.
>
>
>
> > * Check the JDBC settings.
> >
>
> I'm using the latest version of ojdbc6.jar for Oracle 11g. The JDBC
> setup seems OK because Solr brings back data.
>
>
>
> > Sorry, I do not have access to Oracle so that I cannot try this
> > out myself.
> >
> > Also, have you checked the Solr logs for any error messages?
> > Finally, I just noticed that you have extra quotes in:
> > ...where usuario_idusuario = '${usuario.idusuario}'"
> > I doubt that is the cause of your problem, but you could try
> > removing them.
> >
>
> If I remove quotes, there is an error about this:
>
> SEVERE: Full Import failed:java.lang.RuntimeException:
> java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento =
>  Processing Document # 1
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
> at
>
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
> at
>
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
> at
>
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
> Caused by: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento =
>  Processing Document # 1
> at
>
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
> at
>
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
> ... 3 more
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
> Unable to execute query: SELECT nombre FROM tipodocumento WHERE
> idtipodocumento =  Processing Document # 1
> at
>
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
> at
>
> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
> at
>
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
> at
>
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
> at
>
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
> at
>
> or

Re: Can't index sub-entitties in DIH

2012-06-05 Thread Gora Mohanty
Hi,

Sorry, I am stumped, and cannot help further without
access to Oracle. Please disregard the bit about the
quotes: I was reading a single quote followed by a
double quote as three single quotes. There was no
issue there.

Since your configurations for Oracle, and mysql are
different, are you using different Solr cores/instances,
or making sure to restart Solr in between configuration
changes?

Regards,
Gora


Re: Is it faster to search over many different fields or one field that combines the values of all those other fields?

2012-06-05 Thread Michael Della Bitta
I don't have the answer to your question, but I certainly don't think
anybody should be slapped in the face for asking a question!

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Tue, Jun 5, 2012 at 8:50 AM, santamaria2  wrote:
> Say I have various categories of 'tags'. I want a keyword search to search
> through my index of articles. So I search over:
> 1) the title.
> 2) the body
> 3) about 10 of these tag-categories. Each tag category is multivalued with a
> few words per value.
>
> Without considering the effect on 'relevance', and using the standard Lucene
> query parser, would it be faster to specify each of these 10 fields in q (q
> = cat1:keyword OR cat2:keyword OR ... ), or to copyfield the stuff in those
> 10 fields into one combined field?
>
> Or is it such that I should be slapped in the face for even thinking about
> performance in this scenario?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Is-it-faster-to-search-over-many-different-fields-or-one-field-that-combines-the-values-of-all-those-tp3987766.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can't index sub-entitties in DIH

2012-06-05 Thread Rahul Warawdekar
Hi,

One of the possibilities for this kind of issue may be the case sensitivity
of column names in Oracle.
Can you apply a transformer and check the entity map, which actually
contains the keys and their values?
Also, please try specifying upper-case column names for Oracle and see if
that works, something like the sketch below.
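(The original snippet did not survive the archive; a rough sketch of the
idea, with entity/column names borrowed from this thread and purely
illustrative, might be:)

  <entity name="usuario" query="SELECT idusuario, nombre FROM usuario">
    <!-- Oracle usually reports unquoted column names in upper case -->
    <field column="IDUSUARIO" name="idusuario"/>
    <field column="NOMBRE" name="nombre"/>
  </entity>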



 

On Tue, Jun 5, 2012 at 9:57 AM, Rafael Taboada wrote:

> Hi Gora,
>
>
> > Your configuration files look fine. It would seem that something
> > is going wrong with the SELECT in Oracle, or with the JDBC
> > driver used to access Oracle. Could you try:
>
> * Manually doing the SELECT for the entity, and sub-entity
> >  to ensure that things are working.
> >
>
> The SELECTs are working OK.
>
>
>
> > * Check the JDBC settings.
> >
>
> I'm using the latest version of ojdbc6.jar for Oracle 11g. The JDBC
> setup seems OK because Solr brings back data.
>
>
>
> > Sorry, I do not have access to Oracle so that I cannot try this
> > out myself.
> >
> > Also, have you checked the Solr logs for any error messages?
> > Finally, I just noticed that you have extra quotes in:
> > ...where usuario_idusuario = '${usuario.idusuario}'"
> > I doubt that is the cause of your problem, but you could try
> > removing them.
> >
>
> If I remove quotes, there is an error about this:
>
> SEVERE: Full Import failed:java.lang.RuntimeException:
> java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento =
>  Processing Document # 1
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
> at
>
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
> at
>
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
> at
>
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
> Caused by: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento =
>  Processing Document # 1
> at
>
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
> at
>
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
> ... 3 more
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
> Unable to execute query: SELECT nombre FROM tipodocumento WHERE
> idtipodocumento =  Processing Document # 1
> at
>
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
> at
>
> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
> at
>
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
> at
>
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
> at
>
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
> at
>
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683)
> at
>
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
> at
>
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
> ... 5 more
> Caused by: java.sql.SQLSyntaxErrorException: ORA-00936: missing expression
>
> at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)
> at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
> at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:879)
> at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:450)
> at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
> at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
> at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193)
> at
> oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:873)
> at
>
> oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1167)
> at
>
> oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1289)
> at
>
> oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1909)
> at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1871)
> at
>
> oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:318)
> at
>
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246
> My config files using Oracle are:
>
>
> db-data-config.xml
> 
> url="jdbc:oracle:thin:@localhost:1521:solr" user="solr" password="s

Re: score filter

2012-06-05 Thread debdoot
Hello Grant,

I need to frame a query that is a combination of two query parts and I use a
'function' query to prepare the same. Something like:
q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1))

where $uq and $cq are two queries.

Now, I want a search result returned only if I get a hit on $uq. So, I
specify the default value of the $uq query as 0.0 so that the final score is
zero in cases where $uq doesn't record a hit. Even though the scoring works
as expected (i.e., documents that don't match $uq have a score of zero), all
the documents are still returned as search results. Is there a way to filter
out search results that have a score of zero?
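For reference, the full request with the two sub-queries passed as
parameters would be assembled something like this (the uq and cq values are
illustrative placeholders):

  q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1))
  &uq=text:foo
  &cq=text:bar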

Thanks for your help,

Debdoot

--
View this message in context: 
http://lucene.472066.n3.nabble.com/score-filter-tp493438p3987791.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Can't index sub-entitties in DIH

2012-06-05 Thread Dyer, James
I successfully use Oracle with DIH, although none of my imports have
sub-entities. (Slight difference: I'm on ojdbc5.jar w/10g...) It may be you
have a driver that doesn't play well with DIH in some cases. You might want
to try these possible workarounds:

- rename the columns in SELECT with "AS" clauses.
- in cases where the columns in SELECT are the same as what you have in
schema.xml, omit the <field> tags (see
http://wiki.apache.org/solr/DataImportHandler#A_shorter_data-config)

These are shot-in-the-dark guesses.  I wouldn't expect this to matter but you 
might as well try it.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Rafael Taboada [mailto:kaliman.fore...@gmail.com] 
Sent: Tuesday, June 05, 2012 8:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Can't index sub-entitties in DIH

Hi Gora,


> Your configuration files look fine. It would seem that something
> is going wrong with the SELECT in Oracle, or with the JDBC
> driver used to access Oracle. Could you try:

* Manually doing the SELECT for the entity, and sub-entity
>  to ensure that things are working.
>

The SELECTs are working OK.



> * Check the JDBC settings.
>

I'm using the latest version of ojdbc6.jar for Oracle 11g. The JDBC
setup seems OK because Solr brings back data.



> Sorry, I do not have access to Oracle so that I cannot try this
> out myself.
>
> Also, have you checked the Solr logs for any error messages?
> Finally, I just noticed that you have extra quotes in:
> ...where usuario_idusuario = '${usuario.idusuario}'"
> I doubt that is the cause of your problem, but you could try
> removing them.
>

If I remove quotes, there is an error about this:

SEVERE: Full Import failed:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento =
 Processing Document # 1
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento =
 Processing Document # 1
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to execute query: SELECT nombre FROM tipodocumento WHERE
idtipodocumento =  Processing Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
... 5 more
Caused by: java.sql.SQLSyntaxErrorException: ORA-00936: missing expression

at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:879)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:450)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193)
at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:873)
at
oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1167)
at
oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1289)
at
oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1909)
at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1871)
at
oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:318)
at
org.apache.solr.handler.dataim

HypericHQ plugins?

2012-06-05 Thread Paul Libbrecht
Hello SOLR users,

Is there someone who has written plugins for HypericHQ to monitor the very many 
metrics Solr exposes through JMX?
I am a bit of a newbie to JMX and the Hyperic tutorials aren't simple enough 
for my taste... so it would help me if someone has already done this.
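
(For context, the raw numbers are easy to reach with plain JMX once remote JMX is enabled on the Solr JVM. Below is a minimal sketch of what I mean; the port and the object name are only illustrative, and jconsole will show the exact names a given Solr instance registers.)

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SolrJmxPeek {
  public static void main(String[] args) throws Exception {
    // assumes the Solr JVM was started with com.sun.management.jmxremote.port=9999
    JMXServiceURL url =
        new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
    JMXConnector connector = JMXConnectorFactory.connect(url);
    MBeanServerConnection mbeans = connector.getMBeanServerConnection();
    // illustrative object name; browse with jconsole to see what your Solr exposes
    ObjectName searcher =
        new ObjectName("solr:type=searcher,id=org.apache.solr.search.SolrIndexSearcher");
    System.out.println("numDocs = " + mbeans.getAttribute(searcher, "numDocs"));
    connector.close();
  }
}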

thanks in advance

Paul

Re: random results at specific slots

2012-06-05 Thread Jack Krupansky
Take a look at "query elevation". It may do exactly want you want, but at a 
minimum, it would show you how this kind of thing can be done.


See:
http://wiki.apache.org/solr/QueryElevationComponent

-- Jack Krupansky

-Original Message- 
From: srinir

Sent: Tuesday, June 05, 2012 3:08 AM
To: solr-user@lucene.apache.org
Subject: random results at specific slots

Hi,

I would like to return results sorted by score (desc), but I would like to
insert random results into some predefined slots (let's say 10, 14 and 18).
The reason I want to do that is that I boost click-through-rate-based features
significantly and I want to give a chance to documents which don't have
enough click-through-rate data. This would help the results stay fresh.

I looked into the Solr code and it looks like I need a custom QueryComponent
where, once the top results are ordered, I can insert some random results at
my predefined slots and then return. I am wondering whether there is any
other way I can achieve the same?

Thanks
Srini

--
View this message in context: 
http://lucene.472066.n3.nabble.com/random-results-at-specific-slots-tp3987719.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Can't index sub-entitties in DIH

2012-06-05 Thread Rafael Taboada
Hi Gora,


> Your configuration files look fine. It would seem that something
> is going wrong with the SELECT in Oracle, or with the JDBC
> driver used to access Oracle. Could you try:

* Manually doing the SELECT for the entity, and sub-entity
>  to ensure that things are working.
>

The SELECTs are working OK.



> * Check the JDBC settings.
>

I'm using the latest version of jdbc6.jar for Oracle 11g. It seems the JDBC
settings are OK because Solr does retrieve data.



> Sorry, I do not have access to Oracle so that I cannot try this
> out myself.
>
> Also, have you checked the Solr logs for any error messages?
> Finally, I just noticed that you have extra quotes in:
> ...where usuario_idusuario = '${usuario.idusuario}'"
> I doubt that is the cause of your problem, but you could try
> removing them.
>

If I remove quotes, there is an error about this:

SEVERE: Full Import failed:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento =
 Processing Document # 1
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento =
 Processing Document # 1
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to execute query: SELECT nombre FROM tipodocumento WHERE
idtipodocumento =  Processing Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:253)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
... 5 more
Caused by: java.sql.SQLSyntaxErrorException: ORA-00936: missing expression

at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:879)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:450)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193)
at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:873)
at
oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1167)
at
oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1289)
at
oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1909)
at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1871)
at
oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:318)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:246
My config files using Oracle are:


db-data-config.xml










   





schema.xml


  


 

 
   
   
   
   
   
   
 

 
 iddocumento

 
 nrodocumento

 
 



solrconfig.xml


  LUCENE_36
  
  


  
   
 3
   
  

  

  

  
  
  
  

db-data-config.xml

  
  
  
solr
  



Thanks for your help.

-- 
Rafael Taboada

/*
 * Phone >> 992 741 026
 */


Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-05 Thread Jack Krupansky
By saying "dirty data" you imply that only one of the values is "good" or 
"clean" and that the others can be safely discarded/ignored, as opposed to 
true multi-valued data where each value is there for good reason and needs 
to be preserved. In any case, how do you know/decide which value should be 
used for sorting - and did you just get lucky that Solr happened to use the 
right one?


The preferred technique would be to preprocess and "clean" the data before 
it is handed to Solr or SolrJ, even if the source must remain "dirty". 
Barring that, a preprocessor or a custom update processor would certainly work.


Please clarify exactly how the data is being fed into Solr.

And if you really do need to preserve the multiple values, simply store them 
in a separate field that is not sorted. An update processor can do this as 
well.
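
To make that concrete, below is a minimal sketch of such an update processor. The field names are illustrative, it assumes the first value is the one you want to sort on, and it copies that value into a single-valued companion field (which must exist in your schema) while leaving the original multiValued field untouched. The factory would be registered in an updateRequestProcessorChain in solrconfig.xml and referenced from your update handler.

import java.io.IOException;
import java.util.Collection;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class SingleValueForSortFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // keep all values in the multiValued field, copy just the first one
        // into a single-valued field that is safe to sort on
        Collection<Object> values = doc.getFieldValues("f_normalizedValue");
        if (values != null && !values.isEmpty()) {
          doc.setField("f_normalizedValue_sort", values.iterator().next());
        }
        super.processAdd(cmd);
      }
    };
  }
}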


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Tuesday, June 05, 2012 6:34 AM
To: solr-user@lucene.apache.org
Subject: Re: Correct way to deal with source data that may include a 
multivalued field that needs to be used for sorting?


Older versions of Solr didn't really sort correctly on multivalued fields, they
just didn't complain.

Hmmm. Off the top of my head, you can:
1> You don't say what the documents to be indexed are. Are they Solr-style
documents on disk or do you process them with, say, a SolrJ program?
If the latter, you can simply inspect them as you construct them and decide
which of the multi-valued field values you want to use to sort and copy that
single value into a new field and sort on that.
2> You could write a custom UpdateRequestProcessorFactory/UpdateRequestProcessor
pair and do the same thing in the processAdd method.

Best
Erick

On Mon, Jun 4, 2012 at 10:17 PM, Aaron Daubman  wrote:

Greetings,

I have "dirty" source data where some documents being indexed, although
unlikely, may contain multivalued fields that are also required for
sorting. In previous versions of Solr, sorting on this field worked fine
(possibly because few or no multivalued fields were ever encountered?),
however, as of 3.6.0, thanks to
https://issues.apache.org/jira/browse/SOLR-2339 attempting to sort on this
field now throws an error:

[2012-06-04 17:20:01,691] ERROR org.apache.solr.common.SolrException
org.apache.solr.common.SolrException: can not sort on multivalued field:
f_normalizedValue

The relevant bits of the schema.xml are:



Assuming that the source documents being indexed cannot be changed (which,
at least for now, they cannot), what would be the next best way to allow
for both the possibility of multiple f_normalizedValue fields appearing in
indexed documents, as well as being able to sort by f_normalizedValue?

Thank you,
Aaron 




Re: Strip html

2012-06-05 Thread Tigunn
I resolved my problem:

I had to specify the field to return with my query.


Thanks A LOT for your help!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strip-html-tp3987051p398.html
Sent from the Solr - User mailing list archive at Nabble.com.


Is it faster to search over many different fields or one field that combines the values of all those other fields?

2012-06-05 Thread santamaria2
Say I have various categories of 'tags'. I want a keyword search to search
through my index of articles. So I search over:
1) the title.
2) the body
3) about 10 of these tag-categories. Each tag category is multivalued with a
few words per value.

Without considering the effect on 'relevance', and using the standard Lucene
query parser, would it be faster to specify each of these 10 fields in q (q
= cat1:keyword OR cat2:keyword OR ... ), or to copyField the stuff in those
10 fields into one combined field?

Or is it such that I should be slapped in the face for even thinking about
performance in this scenario?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-faster-to-search-over-many-different-fields-or-one-field-that-combines-the-values-of-all-those-tp3987766.html
Sent from the Solr - User mailing list archive at Nabble.com.


filtering number and repeated contents

2012-06-05 Thread Mark , N
Is it possible to filter out numbers and disclaimers (repeated content)
while indexing to Solr?
This is all surplus information that I do not want to index.

I have tried using the boilerpipe algorithm as well to remove surplus
information from web pages such as navigational elements, templates, and
advertisements. I think it works well, but I am looking to see if I
could also filter out "disclaimer" text, mainly in email messages.
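
(For the numbers part, the kind of client-side pre-filtering I have in mind is roughly the sketch below, applied before a document is posted to Solr; the disclaimer pattern is purely illustrative.)

import java.util.regex.Pattern;

public class TextCleaner {
  // standalone runs of digits (ids, phone numbers, ...)
  private static final Pattern NUMBERS = Pattern.compile("\\b\\d+\\b");
  // illustrative pattern for a boilerplate email disclaimer paragraph
  private static final Pattern DISCLAIMER =
      Pattern.compile("(?is)this e-?mail.*?confidential.*?(\\n\\n|$)");

  public static String clean(String text) {
    String withoutDisclaimer = DISCLAIMER.matcher(text).replaceAll(" ");
    return NUMBERS.matcher(withoutDisclaimer).replaceAll(" ");
  }
}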
-- 
Thanks,

*Nipen Mark *


RE: Search timeout for Solrcloud

2012-06-05 Thread Markus Jelsma
There's an open issue for improving deep paging performance:
https://issues.apache.org/jira/browse/SOLR-1726
 
 
-Original message-
> From:arin_g 
> Sent: Tue 05-Jun-2012 12:03
> To: solr-user@lucene.apache.org
> Subject: Search timeout for Solrcloud
> 
> Hi, 
> We use SolrCloud in production, and we are facing some issues with queries
> that take very long, especially deep paging queries; these queries keep our
> servers very busy. I am looking for a way to stop (kill) queries taking
> longer than a specific amount of time (say 5 seconds). I checked timeAllowed
> but it doesn't work (the query again runs to completion). Also I noticed that
> there are connTimeout and socketTimeout for distributed searches, but I am
> not sure if they kill the thread (I want to save resources by killing the
> query, not just returning a timeout). Also, if I could get partial results
> that would be ideal. Any suggestions?
> 
> Thanks,
>  arin
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Search-timeout-for-Solrcloud-tp3987716.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: Search timeout for Solrcloud

2012-06-05 Thread Jason Rutherglen
There isn't a solution for killing long running queries that works.

On Tue, Jun 5, 2012 at 1:34 AM, arin_g  wrote:
> Hi,
> We use SolrCloud in production, and we are facing some issues with queries
> that take very long, especially deep paging queries; these queries keep our
> servers very busy. I am looking for a way to stop (kill) queries taking
> longer than a specific amount of time (say 5 seconds). I checked timeAllowed
> but it doesn't work (the query again runs to completion). Also I noticed that
> there are connTimeout and socketTimeout for distributed searches, but I am
> not sure if they kill the thread (I want to save resources by killing the
> query, not just returning a timeout). Also, if I could get partial results
> that would be ideal. Any suggestions?
>
> Thanks,
>  arin
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Search-timeout-for-Solrcloud-tp3987716.html
> Sent from the Solr - User mailing list archive at Nabble.com.


SolrDispatchFilter, no hits in response NamedList if distrib=true

2012-06-05 Thread Markus Jelsma
Hi,

I'm adding the numFound to the HTTP response header in a custom 
SolrDispatchFilter in the writeResponse() method, similar to the commented code 
in doFilter(). This works just fine but not for distributed requests. I'm 
trying to read "hits" from the SolrQueryResponse but it is not there for 
distrib=true requests. Any idea what I'm doing wrong? 

Thanks,
Markus


RE: maxScore always returned

2012-06-05 Thread Markus Jelsma
Hi.

We set fl in the request handler's defaults, without score.

thanks

 
-Original message-
> From:darul 
> Sent: Tue 05-Jun-2012 12:05
> To: solr-user@lucene.apache.org
> Subject: Re: maxScore always returned
> 
> Maybe look into your solrconfig.xml file to check whether fl is set by default on
> your request handler.
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/maxScore-always-returned-tp3987727p3987733.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Solr instances: many singles vs multi-core

2012-06-05 Thread Christian von Wendt-Jensen
Hi,

I'm running a cluster of Solr servers for an index split up into a lot of shards. 
Each shard is replicated. Current setup is one Tomcat instance per shard, even 
if the Tomcats are running on the same machine.

My question is this:

Would it be more advisable to run one Tomcat per machine with all the shards as 
cores, or is the current setup the best, where each shard is running in its own 
Tomcat?

As I see it, I would think that one Tomcat running multiple cores is better, as 
it reduces the overhead of having many Tomcat instances, and there is the 
possibility to let the cores share all available memory according to how much they 
actually need. In the one-shard/one-Tomcat scenario, each instance must have its 
predefined memory settings whether or not it needs more or less.

Any opinions on the matter?



Med venlig hilsen / Best Regards

Christian von Wendt-Jensen




ReadTimeout on commit

2012-06-05 Thread spring
Hi,

I'm indexing documents in batches of 100 docs. Then commit.

Sometimes I get this exception:

org.apache.solr.client.solrj.SolrServerException:
java.net.SocketTimeoutException: Read timed out
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpS
olrServer.java:475)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpS
olrServer.java:249)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractU
pdateRequest.java:105)
at
org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)


I found some similar postings on the web, all recommending autocommit. This
is unfortunately not an option for me, because I have to know whether Solr
committed or not.

What is causing this timeout?

I'm using these settings in solrj:

server.setSoTimeout(1000);
server.setConnectionTimeout(100);
server.setDefaultMaxConnectionsPerHost(100);
server.setMaxTotalConnections(100);
server.setFollowRedirects(false);
server.setAllowCompression(true);
server.setMaxRetries(1);
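
(For reference, these setters take milliseconds, so the values above allow only one second per socket read; a commit that takes longer than a second would produce exactly this kind of read timeout. A sketch with more generous, purely illustrative values:)

server.setSoTimeout(60000);        // allow up to 60 s for a response, e.g. a slow commit
server.setConnectionTimeout(5000); // 5 s to establish the connection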

Thank you



Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-05 Thread Erick Erickson
Older versions of Solr didn't really sort correctly on multivalued fields, they
just didn't complain.

Hmmm. Off the top of my head, you can:
1> You don't say what the documents to be indexed are. Are they Solr-style
 documents on disk or do you process them with, say, a SolrJ program?
 If the latter, you can simply inspect them as you construct them and decide
 which of the multi-valued field values you want to use to sort
and copy that
 single value into a new field and sort on that.
2> You could write a custom UpdateRequestProcessorFactory/UpdateRequestProcessor
 pair and do the same thing in the processAdd method.

Best
Erick

On Mon, Jun 4, 2012 at 10:17 PM, Aaron Daubman  wrote:
> Greetings,
>
> I have "dirty" source data where some documents being indexed, although
> unlikely, may contain multivalued fields that are also required for
> sorting. In previous versions of Solr, sorting on this field worked fine
> (possibly because few or no multivalued fields were ever encountered?),
> however, as of 3.6.0, thanks to
> https://issues.apache.org/jira/browse/SOLR-2339 attempting to sort on this
> field now throws an error:
>
> [2012-06-04 17:20:01,691] ERROR org.apache.solr.common.SolrException
> org.apache.solr.common.SolrException: can not sort on multivalued field:
> f_normalizedValue
>
> The relevant bits of the schema.xml are:
>  positionIncrementGap="0" sortMissingLast="true"/>
>  required="false" multiValued="true"/>
>
> Assuming that the source documents being indexed cannot be changed (which,
> at least for now, they cannot), what would be the next best way to allow
> for both the possibility of multiple f_normalizedValue fields appearing in
> indexed documents, as well as being able to sort by f_normalizedValue?
>
> Thank you,
>     Aaron


Re: maxScore always returned

2012-06-05 Thread darul
Maybe look into your solrconfig.xml file to check whether fl is set by default on
your request handler.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/maxScore-always-returned-tp3987727p3987733.html
Sent from the Solr - User mailing list archive at Nabble.com.


Search timeout for Solrcloud

2012-06-05 Thread arin_g
Hi, 
We use SolrCloud in production, and we are facing some issues with queries
that take very long, especially deep paging queries; these queries keep our
servers very busy. I am looking for a way to stop (kill) queries taking
longer than a specific amount of time (say 5 seconds). I checked timeAllowed
but it doesn't work (the query again runs to completion). Also I noticed that
there are connTimeout and socketTimeout for distributed searches, but I am
not sure if they kill the thread (I want to save resources by killing the
query, not just returning a timeout). Also, if I could get partial results
that would be ideal. Any suggestions?
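
(For reference, a minimal SolrJ sketch of how timeAllowed is passed and how truncated results are flagged; the values are illustrative. As far as I understand, timeAllowed only bounds the document-collection phase of a search rather than the whole request, which may be why it appears to have no effect here.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TimeAllowedExample {
  public static boolean queryWithBudget(SolrServer server, String q) throws Exception {
    SolrQuery query = new SolrQuery(q);
    query.set("timeAllowed", 5000); // milliseconds
    QueryResponse rsp = server.query(query);
    // Solr sets partialResults=true in the response header when the budget was hit
    return Boolean.TRUE.equals(rsp.getResponseHeader().get("partialResults"));
  }
}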

Thanks,
 arin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-timeout-for-Solrcloud-tp3987716.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi-words synonyms matching

2012-06-05 Thread Bernd Fehling
Do you have test cases?

What are you sending to your SynonymFilterFactory?

What are you expecting it should return?

What is it returning when setting to Version.LUCENE_33?

What is it returning when setting to Version.LUCENE_36?



On 05.06.2012 10:56, O. Klein wrote:
> The reason multi word synonyms work better if you use LUCENE_33 is because
> then Solr uses the SlowSynonymFilter instead of SynonymFilterFactory
> (FSTSynonymFilterFactory).
> 
> But I don't know if the difference between them is a bug or not. Maybe
> someone has more insight?
> 
> 
> 
> 
> Bernd Fehling-2 wrote
>>
>> Are you sure with LUCENE_33 (Use of BitVector)?
>>
>>
>> On 31.05.2012 17:20, O. Klein wrote:
>>> I have been struggling with this as well and found that using LUCENE_33
>>> gives
>>> the best results.
>>>
>>> But as it will be deprecated this is no everlasting solution. May
>>> somebody
>>> knows one?
>>>
>>
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Multi-words-synonyms-matching-tp3898950p3987728.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strip html

2012-06-05 Thread Tigunn

Hello,

I have made progress on my problem.
The index and fieldtype are good:

I forgot to copyField "body_strip_html" into "text", the defaultSearchField.
A newbie's mistake.

Now, Solr returns all the XML files I want. 
But, in PHP, the text isn't displayed for 2 XML files (where the term "castor"
is split by HTML or XML tags, as in the example). Look:
http://lucene.472066.n3.nabble.com/file/n3987731/recherche_solr_tei.jpg 

The php file:

Thank you for your help.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strip-html-tp3987051p3987731.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi-words synonyms matching

2012-06-05 Thread O. Klein
The reason multi word synonyms work better if you use LUCENE_33 is because
then Solr uses the SlowSynonymFilter instead of SynonymFilterFactory
(FSTSynonymFilterFactory).

But I don't know if the difference between them is a bug or not. Maybe
someone has more insight?




Bernd Fehling-2 wrote
> 
> Are you sure with LUCENE_33 (Use of BitVector)?
> 
> 
> On 31.05.2012 17:20, O. Klein wrote:
>> I have been struggling with this as well and found that using LUCENE_33
>> gives
>> the best results.
>> 
>> But as it will be deprecated this is no everlasting solution. May
>> somebody
>> knows one?
>>
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-words-synonyms-matching-tp3898950p3987728.html
Sent from the Solr - User mailing list archive at Nabble.com.


maxScore always returned

2012-06-05 Thread Markus Jelsma
Hi,

On trunk the maxScore response attribute is always returned even if score is 
not part of fl. Is this intentional?

Thanks,


Re: random results at specific slots

2012-06-05 Thread srinir
Another option I could think of is to write a custom component which implements
handleResponses, where I can pick random documents from across shards and
insert them into the ResponseBuilder's resultIds. I would place this
component at the end (or after QueryComponent). Will that work? Is there a
better solution?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/random-results-at-specific-slots-tp3987719p3987725.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Add HTTP-header from ResponseWriter

2012-06-05 Thread Markus Jelsma
Thanks, i'll check the issues. 
 
-Original message-
> From:Jack Krupansky 
> Sent: Mon 04-Jun-2012 17:19
> To: solr-user@lucene.apache.org
> Subject: Re: Add HTTP-header from ResponseWriter
> 
> There is some commented-out code in SolrDispatchFilter.doFilter:
> 
> // add info to http headers
> //TODO: See SOLR-232 and SOLR-267.
>   /*try {
> NamedList solrRspHeader = solrRsp.getResponseHeader();
>for (int i=0; i < solrRspHeader.size(); i++) {
>  ((javax.servlet.http.HttpServletResponse) response).addHeader(("Solr-" 
> + solrRspHeader.getName(i)), String.valueOf(solrRspHeader.getVal(i)));
>}
>   } catch (ClassCastException cce) {
> log.log(Level.WARNING, "exception adding response header log 
> information", cce);
>   }*/
> 
> And there is a comment from Grant on SOLR-267 that "The changes to 
> SolrDispatchFilter can screw up SolrJ when you have explicit=all ... so I'm 
> going to ... comment out #2 and put a TODO: there and someone can address it 
> on SOLR-232".
> 
> I did not see a separate Jira issue for arbitrarily setting HTTP headers 
> from response writers.
> 
> -- Jack Krupansky
> 
> -Original Message- 
> From: Markus Jelsma
> Sent: Monday, June 04, 2012 7:10 AM
> To: solr-user@lucene.apache.org
> Subject: Add HTTP-header from ResponseWriter
> 
> Hi,
> 
> There has been discussion before on how to add/set a HTTP-header from a 
> ResponseWriter. That was about adding the number of found documents for a 
> CSVResponseWriter. We also need to set the number of found documents, in 
> this case for the JSONResponseWriter. or any ResponseWriter. Is there any 
> progress or open issue i am not aware of? Can the current (trunk) response 
> framework already set or add an HTTP-header?
> 
> Thanks,
> Markus 
> 
> 


random results at specific slots

2012-06-05 Thread srinir
Hi,

I would like to return results sorted by score (desc), but I would like to
insert random results into some predefined slots (let's say 10, 14 and 18).
The reason I want to do that is that I boost click-through-rate-based features
significantly and I want to give a chance to documents which don't have
enough click-through-rate data. This would help the results stay fresh.

I looked into the Solr code and it looks like I need a custom QueryComponent
where, once the top results are ordered, I can insert some random results at
my predefined slots and then return. I am wondering whether there is any
other way I can achieve the same?
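
One client-side alternative I can think of is to do the merge outside Solr: fetch the scored page as usual, fetch a few documents for the same query ordered by a RandomSortField (the random_* dynamic field from the example schema), and splice them into the fixed slots. A rough SolrJ sketch, with illustrative slot numbers and no deduplication or paging handling:

import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrDocumentList;

public class RandomSlotMerger {
  private static final List<Integer> RANDOM_SLOTS = Arrays.asList(10, 14, 18);

  public static SolrDocumentList search(SolrServer server, String q) throws Exception {
    // normal relevance-ranked page
    SolrQuery scored = new SolrQuery(q);
    scored.setRows(20);
    SolrDocumentList page = server.query(scored).getResults();

    // a few random matches for the same query, via a solr.RandomSortField
    // (assumes a dynamic field random_* of that type is declared in the schema)
    SolrQuery random = new SolrQuery(q);
    random.setRows(RANDOM_SLOTS.size());
    random.setSortField("random_" + System.currentTimeMillis(), SolrQuery.ORDER.asc);
    SolrDocumentList fresh = server.query(random).getResults();

    // splice the random documents into the fixed slots
    int i = 0;
    for (int slot : RANDOM_SLOTS) {
      if (i < fresh.size() && slot < page.size()) {
        page.set(slot, fresh.get(i++));
      }
    }
    return page;
  }
}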

Thanks
Srini

--
View this message in context: 
http://lucene.472066.n3.nabble.com/random-results-at-specific-slots-tp3987719.html
Sent from the Solr - User mailing list archive at Nabble.com.