Re: Flexible search field analyser/tokenizer configuration

2014-10-03 Thread PeterKerk
Ok, that field now totally works, thanks again!

I've removed the wildcard to benefit from ranking and boosting and am now
trying to combine this field with another, but I have some difficulties
figuring out the right query.

I want to search on the occurrence of the keyword in the title field
(title_search_global) of a document OR in the description field
(description_search), and if it occurs in the title field give that the
largest boost, over a minor boost on the description_search field.

Here's what I have now for the query "Ballonnenboog":

http://localhost:8983/solr/tt-shop/select?q=(title_search_global%3A(Ballonnenboog)+OR+title_search_global%3A%22Ballonnenboog%22%5E100)+OR+description_search%3A(Ballonnenboog)&fq=title_search_global%5E10.0%2Bdescription_search%5E0.3&fl=id%2Ctitle&wt=xml&indent=true

But it returns 0 results, even though there are results that have
"Ballonnenboog" in the title_search_global field.

What am I missing?
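
(A likely culprit, as a hedged aside for the archive: fq does not understand
boost syntax, so a filter like
fq=title_search_global^10.0+description_search^0.3 is parsed as ordinary
terms against the default field, matches nothing, and filters out every
document, which would explain the 0 results. Boosts belong in the main
query, along these lines, with illustrative boost values:

q=title_search_global:"Ballonnenboog"^100 OR title_search_global:(Ballonnenboog)^10 OR description_search:(Ballonnenboog)^0.3

or with the edismax parser:

defType=edismax&q=Ballonnenboog&qf=title_search_global^10 description_search^0.3)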





RE: Inconsistent response time

2014-10-03 Thread Michael Ryan
It could be due to the minimum timer resolution on Windows. Do a search for 
"windows 15ms" and you'll find a lot of information about it. Though, I'm not 
sure which versions of Windows and/or Java have that problem. You could test it 
out by timing things other than Solr and see if they also take 15ms. I often 
see 15ms, 16ms, 31ms, and 32ms when timing stuff on Windows.

-Michael
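
A rough way to check this from Java (a sketch, assuming plain
System.currentTimeMillis() granularity is the suspect): spin until the
clock value changes and print the step size. On machines with the 15.625 ms
Windows timer, the steps typically come out as 15 or 16 ms.

public class TimerResolution {
    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            long t0 = System.currentTimeMillis();
            long t1;
            // Busy-wait until the millisecond clock ticks over.
            do { t1 = System.currentTimeMillis(); } while (t1 == t0);
            System.out.println("Observed clock step: " + (t1 - t0) + " ms");
        }
    }
}

System.nanoTime() is usually the higher-resolution choice for interval timing.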

-Original Message-
From: Scott Johnson [mailto:sjohn...@dag.com] 
Sent: Friday, October 03, 2014 5:58 PM
To: solr-user@lucene.apache.org
Subject: RE: Inconsistent response time

Thanks for the recommendation, but that is not making a difference here.

-Original Message-
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
Sent: Friday, October 03, 2014 2:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent response time

Hi Scott,

Any chance this could be an IPv6 thing? What if you start both server and 
client with this flag:

-Djava.net.preferIPv4Stack=true
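
For example, assuming the stock Jetty start script that ships with Solr 4.x:

java -Djava.net.preferIPv4Stack=true -jar start.jar

and pass the same -D flag to the client JVM that runs the test class.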



Michael Della Bitta
Senior Software Engineer
o: +1 646 532 3062

appinions inc.
"The Science of Influence Marketing"

18 East 41st Street
New York, NY 10017
t: @appinions | g+: plus.google.com/appinions
w: appinions.com

On Oct 3, 2014, at 15:08, Scott Johnson  wrote:

> We are attempting to improve our Solr response time as our application 
> uses Solr for large and time consuming queries. We have found a very 
> inconsistent result in the time elapsed when pinging Solr. If we ping 
> Solr from a desktop Windows 7 machine, there is usually a 5 ms elapsed 
> time. But if we ping the same Solr instance from a Windows Server 2008
machine, it takes about 15 ms.
> This could be the difference between a 1 hour process and a 3 hour 
> process, so it is something we would like to debug and fix if possible.
> 
> 
> 
> Does anybody have any ideas about why this might be? We get these same 
> results pretty consistently (testing on multiple desktops and 
> servers). One thing that seemed to have an impact is removing various 
> additional JDKs that had been installed, and JDK 1.7u67 specifically
seemed to make a difference.
> 
> 
> 
> Finally, the code we are using to test this is below. If there is a 
> better test I would be curious to hear that as well.
> 
> 
> 
> Thanks,
> 
> 
> Scott
> 
> 
> 
> 
> 
> package solr;
> 
> 
> 
> import org.apache.commons.lang.StringUtils;
> 
> import org.apache.solr.client.solrj.SolrQuery;
> 
> import org.apache.solr.client.solrj.SolrRequest.METHOD;
> 
> import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
> 
> import org.apache.solr.client.solrj.impl.BinaryResponseParser;
> 
> import org.apache.solr.client.solrj.impl.HttpSolrServer;
> 
> import org.apache.solr.client.solrj.response.QueryResponse;
> 
> import org.apache.solr.client.solrj.response.SolrPingResponse;
> 
> import org.apache.solr.common.SolrDocumentList;
> 
> 
> 
> public class SolrTest {
> 
> 
> 
>private HttpSolrServer server;
> 
> 
> 
>/**
> 
>* @param args
> 
>* @throws Exception
> 
> */
> 
>public static void main(String[] args) throws Exception 
> {
> 
>SolrTest solr = new SolrTest(args);
> 
>// Run it a few times, the second time 
> runs a lot faster.
> 
>for (int i=0; i<3; i++) {
> 
>solr.execute();
> 
>}
> 
>}
> 
> 
> 
>public SolrTest(String[] args) throws Exception {
> 
>String targetUrl = args[0];
> 
> 
> 
>System.out.println("=System
> properties=");
> 
>System.out.println("Start solr test 
> " + targetUrl);
> 
> 
> 
>server = new HttpSolrServer("http://" +
> targetUrl + ":8111/solr/search/");
> 
>server.setRequestWriter(new 
> BinaryRequestWriter());
> 
>server.setParser(new 
> BinaryResponseParser());
> 
>server.setAllowCompression(true);
> 
>
> server.setDefaultMaxConnectionsPerHost(128);
> 
>server.setMaxTotalConnections(128);
> 
> 
> 
>SolrPingResponse response = 
> server.ping();
> 
>System.out.println("Ping time: " +
> response.getElapsedTime() + " ms");
> 
>System.out.println("Ping time: " +
> response.getElapsedTime() + " ms");
> 
>}
> 
> 
> 
>private void execute() throws Exception {
> 
>SolrQuery query = new SolrQuery();
> 
>query.setParam("start", "0");
> 
> 

Re: Question about filter cache size

2014-10-03 Thread Yonik Seeley
On Fri, Oct 3, 2014 at 6:38 PM, Peter Keegan  wrote:
>> it will be cached as hidden:true and then inverted
> Inverted at query time, so for best query performance use fq=hidden:false,
> right?

Yep.

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


Re: Question about filter cache size

2014-10-03 Thread Peter Keegan
> it will be cached as hidden:true and then inverted
Inverted at query time, so for best query performance use fq=hidden:false,
right?

On Fri, Oct 3, 2014 at 3:57 PM, Yonik Seeley  wrote:

> On Fri, Oct 3, 2014 at 3:42 PM, Peter Keegan 
> wrote:
> > Say I have a boolean field named 'hidden', and less than 1% of the
> > documents in the index have hidden=true.
> > Do both these filter queries use the same docset cache size? :
> > fq=hidden:false
> > fq=!hidden:true
>
> Nope... !hidden:true will be smaller in the cache (it will be cached
> as hidden:true and then inverted)
> The downside is that you'll pay the cost of that inversion.
>
> -Yonik
> http://heliosearch.org - native code faceting, facet functions,
> sub-facets, off-heap data
>


Re: Inconsistent response time

2014-10-03 Thread Chris Hostetter
: Solr for large and time consuming queries. We have found a very inconsistent
: result in the time elapsed when pinging Solr. If we ping Solr from a desktop
: Windows 7 machine, there is usually a 5 ms elapsed time. But if we ping the
: same Solr instance from a Windows Server 2008 machine, it takes about 15 ms.

: Does anybody have any ideas about why this might be? We get these same
: results pretty consistently (testing on multiple desktops and servers). One

the devil is really in the details here ... and you haven't provided very 
many.

Define "multiple desktops and servers" ... do you specifically mean 
multiple *local* desktop machines as the client and multiple *remote* 
servers running solr, or are you also refering to "windows server as 
client talking solr server" situations? ("server is vague and ambiguious 
in your email, particularly since one of hte variables is "Windows Server 
2008")

what is your net architecture?

how/where are all of the various machines physically located/connected on 
the network?

Hypothetical cause of your problem that would fit all of the info you have 
provided: your solr server is a test machine sitting in the same building as 
all of your "desktop" Windows 7 machines, and so it's just a local ethernet 
hop away, but your "Windows Server 2008" machine is a "server" in some 
remote data center and has more network hops to go through.

see what i mean about the details mattering?

when you run this code, is the command line arg you specify an IP addr or a 
hostname? have you ruled out DNS lookup discrepancies between the diff 
client operating systems?

: thing that seemed to have an impact is removing various additional JDKs that
: had been installed, and JDK 1.7u67 specifically seemed to make a difference.

impact how?  Does JDK 1.7u67 make your tests go faster or slower? faster 
or slower than what? what other java versions did you try?

: Finally, the code we are using to test this is below. If there is a better
: test I would be curious to hear that as well.

well, since the problem you are describing doesn't seem to have anything 
to do with solr, removing solr & solrj completely from the equation would 
be the first thing i would test.

do you have a "curl" equivilent on these windows machines?  have you tried 
just fetching the ping URL w/o using the SolrJ code and comparing hte 
response times that way?

the XML format of the ping response is also really trivial -- you could 
save it to a plain XML file, toss it up in a directory on your favorite 
webserver, and point either curl (or the solrj SolrServer class) at that 
to sanity check whether anything in Solr (or the solrj SolrServer class) is 
actually causing the discrepancy you are seeing.


My best guess is the discrepancies you are seeing are entirely network/os 
based and have nothing to do with solr.



-Hoss
http://www.lucidworks.com/


RE: Inconsistent response time

2014-10-03 Thread Scott Johnson
Thanks for the recommendation, but that is not making a difference here.

-Original Message-
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com] 
Sent: Friday, October 03, 2014 2:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent response time

Hi Scott,

Any chance this could be an IPv6 thing? What if you start both server and
client with this flag:

-Djava.net.preferIPv4Stack=true



Michael Della Bitta
Senior Software Engineer
o: +1 646 532 3062

appinions inc.
"The Science of Influence Marketing"

18 East 41st Street
New York, NY 10017
t: @appinions | g+: plus.google.com/appinions
w: appinions.com

On Oct 3, 2014, at 15:08, Scott Johnson  wrote:

> We are attempting to improve our Solr response time as our application 
> uses Solr for large and time consuming queries. We have found a very 
> inconsistent result in the time elapsed when pinging Solr. If we ping 
> Solr from a desktop Windows 7 machine, there is usually a 5 ms elapsed 
> time. But if we ping the same Solr instance from a Windows Server 2008
machine, it takes about 15 ms.
> This could be the difference between a 1 hour process and a 3 hour 
> process, so it is something we would like to debug and fix if possible.
> 
> 
> 
> Does anybody have any ideas about why this might be? We get these same 
> results pretty consistently (testing on multiple desktops and 
> servers). One thing that seemed to have an impact is removing various 
> additional JDKs that had been installed, and JDK 1.7u67 specifically
seemed to make a difference.
> 
> 
> 
> Finally, the code we are using to test this is below. If there is a 
> better test I would be curious to hear that as well.
> 
> 
> 
> Thanks,
> 
> 
> Scott
> 
> 
> 
> 
> 
> package solr;
> 
> 
> 
> import org.apache.commons.lang.StringUtils;
> 
> import org.apache.solr.client.solrj.SolrQuery;
> 
> import org.apache.solr.client.solrj.SolrRequest.METHOD;
> 
> import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
> 
> import org.apache.solr.client.solrj.impl.BinaryResponseParser;
> 
> import org.apache.solr.client.solrj.impl.HttpSolrServer;
> 
> import org.apache.solr.client.solrj.response.QueryResponse;
> 
> import org.apache.solr.client.solrj.response.SolrPingResponse;
> 
> import org.apache.solr.common.SolrDocumentList;
> 
> 
> 
> public class SolrTest {
> 
> 
> 
>private HttpSolrServer server;
> 
> 
> 
>/**
> 
>* @param args
> 
>* @throws Exception
> 
> */
> 
>public static void main(String[] args) throws Exception 
> {
> 
>SolrTest solr = new SolrTest(args);
> 
>// Run it a few times, the second time 
> runs a lot faster.
> 
>for (int i=0; i<3; i++) {
> 
>solr.execute();
> 
>}
> 
>}
> 
> 
> 
>public SolrTest(String[] args) throws Exception {
> 
>String targetUrl = args[0];
> 
> 
> 
>System.out.println("=System
> properties=");
> 
>System.out.println("Start solr test 
> " + targetUrl);
> 
> 
> 
>server = new HttpSolrServer("http://" +
> targetUrl + ":8111/solr/search/");
> 
>server.setRequestWriter(new 
> BinaryRequestWriter());
> 
>server.setParser(new 
> BinaryResponseParser());
> 
>server.setAllowCompression(true);
> 
>
> server.setDefaultMaxConnectionsPerHost(128);
> 
>server.setMaxTotalConnections(128);
> 
> 
> 
>SolrPingResponse response = 
> server.ping();
> 
>System.out.println("Ping time: " +
> response.getElapsedTime() + " ms");
> 
>System.out.println("Ping time: " +
> response.getElapsedTime() + " ms");
> 
>}
> 
> 
> 
>private void execute() throws Exception {
> 
>SolrQuery query = new SolrQuery();
> 
>query.setParam("start", "0");
> 
>query.setParam("rows", "1");
> 
> 
> 
>long startTime = 
> System.currentTimeMillis();
> 
> 
> 
>QueryResponse queryResponse = 
> server.query(query, METHOD.POST);
> 
> 
> 
>long elapsedTime =
> (System.currentTimeMillis() - startTime);
> 
> 
> 
>SolrDocumentList results = 
> queryResponse.getResults();
> 
>long totalHits = results.getNumFound();
> 
> 
> 
>System.out.prin

Re: Solr + Federated Search Question

2014-10-03 Thread Jack Krupansky

Yes, either term can be used to confuse people equally well!

-- Jack Krupansky

-Original Message- 
From: Alejandro Calbazana

Sent: Thursday, October 2, 2014 3:28 PM
To: solr-user@lucene.apache.org ; Ahmet Arslan
Subject: Re: Solr + Federated Search Question

Thanks Ahmet.  Yay!  New term :)  Although it does look like "federated"
and "metasearch" can be  used interchangeably.

Alejandro

On Thu, Oct 2, 2014 at 2:37 PM, Ahmet Arslan 
wrote:


Hi Alejandro,

So your example is better called as "metasearch". Here a quotation from a
book.

"Instead of retrieving information from a single information source using
one search engine, one can utilize multiple search engines or a single
search engine retrieving documents from a plethora of document 
collections.

A scenario where multiple engines are used is known as metasearch, while
the scenario where a single engine retrieves from multiple collections is
known as federation. In both these scenarios, the final result of the
retrieval effort needs to be a single, unified ranking of documents, based
on several ranked lists."

Ahmet


On Thursday, October 2, 2014 7:29 PM, Alejandro Calbazana <
acalbaz...@gmail.com> wrote:
Ahmet,Jeff,

Thanks.  Some terms are a bit overloaded.  By "federated", I do mean the
ability to query multiple, disparate, repositories.  So, no.  All of my
data would not necessarily be in Solr.  Solr would be one of several -
databases, filesystems, document stores, etc...  that I would like to
"plug-in".  The content in each repository would be of different types 
(the

shape/schema of the content would differ significantly).

Thanks,

Alejandro




On Wed, Oct 1, 2014 at 9:47 AM, Jack Krupansky 
wrote:

> Alejandro, you'll have to clarify how you are using the term "federated
> search". I mean, technically Ahmet is correct in that Solr queries can 
> be

> fanned out to shards and the results from each shard aggregated
> ("federated") into a single result list, but... more traditionally,
> "federated" refers to "disparate" databases or search engines.
>
> See:
> http://en.wikipedia.org/wiki/Federated_search
>
> So, please tell us a little more about what you are really trying to do.
>
> I mean, is all of your data in Solr, in multiple collections, or on
> multiple Solr servers, or... is only some of your data in Solr and some
is
> in other search engines?
>
> Another approach taken with Solr is that indeed all of your source data
> may be in "disparate databases", but you perform an ETL (Extract,
> Transform, and Load) process to ingest all of that data into Solr and
then
> simply directly search the data within Solr.
>
> -- Jack Krupansky
>
> -Original Message- From: Ahmet Arslan
> Sent: Wednesday, October 1, 2014 9:35 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr + Federated Search Question
>
> Hi,
>
> Federation is possible. Solr has distributed search support with shards
> parameter.
>
> Ahmet
>
>
>
> On Wednesday, October 1, 2014 4:29 PM, Alejandro Calbazana <
> acalbaz...@gmail.com> wrote:
> Hello,
>
> I have a general question about Solr in a federated search context.  I
> understand that Solr does not do federated search and that  different
tools
> are often used to incorporate Solr indexes into a federated/enterprise
> search solution.  Does anyone have recommendations on any products (open
> source or otherwise) that addresses this space?
>
> Thanks,
>
> Alejandro
>






Re: Regarding Default Scoring For Solr

2014-10-03 Thread Jack Krupansky
That's a reasonable description for Solr/Lucene scoring, but use the latest 
release:

http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
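
For quick reference, the practical scoring function documented there is, roughly:

score(q,d) = coord(q,d) * queryNorm(q)
             * sum over terms t in q of [ tf(t in d) * idf(t)^2 * t.getBoost() * norm(t,d) ]

where the default implementation uses tf = sqrt(termFreq) and
idf = 1 + ln(numDocs / (docFreq + 1)).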

-- Jack Krupansky

-Original Message- 
From: mdemarco123

Sent: Thursday, October 2, 2014 6:06 PM
To: solr-user@lucene.apache.org
Subject: Regarding Default Scoring For Solr

If I add this to the end of my query string I get a score back: &fl=*,score
Is this the default score? I did read some info on scoring and it is
detailed and granular and conceptual, but because of limited time I can't
go into the details of the score calculation at the moment. Are the links
below a good start for the default calculation, or is it explained anywhere
in a more tutorial fashion?

http://www.lucenetutorial.com/advanced-topics/scoring.html
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html







Re: Inconsistent response time

2014-10-03 Thread Michael Della Bitta
Hi Scott,

Any chance this could be an IPv6 thing? What if you start both server and 
client with this flag:

-Djava.net.preferIPv4Stack=true



Michael Della Bitta
Senior Software Engineer
o: +1 646 532 3062

appinions inc.
“The Science of Influence Marketing”

18 East 41st Street
New York, NY 10017
t: @appinions | g+: plus.google.com/appinions
w: appinions.com

On Oct 3, 2014, at 15:08, Scott Johnson  wrote:

> We are attempting to improve our Solr response time as our application uses
> Solr for large and time consuming queries. We have found a very inconsistent
> result in the time elapsed when pinging Solr. If we ping Solr from a desktop
> Windows 7 machine, there is usually a 5 ms elapsed time. But if we ping the
> same Solr instance from a Windows Server 2008 machine, it takes about 15 ms.
> This could be the difference between a 1 hour process and a 3 hour process,
> so it is something we would like to debug and fix if possible.
> 
> 
> 
> Does anybody have any ideas about why this might be? We get these same
> results pretty consistently (testing on multiple desktops and servers). One
> thing that seemed to have an impact is removing various additional JDKs that
> had been installed, and JDK 1.7u67 specifically seemed to make a difference.
> 
> 
> 
> Finally, the code we are using to test this is below. If there is a better
> test I would be curious to hear that as well.
> 
> 
> 
> Thanks,
> 
> 
> Scott
> 
> 
> 
> 
> 
> package solr;
> 
> 
> 
> import org.apache.commons.lang.StringUtils;
> 
> import org.apache.solr.client.solrj.SolrQuery;
> 
> import org.apache.solr.client.solrj.SolrRequest.METHOD;
> 
> import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
> 
> import org.apache.solr.client.solrj.impl.BinaryResponseParser;
> 
> import org.apache.solr.client.solrj.impl.HttpSolrServer;
> 
> import org.apache.solr.client.solrj.response.QueryResponse;
> 
> import org.apache.solr.client.solrj.response.SolrPingResponse;
> 
> import org.apache.solr.common.SolrDocumentList;
> 
> 
> 
> public class SolrTest {
> 
> 
> 
>private HttpSolrServer server;
> 
> 
> 
>/**
> 
>* @param args
> 
>* @throws Exception 
> 
> */
> 
>public static void main(String[] args) throws Exception {
> 
>SolrTest solr = new SolrTest(args);
> 
>// Run it a few times, the second time runs
> a lot faster.
> 
>for (int i=0; i<3; i++) {
> 
>solr.execute();
> 
>}
> 
>}
> 
> 
> 
>public SolrTest(String[] args) throws Exception {
> 
>String targetUrl = args[0];
> 
> 
> 
>System.out.println("=System
> properties=");
> 
>System.out.println("Start solr test " +
> targetUrl);
> 
> 
> 
>server = new HttpSolrServer("http://" +
> targetUrl + ":8111/solr/search/");
> 
>server.setRequestWriter(new
> BinaryRequestWriter());
> 
>server.setParser(new
> BinaryResponseParser());
> 
>server.setAllowCompression(true);
> 
>server.setDefaultMaxConnectionsPerHost(128);
> 
>server.setMaxTotalConnections(128);
> 
> 
> 
>SolrPingResponse response = server.ping();
> 
>System.out.println("Ping time: " +
> response.getElapsedTime() + " ms");
> 
>System.out.println("Ping time: " +
> response.getElapsedTime() + " ms");
> 
>}
> 
> 
> 
>private void execute() throws Exception {
> 
>SolrQuery query = new SolrQuery();
> 
>query.setParam("start", "0");
> 
>query.setParam("rows", "1");
> 
> 
> 
>long startTime = System.currentTimeMillis();
> 
> 
> 
>QueryResponse queryResponse =
> server.query(query, METHOD.POST);
> 
> 
> 
>long elapsedTime =
> (System.currentTimeMillis() - startTime);
> 
> 
> 
>SolrDocumentList results =
> queryResponse.getResults();
> 
>long totalHits = results.getNumFound();
> 
> 
> 
>System.out.println("Search hits:" +
> totalHits
> 
>+ ". Total
> elapsed time:" + elapsedTime + " ms"
> 
>+ ". Solr
> elapsed time:" + queryResponse.getElapsedTime() + " ms"
> 
> 

Re: Question about filter cache size

2014-10-03 Thread Yonik Seeley
On Fri, Oct 3, 2014 at 4:35 PM, Shawn Heisey  wrote:
> On 10/3/2014 1:57 PM, Yonik Seeley wrote:
>> On Fri, Oct 3, 2014 at 3:42 PM, Peter Keegan  wrote:
>>> Say I have a boolean field named 'hidden', and less than 1% of the
>>> documents in the index have hidden=true.
>>> Do both these filter queries use the same docset cache size? :
>>> fq=hidden:false
>>> fq=!hidden:true
>>
>> Nope... !hidden:true will be smaller in the cache (it will be cached
>> as hidden:true and then inverted)
>> The downside is that you'll pay the cost of that inversion.
>
> I would think that unless it's using hashDocSet, the cached data for
> every filter would always be the same size.  The wiki says that
> hashDocSet is no longer used for filter caching as of 1.4.0.  Is that
> actually true?

Yes, SortedIntDocSet is used instead.  It stores an int per match
(i.e. 4 bytes per match).  This change was made so in-order traversal
could be done efficiently.
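
As a rough worked example of the memory difference: with maxDoc = 10M, a
full bitset costs maxDoc/8 = 1.25 MB per cache entry no matter how sparse
the filter is, while a SortedIntDocSet for a 1% filter (100K matching docs)
costs about 4 * 100K = 400 KB. The arithmetic break-even between the two
representations is at maxDoc/32 matching documents.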

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


Re: Question about filter cache size

2014-10-03 Thread Shawn Heisey
On 10/3/2014 1:57 PM, Yonik Seeley wrote:
> On Fri, Oct 3, 2014 at 3:42 PM, Peter Keegan  wrote:
>> Say I have a boolean field named 'hidden', and less than 1% of the
>> documents in the index have hidden=true.
>> Do both these filter queries use the same docset cache size? :
>> fq=hidden:false
>> fq=!hidden:true
> 
> Nope... !hidden:true will be smaller in the cache (it will be cached
> as hidden:true and then inverted)
> The downside is that you'll pay the cost of that inversion.

I would think that unless it's using hashDocSet, the cached data for
every filter would always be the same size.  The wiki says that
hashDocSet is no longer used for filter caching as of 1.4.0.  Is that
actually true?  Is my understanding of filterCache completely out of
touch with reality?

https://wiki.apache.org/solr/SolrCaching#The_hashDocSet_Max_Size

This does bring to mind an optimization that might help memory usage in
cases where either a very small or very large percentage of documents
match the filter: do run-length encoding on the bitset.  If the RLE
representation is at least N percent smaller than the bitset, use that
representation instead.
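
A minimal sketch of the encoding step being proposed (illustration only, not
Solr code): represent each run of consecutive set bits as a (start, length)
pair, which is compact exactly when the matching docs are clustered, very
sparse, or very dense.

import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class RleDocSet {
    // Encode each run of consecutive set bits as a {start, length} pair.
    static List<int[]> encode(BitSet bits) {
        List<int[]> runs = new ArrayList<>();
        int i = bits.nextSetBit(0);
        while (i >= 0) {
            int end = bits.nextClearBit(i); // first unset bit after the run
            runs.add(new int[] { i, end - i });
            i = bits.nextSetBit(end);
        }
        return runs;
    }

    public static void main(String[] args) {
        BitSet docs = new BitSet();
        docs.set(0, 500000);  // a dense run of matching docs
        docs.set(900000);     // plus one isolated match
        // Two runs, versus roughly 112 KB of raw bitset words.
        for (int[] run : encode(docs)) {
            System.out.println("start=" + run[0] + " length=" + run[1]);
        }
    }
}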

I think the first iteration of an RLE option would have it always on or
always off, controlled in solrconfig.xml.  A config mode where Solr
attempts RLE on every bitset and periodically reports efficiency
statistics would be pretty nice.  That data might be useful to define
default thresholds for a future automatic mode.

Thanks,
Shawn



Re: Question about filter cache size

2014-10-03 Thread Yonik Seeley
On Fri, Oct 3, 2014 at 3:42 PM, Peter Keegan  wrote:
> Say I have a boolean field named 'hidden', and less than 1% of the
> documents in the index have hidden=true.
> Do both these filter queries use the same docset cache size? :
> fq=hidden:false
> fq=!hidden:true

Nope... !hidden:true will be smaller in the cache (it will be cached
as hidden:true and then inverted)
The downside is that you'll pay the cost of that inversion.

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


Question about filter cache size

2014-10-03 Thread Peter Keegan
Say I have a boolean field named 'hidden', and less than 1% of the
documents in the index have hidden=true.
Do both these filter queries use the same docset cache size? :
fq=hidden:false
fq=!hidden:true

Peter


Solrj: Cannot retry request with a non-repeatable request entity.

2014-10-03 Thread Jamie Jackson
Please help me get over this obstacle:

Stack trace and (ColdFusion) snippet: https://gist.github.com/jamiejackson

(This is an attempt to get Solrj to perform a deleteByQuery
with ConcurrentUpdateSolrServer using basic authentication. I am a noob
with the entire Solrj and HTTPClient stack. )

Please let me know if you need more info.

FWIW, here's a message I got on the HttpClient mailing list:

Jamie
This question should have probably been addressed to Solrj community.
From the HC standpoint you have three options
(1) make request entity repeatable
(2) force authentication by doing a GET prior to doing PUT or POST
(3) use 'expect-continue' handshake on the client side if the handshake
is supported by the server side.

Oleg


You'll notice that I tried to set "expect-continue" (commented out in my
snippet), but that didn't seem to work.
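
For what it's worth, a minimal SolrJ 4.x sketch of wiring Basic credentials
into the HttpClient that ConcurrentUpdateSolrServer uses (URL, core name,
and credentials below are placeholders; whether this avoids the retry of a
non-repeatable streamed entity still depends on the server issuing a 401
challenge or not):

import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.common.params.ModifiableSolrParams;

public class AuthDeleteSketch {
    public static void main(String[] args) throws Exception {
        // Configure an HttpClient with Basic credentials via SolrJ's helper.
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set(HttpClientUtil.PROP_BASIC_AUTH_USER, "user");
        params.set(HttpClientUtil.PROP_BASIC_AUTH_PASS, "secret");
        HttpClient client = HttpClientUtil.createClient(params);

        // Hand the pre-configured client to the concurrent update server.
        ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
                "http://localhost:8983/solr/collection1", client, 10, 2);
        server.deleteByQuery("type:obsolete");
        server.commit();
        server.shutdown();
    }
}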

Thanks,
Jamie


Inconsistent response time

2014-10-03 Thread Scott Johnson
We are attempting to improve our Solr response time as our application uses
Solr for large and time consuming queries. We have found a very inconsistent
result in the time elapsed when pinging Solr. If we ping Solr from a desktop
Windows 7 machine, there is usually a 5 ms elapsed time. But if we ping the
same Solr instance from a Windows Server 2008 machine, it takes about 15 ms.
This could be the difference between a 1 hour process and a 3 hour process,
so it is something we would like to debug and fix if possible.

 

Does anybody have any ideas about why this might be? We get these same
results pretty consistently (testing on multiple desktops and servers). One
thing that seemed to have an impact is removing various additional JDKs that
had been installed, and JDK 1.7u67 specifically seemed to make a difference.

 

Finally, the code we are using to test this is below. If there is a better
test I would be curious to hear that as well.

 

Thanks,


Scott

 

 

package solr;

 

import org.apache.commons.lang.StringUtils;

import org.apache.solr.client.solrj.SolrQuery;

import org.apache.solr.client.solrj.SolrRequest.METHOD;

import org.apache.solr.client.solrj.impl.BinaryRequestWriter;

import org.apache.solr.client.solrj.impl.BinaryResponseParser;

import org.apache.solr.client.solrj.impl.HttpSolrServer;

import org.apache.solr.client.solrj.response.QueryResponse;

import org.apache.solr.client.solrj.response.SolrPingResponse;

import org.apache.solr.common.SolrDocumentList;

 

public class SolrTest {

 

private HttpSolrServer server;



/**

* @param args

* @throws Exception 

 */

public static void main(String[] args) throws Exception {

SolrTest solr = new SolrTest(args);

// Run it a few times, the second time runs
a lot faster.

for (int i=0; i<3; i++) {

solr.execute();

}

}



public SolrTest(String[] args) throws Exception {

String targetUrl = args[0];



System.out.println("=System
properties=");

System.out.println("Start solr test " +
targetUrl);



server = new HttpSolrServer("http://" +
targetUrl + ":8111/solr/search/");

server.setRequestWriter(new
BinaryRequestWriter());

server.setParser(new
BinaryResponseParser());

server.setAllowCompression(true);

server.setDefaultMaxConnectionsPerHost(128);

server.setMaxTotalConnections(128);



SolrPingResponse response = server.ping();

System.out.println("Ping time: " +
response.getElapsedTime() + " ms");

System.out.println("Ping time: " +
response.getElapsedTime() + " ms");

}



private void execute() throws Exception {

SolrQuery query = new SolrQuery();

query.setParam("start", "0");

query.setParam("rows", "1");

 

long startTime = System.currentTimeMillis();



QueryResponse queryResponse =
server.query(query, METHOD.POST);



long elapsedTime =
(System.currentTimeMillis() - startTime);

 

SolrDocumentList results =
queryResponse.getResults();

long totalHits = results.getNumFound();



System.out.println("Search hits:" +
totalHits

+ ". Total
elapsed time:" + elapsedTime + " ms"

+ ". Solr
elapsed time:" + queryResponse.getElapsedTime() + " ms"

+ ". Solr
query time:" + queryResponse.getQTime() + " ms"

+ ". Params:
" + getSearchParams(query));

}



 

/**

 * Formats solr query parameters so that we know what's passed to solr.

 * @param query

 * @return

 */

private String getSearchParams(SolrQuery query) {

  

RE: Boosting Top selling items

2014-10-03 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi Bob, Yes you can have an attribute TOP and make it binary. Later in the
query you can boost the product with TOP.

Something like (ProductType:"Product" AND TOPSelling:true)^10 OR
(ProductType:"Product" AND -TOPSelling:true) as a part of the query. If your
search returns products then they are first in the search results, with
matching parts further down in the list. But if you are searching for parts,
and the text matches any of the products, then products will still be at the
top.

We are telling it, for the search text: if the results contain products, show
them first; otherwise do nothing and show the results as they are, which is
nothing but parts.

I don't think the search text alone can determine whether it is a product or
a part.

-Original Message-
From: Bob Laferriere [mailto:spongeb...@icloud.com] 
Sent: Friday, October 03, 2014 11:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Boosting Top selling items

Thanks Ravi. Do you tag the product as TOP as a binary flag? My marketing team 
wants to use the number of orders but that screws up the relevance horribly. My 
thought is to tag a product with a product attribute (as you suggest) and tag 
it as TOP selling. Then I have pure relevance, but can give a small boost to 
TOP products. 
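
(One common way to express that small boost without distorting base
relevance, sketched with a hypothetical top_selling flag field, is an
edismax boost query:

defType=edismax&q=tv&qf=name description&bq=top_selling:true^1.5

bq adds a modest additive bump for flagged documents while the normal
relevance ordering still dominates.)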

Do you determine when they are searching for a product? Otherwise, if I search 
for a part and you boost products that would be frustrating to a user.

Thanks for the discussion.

-Bob

On Oct 3, 2014, at 9:38 AM, EXTERNAL Taminidi Ravi (ETI, 
Automotive-Service-Solutions)  wrote:

> Hi Bob,  I tried using a product type attribute which separates the 
> products/parts and boost the product in TOP with OR condition for productype 
> as parts. This way you get all the products /parts related to your search and 
> always keeping the Products in the Top and Parts next to Products. This is a 
> kind of playing with the query so your relevance won't break.
> 
> Regards
> 
> Ravi
> 
> -Original Message-
> From: Bob Laferriere [mailto:spongeb...@icloud.com] 
> Sent: Thursday, October 02, 2014 10:47 PM
> To: solr-user@lucene.apache.org
> Subject: Boosting Top selling items
> 
> I have been working to try and identify top selling items in an eCommerce app 
> and boost those in the results. The struggle I am having is that our catalog 
> stores products and parts in the same taxonomy. Since parts are ordered more 
> frequently when you search for something like TV you see cables and antennas 
> first. My theory is that someone needs to tag products as Top Selling as a 
> facet then use faceted search to avoid an artificial boost which screws up 
> document relevance. Anyone fight with anything similar? Interested in 
> discussing with other eCommerce search developers.
> 
> Regards,
> 
> Bob



Re: Errors on index in SolrCloud: ConcurrentUpdateSolrServer$Runner.run()

2014-10-03 Thread vidit.asthana
How did you fix this? I am also getting the same error with 4.10.





Re: Boosting Top selling items

2014-10-03 Thread Bob Laferriere
Thanks Ravi. Do you tag the product as TOP as a binary flag? My marketing team 
wants to use the number of orders but that screws up the relevance horribly. My 
thought is to tag a product with a product attribute (as you suggest) and tag 
it as TOP selling. Then I have pure relevance, but can give a small boost to 
TOP products. 

Do you determine when they are searching for a product? Otherwise, if I search 
for a part and you boost products that would be frustrating to a user.

Thanks for the discussion.

-Bob

On Oct 3, 2014, at 9:38 AM, EXTERNAL Taminidi Ravi (ETI, 
Automotive-Service-Solutions)  wrote:

> Hi Bob,  I tried using a product type attribute which separates the 
> products/parts and boost the product in TOP with OR condition for productype 
> as parts. This way you get all the products /parts related to your search and 
> always keeping the Products in the Top and Parts next to Products. This is a 
> kind of playing with the query so your relevance won't break.
> 
> Regards
> 
> Ravi
> 
> -Original Message-
> From: Bob Laferriere [mailto:spongeb...@icloud.com] 
> Sent: Thursday, October 02, 2014 10:47 PM
> To: solr-user@lucene.apache.org
> Subject: Boosting Top selling items
> 
> I have been working to try and identify top selling items in an eCommerce app 
> and boost those in the results. The struggle I am having is that our catalog 
> stores products and parts in the same taxonomy. Since parts are ordered more 
> frequently when you search for something like TV you see cables and antennas 
> first. My theory is that someone needs to tag products as Top Selling as a 
> facet then use faceted search to avoid an artificial boost which screws up 
> document relevance. Anyone fight with anything similar? Interested in 
> discussing with other eCommerce search developers.
> 
> Regards,
> 
> Bob



Re: Determining which field caused a document to not be imported

2014-10-03 Thread Shawn Heisey
On 10/3/2014 8:13 AM, Tom Evans wrote:
> Caused by: java.lang.ClassCastException: java.lang.Integer cannot be
> cast to java.lang.Long
> at java.lang.Long.compareTo(Long.java:50)
> at java.util.TreeMap.getEntry(TreeMap.java:346)
> at java.util.TreeMap.get(TreeMap.java:273)
> at 
> org.apache.solr.handler.dataimport.SortedMapBackedCache.iterator(SortedMapBackedCache.java:147)
> at 
> org.apache.solr.handler.dataimport.DIHCacheSupport.getIdCacheData(DIHCacheSupport.java:179)
> at 
> org.apache.solr.handler.dataimport.DIHCacheSupport.getCacheData(DIHCacheSupport.java:145)
> at 
> org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:129)
> at 
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
> at 
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
> ... 10 more

Is it possible to temporarily remove the caching from the entity?  I
know that this will make performance suck, but I'm suggesting it only as
a troubleshooting step.  I'm wondering if maybe it's a problem in the
caching implementation and not the main DIH jdbc code.

Thanks,
Shawn



Re: Determining which field caused a document to not be imported

2014-10-03 Thread Shawn Heisey
On 10/3/2014 8:24 AM, Tom Evans wrote:
> On Fri, Oct 3, 2014 at 3:13 PM, Tom Evans  wrote:
>> I tried converting the selected data to SIGNED INTEGER, eg
>> "CONVERT(country_id, SIGNED INTEGER) AS country_id", but this did not
>> have the desired effect.
> 
> However, changing them to be cast to CHAR changed the error message -
> "java.lang.Integer cannot be cast to java.lang.String".
> 
> I guess this is saying that the type of the map key must match the
> type of the key coming from the parent entity (which is logical), so I
> guess my question is - what do SQL type do I need to select out to get
> a java.lang.Integer, to match what the map is expecting?

I still need to digest the stacktrace.  What database software are you
connecting to, what version of their JDBC driver do you use, and what
are your typical column types in the DB?

I'm not very familiar with DIH code, and when I've looked in the past,
I've found it very hard to follow ... but later tonight I will check the
code locations mentioned in your stacktrace to see whether it's possible
to log which field is producing the message.  Hopefully we can get you
something.  Ideally it will log all available information, which means
hopefully it can see definitions in the DIH config file like entity,
dataSource, table, etc.

Thanks,
Shawn



Re: Determining which field caused a document to not be imported

2014-10-03 Thread Tom Evans
On Fri, Oct 3, 2014 at 3:24 PM, Tom Evans  wrote:
> On Fri, Oct 3, 2014 at 3:13 PM, Tom Evans  wrote:
>> I tried converting the selected data to SIGNED INTEGER, eg
>> "CONVERT(country_id, SIGNED INTEGER) AS country_id", but this did not
>> have the desired effect.
>
> However, changing them to be cast to CHAR changed the error message -
> "java.lang.Integer cannot be cast to java.lang.String".
>
> I guess this is saying that the type of the map key must match the
> type of the key coming from the parent entity (which is logical), so I
> guess my question is - what SQL type do I need to select out to get
> a java.lang.Integer, to match what the map is expecting?
>

I rewrote the query for the map, which was doing strange casts itself
(integer to integer casts). This then meant that the values from the
parent query were the same type as those in the map query, and no
funky casts are required anywhere.

However, I still don't have a way to determine which field is failing
when indexing fails like this, and it would be neat if I could
determine a way to do so for future debugging.

Cheers

Tom


RE: Boosting Top selling items

2014-10-03 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi Bob,  I tried using a product type attribute which separates the 
products/parts and boost the product in TOP with OR condition for productype as 
parts. This way you get all the products /parts related to your search and 
always keeping the Products in the Top and Parts next to Products. This is a 
kind of playing with the query so your relevance won't break.

Regards

Ravi

-Original Message-
From: Bob Laferriere [mailto:spongeb...@icloud.com] 
Sent: Thursday, October 02, 2014 10:47 PM
To: solr-user@lucene.apache.org
Subject: Boosting Top selling items

I have been working to try and identify top selling items in an eCommerce app 
and boost those in the results. The struggle I am having is that our catalog 
stores products and parts in the same taxonomy. Since parts are ordered more 
frequently when you search for something like TV you see cables and antennas 
first. My theory is that someone needs to tag products as Top Selling as a 
facet then use faceted search to avoid an artificial boost which screws up 
document relevance. Anyone fight with anything similar? Interested in 
discussing with other eCommerce search developers.

Regards,

Bob


Re: Determining which field caused a document to not be imported

2014-10-03 Thread Tom Evans
On Fri, Oct 3, 2014 at 3:13 PM, Tom Evans  wrote:
> I tried converting the selected data to SIGNED INTEGER, eg
> "CONVERT(country_id, SIGNED INTEGER) AS country_id", but this did not
> have the desired effect.

However, changing them to be cast to CHAR changed the error message -
"java.lang.Integer cannot be cast to java.lang.String".

I guess this is saying that the type of the map key must match the
type of the key coming from the parent entity (which is logical), so I
guess my question is - what SQL type do I need to select out to get
a java.lang.Integer, to match what the map is expecting?

Cheers

Tom


Re: Determining which field caused a document to not be imported

2014-10-03 Thread Tom Evans
On Fri, Oct 3, 2014 at 2:24 PM, Shawn Heisey  wrote:
> Can you give us the entire stacktrace, with complete details from any
> "caused by" sections?  Also, is this 4.8.0 or 4.8.1?
>

Thanks Shawn, this is SOLR 4.8.1 and here is the full traceback from the log:

95191 [Thread-21] INFO
org.apache.solr.update.processor.LogUpdateProcessor  – [products]
webapp=/products path=/dataimport-from-denorm
params={id=2148732&optimize=false&clean=false&indent=true&commit=true&verbose=false&command=full-import&debug=false&wt=json}
status=0 QTime=32 {} 0 32
95199 [Thread-21] ERROR
org.apache.solr.handler.dataimport.DataImporter  – Full Import
failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.ClassCastException: java.lang.Integer cannot be cast to
java.lang.Long
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:278)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:464)
Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.ClassCastException: java.lang.Integer cannot be cast to
java.lang.Long
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:418)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:331)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:239)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.ClassCastException: java.lang.Integer cannot be cast to
java.lang.Long
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:63)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:246)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:477)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
... 5 more
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be
cast to java.lang.Long
at java.lang.Long.compareTo(Long.java:50)
at java.util.TreeMap.getEntry(TreeMap.java:346)
at java.util.TreeMap.get(TreeMap.java:273)
at 
org.apache.solr.handler.dataimport.SortedMapBackedCache.iterator(SortedMapBackedCache.java:147)
at 
org.apache.solr.handler.dataimport.DIHCacheSupport.getIdCacheData(DIHCacheSupport.java:179)
at 
org.apache.solr.handler.dataimport.DIHCacheSupport.getCacheData(DIHCacheSupport.java:145)
at 
org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:129)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
... 10 more

95199 [Thread-21] INFO  org.apache.solr.update.UpdateHandler  – start rollback{}

I've tracked it down to a single entity now that selects some content
out of the database and then looks up other fields using that data
from sub-entities that have SortedMapBackedCache caching in use, but
I'm still not sure how to fix it.

Eg, the original entity selects out "country_id", which is then used
by this entity:


  


I tried converting the selected data to SIGNED INTEGER, eg
"CONVERT(country_id, SIGNED INTEGER) AS country_id", but this did not
have the desired effect.

The source database is mysql, the source column for "country_id" is
"`country_id` smallint(6) NOT NULL default '0'".

Again, I'm not 100% sure that it is even the "country" field that
causes this, there are several SortedMapBackedCache sub-entities (but
they are all analogous to this one).
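
One JDBC detail that may explain the casts (worth verifying against your
Connector/J version): MySQL SMALLINT columns come back as java.lang.Integer,
but CAST/CONVERT(..., SIGNED) produces a BIGINT, which arrives as
java.lang.Long. So CONVERT(country_id, SIGNED INTEGER) would make one side
of the cache lookup a Long while the other side stays an Integer, which
matches the two exceptions above.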

Thanks in advance

Tom


Re: Determining which field caused a document to not be imported

2014-10-03 Thread Shawn Heisey
On 10/3/2014 5:41 AM, Tom Evans wrote:
> I recently rewrote our SOLR 4.8 dataimport to read from a set of
> denormalised DB tables, in an attempt to increase full indexing speed.
> When I tried it out however, indexing broke telling me that
> "java.lang.Long cannot be cast to java.lang.Integer" (full stack
> below, with the document elided). From googling, this tends to be some
> field that is being selected out as a long, where it should probably
> be cast as a string.
> 
> Unfortunately, our documents have some 400+ fields and over 100
> entities; is there another way to determine which field could not be
> cast from Long to Integer other than disabling each integer field in
> turn?

Can you give us the entire stacktrace, with complete details from any
"caused by" sections?  Also, is this 4.8.0 or 4.8.1?

Thanks,
Shawn



Token Filter -> synonyms by regex substitution

2014-10-03 Thread Bruno René Santos
Hi,

I want to create synonyms for a token by applying a regular expression to
it and emitting the substitution result as a synonym.

I tried the following code:


public PatternRulesFilter(TokenStream input, Map<Pattern, String> substitutions)
{
    super(input);
    this.substitutions = substitutions;
    this.charTermAttr = addAttribute(CharTermAttribute.class);
    this.posIncAttr = addAttribute(PositionIncrementAttribute.class);
    this.offsetAttr = addAttribute(OffsetAttribute.class);
    this.terms = new LinkedList<>();
}

@Override
public boolean incrementToken() throws IOException
{
    if (!terms.isEmpty())
    {
        String buffer = terms.poll();
        charTermAttr.setEmpty();
        maxLen = Math.max(maxLen, buffer.length());
        charTermAttr.copyBuffer(buffer.toCharArray(), 0, buffer.length());
        offsetAttr.setOffset(start, start + buffer.length());
        posIncAttr.setPositionIncrement(0);
        log.info("new attr: {}", String.valueOf(buffer));
        return true;
    }
    if (input.incrementToken())
    {
        // we add the new substitutions
        String buffer = String.valueOf(charTermAttr.buffer()).trim();
        start = maxLen;
        maxLen = buffer.length();
        terms.addAll(substitutions.entrySet().stream()
                .filter(e -> e.getKey().matcher(buffer).find())
                .map(e -> e.getKey().matcher(buffer).replaceAll(e.getValue()))
                .collect(Collectors.toSet()));
        // we return true and leave the original token unchanged
        return true;
    }
    return false;
}

When I use search terms with 2 or more words, the second token is overlapped
by substitution results from the first token. For example, if I have a
rule x -> sc and the search term 'taxi sun', I get tokens like:

taxi sun
tasci sunci

Any ideas why? If you know of a token filter that already does this, I
wouldn't mind using it at all.
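
(A likely cause, as a hedged aside: charTermAttr.buffer() returns the whole
oversized backing array, so String.valueOf(charTermAttr.buffer()) picks up
leftover characters from the previous, longer token; "sun" read over the
remains of "tasci" yields "sunci". Reading only the valid region should fix
it:

String buffer = new String(charTermAttr.buffer(), 0, charTermAttr.length()).trim();
// or, equivalently:
String buffer = charTermAttr.toString().trim();

The start/maxLen offset bookkeeping looks fragile for the same multi-token
case, but the buffer call is the immediate culprit.)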

Thanx
Bruno

-- 

Bruno René Santos
about.me/brunorene


Re: If I change a field from text_ws to text do I need to drop and reindex or just reindex?

2014-10-03 Thread waynemailinglist
OK thanks everyone for the help and clarification





Re: If I change a field from text_ws to text do I need to drop and reindex or just reindex?

2014-10-03 Thread Erik Hatcher
Then some documents will be indexed one way, and newer docs the other way.  Not 
the end of the world, but it could affect whether documents match as you expect 
during searches.   When changing field types, the safest and sanest thing to do 
is reindex everything again.

Erik

On Oct 3, 2014, at 6:05 AM, waynemailinglist  
wrote:

> What happens if I don't re-index every document? It's possible this might
> happen.
> 
> 
> 



Determining which field caused a document to not be imported

2014-10-03 Thread Tom Evans
Hi all

I recently rewrote our SOLR 4.8 dataimport to read from a set of
denormalised DB tables, in an attempt to increase full indexing speed.
When I tried it out however, indexing broke telling me that
"java.lang.Long cannot be cast to java.lang.Integer" (full stack
below, with the document elided). From googling, this tends to be some
field that is being selected out as a long, where it should probably
be cast as a string.

Unfortunately, our documents have some 400+ fields and over 100
entities; is there another way to determine which field could not be
cast from Long to Integer other than disabling each integer field in
turn?

Cheers

Tom


Exception while processing: variant document :
SolrInputDocument(fields: [(removed)]):
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.ClassCastException: java.lang.Long cannot be cast to
java.lang.Integer
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:63)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:246)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:477)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:331)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:239)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:464)
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast
to java.lang.Integer
at java.lang.Integer.compareTo(Integer.java:52)
at java.util.TreeMap.getEntry(TreeMap.java:346)
at java.util.TreeMap.get(TreeMap.java:273)
at 
org.apache.solr.handler.dataimport.SortedMapBackedCache.iterator(SortedMapBackedCache.java:147)
at 
org.apache.solr.handler.dataimport.DIHCacheSupport.getIdCacheData(DIHCacheSupport.java:179)
at 
org.apache.solr.handler.dataimport.DIHCacheSupport.getCacheData(DIHCacheSupport.java:145)
at 
org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:129)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
... 10 more


RE: If I change a field from text_ws to text do I need to drop and reindex or just reindex?

2014-10-03 Thread waynemailinglist
What happens if I don't re-index every document? It's possible this might
happen.





RE: If I change a field from text_ws to text do I need to drop and reindex or just reindex?

2014-10-03 Thread Markus Jelsma
Hi - you don't need to erase the data directory, you can just reindex, but make 
sure you overwrite all documents.

 
 
-Original message-
> From:Wayne W 
> Sent: Friday 3rd October 2014 11:55
> To: solr-user@lucene.apache.org
> Subject: If I change a field from text_ws to text do I need to drop and reindex 
> or just reindex?
> 
> Hi,
> 
> I've realized I need to change a particular field from text_ws to text.
> 
> I realize I need to reindex as the tokens are being stored in a case
> sensitive manner which we do not want.
> 
> However can I just reindex all my documents, or do I need to drop/wipe the
> /data/index dir and start fresh?
> 
> I really don't want to drop as the current users will not be able to search
> and reindexing could take as long as a week.
> 
> many thanks
> Wayne
> 


If I change a field from text_ws to text do I need to drop and reindex or just reindex?

2014-10-03 Thread Wayne W
Hi,

I've realized I need to change a particular field from text_ws to text.

I realize I need to reindex as the tokens are being stored in a case
sensitive manner which we do not want.

However can I just reindex all my documents, or do I need to drop/wipe the
/data/index dir and start fresh?

I really don't want to drop as the current users will not be able to search
and reindexing could take as long as a week.

many thanks
Wayne