Re: problems with indexing documents

2019-04-02 Thread Bill Tantzen
Right, as Mark said, this is how the dates were indexed previously.
However, instead of passing in the actual String, we passed a
java.util.Date object which was automagically converted to the correct
string.

Now (the code on our end has not changed), Solr throws an exception
because the string it sees is of the form 'Sun Jul 31 19:00:00 CDT
2016' (which I believe is the Date.toString() result) instead of
the DatePointField or TrieDateField format.
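
In the meantime, formatting the value ourselves seems to work -- a minimal
sketch, assuming SolrJ on Java 8+ and the same "date" field (whether this is
the intended fix is exactly my question):

    import java.time.format.DateTimeFormatter;
    import java.util.Date;
    import org.apache.solr.common.SolrInputDocument;

    // Render the java.util.Date in Solr's canonical ISO-8601 form (UTC),
    // e.g. "2016-08-01T00:00:00Z", instead of relying on Date.toString().
    Date value = new Date();
    String solrDate = DateTimeFormatter.ISO_INSTANT.format(value.toInstant());

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("date", solrDate);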

~~ Bill

On Mon, Apr 1, 2019 at 8:44 PM Zheng Lin Edwin Yeo  wrote:
>
> Hi Bill,
>
> Previously, did you index the date in the same format as you are using now,
> or in the Solr format of "YYYY-MM-DDTHH:MM:SSZ"?
>
> Regards,
> Edwin
>
>
> On Tue, 2 Apr 2019 at 00:32, Bill Tantzen  wrote:
>
> > In a legacy application using Solr 4.1 and solrj, I have always been
> > able to add documents with TrieDateField types using java.util.Date
> > objects, for instance,
> >
> > doc.addField ( "date", new java.util.Date() );
> >
> > Having recently upgraded to Solr 7.7 and updated my schema to
> > leverage DatePointField as my type, that code no longer works; it
> > throws an exception with an error like:
> >
> > Invalid Date String: 'Sun Jul 31 19:00:00 CDT 2016'
> >
> > I understand that this String is not what Solr expects, but short of
> > formatting the correct String myself, is there no longer a way to pass in a
> > simple Date object?  Was there some kind of implicit conversion taking
> > place earlier that is no longer happening?
> >
> > In fact, in some of the example code that comes with the Solr
> > distribution (SolrExampleTests.java), document timestamp fields are
> > added using the same addField call I am attempting to use, so I am
> > very confused.
> >
> > Thanks for any advice!
> >
> > Regards,
> > Bill
> >



-- 
Human wheels spin round and round
While the clock keeps the pace... -- John Mellencamp

Bill Tantzen | University of Minnesota Libraries
612-626-9949 (U of M) | 612-325-1777 (cell)


problems with indexing documents

2019-04-01 Thread Bill Tantzen
In a legacy application using Solr 4.1 and solrj, I have always been
able to add documents with TrieDateField types using java.util.Date
objects, for instance,

doc.addField ( "date", new java.util.Date() );

Having recently upgraded to Solr 7.7 and updated my schema to
leverage DatePointField as my type, that code no longer works; it
throws an exception with an error like:

Invalid Date String: 'Sun Jul 31 19:00:00 CDT 2016'

I understand that this String is not what Solr expects, but short of
formatting the correct String myself, is there no longer a way to pass in a
simple Date object?  Was there some kind of implicit conversion taking
place earlier that is no longer happening?

In fact, in some of the example code that comes with the Solr
distribution (SolrExampleTests.java), document timestamp fields are
added using the same addField call I am attempting to use, so I am
very confused.

Thanks for any advice!

Regards,
Bill


Solr 6.6 using swap space causes recovery?

2017-12-15 Thread Bill Oconnor
Hello,


We recently upgraded to SolrCloud 6.6. We are running on Ubuntu LTS 14.x
servers - VMware on Nutanix boxes. We have 4 nodes with 32GB each and 16GB for
the JVM, with a 12GB minimum. Usually it is only using 4-7GB.


We do nightly indexing of partial fields for all our docs (~200K). This usually 
takes 3 hours using 10 threads. About every other week we have a server go into 
recovery mode during the update. The recovering server has much larger swap 
usage than the other servers in the cluster. We think this is related to the 
mmap files used for indexes. The server eventually recovers, but it triggers 
alerts for devops, which are annoying.


I have found a previous mailing list question (which Shawn responded to) with an 
almost identical problem from 2014, but there is no suggested remedy. ( 
http://lucene.472066.n3.nabble.com/Solr-4-3-1-memory-swapping-td4126641.html)


Questions:


Is there any progress regarding this?


Is there some kind of configuration that can mitigate this?


Or maybe this is a Lucene issue?


Thanks,

Bill OConnor (www.plos.org)


Re: Replicates not recovering after rolling restart

2017-09-22 Thread Bill Oconnor

Thanks everyone for the responses.


I believe I have found the problem.


The type of _version_ is incorrect in our schema. This is a required field 
that is primarily used by Solr.


Our schema has it typed as type=int instead of type=long.


I believe that this number is used by the replication process to figure out
what needs to be sync'd on an individual replicate. In our case Solr puts the
value in during indexing. It appears that Solr has chosen a number that cannot
be represented by "int". As the replicates query the leader to determine
whether a sync is necessary, the leader throws an error as it tries to format
the response with the large _version_.

This process continues until the replicates give up.


I finally verified this by doing a simple query, _version_:*, which throws the
same error but gives more helpful info: "re-index your documents".
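
For reference, the stock Solr schemas declare this field along the lines of

    <field name="_version_" type="long" indexed="true" stored="true"/>

(the exact attributes vary by Solr version; the essential part is type="long"
rather than type="int").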


Thanks.






From: Rick Leir 
Sent: Friday, September 22, 2017 12:34:57 AM
To: solr-user@lucene.apache.org
Subject: Re: Replicates not recovering after rolling restart

Wunder, Erick

$ dc
16o
1578578283947098112p
15E83C95E8D00000
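
The same conversion, as a quick sanity check in Java for anyone without dc handy:

    // decimal -> hex, matching the dc output above
    System.out.println(Long.toHexString(1578578283947098112L));  // 15e83c95e8d00000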

That is an interesting number. Is it, as a guess, machine instructions
or an address pointer? It does not look like UTF-8 or ASCII. Machine
code looks promising:


Disassembly:

0:  15 e8 3c 95 e8          adc    eax, 0xe8953ce8
5:  d0 00                   rol    BYTE PTR [rax], 1


ADC dest, src -- Modifies flags: AF CF OF SF PF ZF. Sums two binary operands,
placing the result in the destination.

ROL - Rotate Left

Registers: the 64-bit extension of eax is called rax.

Is that code possibly in the JVM executable? Or a random memory page.

cheers -- Rick

On 2017-09-20 07:21 PM, Walter Underwood wrote:
> 1578578283947098112 needs 61 bits. Is it being parsed into a 32 bit target?
>
> That doesn’t explain where it came from, of course.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Sep 20, 2017, at 3:35 PM, Erick Erickson  wrote:
>>
>> The numberformatexception is...odd. Clearly that's too big a number
>> for an integer, did anything in the underlying schema change?
>>
>> Best,
>> Erick
>>
>> On Wed, Sep 20, 2017 at 3:00 PM, Walter Underwood  
>> wrote:
>>> Rolling restarts work fine for us. I often include installing new configs 
>>> with that. Here is our script. Pass it any hostname in the cluster. I use 
>>> the load balancer name. You’ll need to change the domain and the install 
>>> directory of course.
>>>
>>> #!/bin/bash
>>>
>>> cluster=$1
>>>
>>> hosts=`curl -s 
>>> "http://${cluster}:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json";
>>>  | jq -r '.cluster.live_nodes[]' | sort`
>>>
>>> for host in $hosts
>>> do
>>> host="${host}.cloud.cheggnet.com"
>>> echo restarting Solr on $host
>>> ssh $host 'cd /apps/solr6 ; sudo -u bin bin/solr stop; sudo -u bin 
>>> bin/solr start -cloud -h `hostname`'
>>> done
>>>
>>>
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
>>>> On Sep 20, 2017, at 1:42 PM, Bill Oconnor  wrote:
>>>>
>>>> Hello,
>>>>
>>>>
>>>> Background:
>>>>
>>>>
>>>> We have been successfully using Solr for over 5 years and we recently made 
>>>> the decision to move into SolrCloud. For the most part that has been easy 
>>>> but we have repeated problems with our rolling restarts, where servers remain 
>>>> functional but stay in Recovery until they stop trying. We restarted 
>>>> because we increased the memory from 12GB to 16GB on the JVM.
>>>>
>>>>
>>>> Does anyone have any insight as to what is going on here?
>>>>
>>>> Is there a special procedure I should use for starting and stopping a host?
>>>>
>>>> Is it ok to do a rolling restart on all the nodes in a shard?
>>>>
>>>>
>>>> Any insight would be appreciated.
>>>>
>>>>
>>>> Configuration:
>>>>
>>>>
>>>> We have a group of servers with multiple collections. Each collection 
>>>> consists of one shard and multiple replicates. We are running the latest 
>>>> stable version of SolrCloud 6.6 on Ubuntu LTS and Oracle Corporation Java 
>>>> HotSpot(TM) 64-Bit Server VM 1.8.0_66 25.66-b17
>>>>

Re: Replicates not recovering after rolling restart

2017-09-21 Thread Bill Oconnor

  1.  We are moving from 4.X to 6.6.
  2.  Changed the schema - adding the _version_ field, etc.; nothing major.
  3.  Full re-index of documents into the cluster - so this is not a migration.
  4.  Changed the JVM parameter from 12GB to 16GB and did a restart.
  5.  Replicates go into recovery which fails to complete after many hours. 
They still respond to queries but the /update POST from the replicates fails 
with the 500 server error and a stack trace because of the number format 
failure.


My other cluster  does not reuse any nodes. The restart went as expected with 
the JVM change. Al


From: Erick Erickson 
Sent: Thursday, September 21, 2017 8:25:32 AM
To: solr-user
Subject: Re: Replicates not recovering after rolling restart

Hmmm, I didn't ask what version you're upgrading _from_. 5 years ago
would be Solr 4. Are you replacing Solr 5 or 4? I'm guessing 5, but
want to check unlikely possibilities.

Next question: I'm assuming all your nodes have been upgraded to Solr 6, right?

Best,
Erick

On Wed, Sep 20, 2017 at 7:18 PM, Bill Oconnor  wrote:
> I have no clue where that number comes from; it does not seem to be in the 
> actual post to the leader as seen in my tcpdump. It is a mystery.
>
> 
> From: Walter Underwood 
> Sent: Wednesday, September 20, 2017 7:00:53 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replicates not recovering after rolling restart
>
>
>> On Sep 20, 2017, at 6:15 PM, Bill Oconnor  wrote:
>>
>> I restart using the standard "sudo service solr start/stop"
>
> You might look into what that actually does.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)



>


Re: Replicates not recovering after rolling restart

2017-09-20 Thread Bill Oconnor
I have no clue where that number comes from; it does not seem to be in the 
actual post to the leader as seen in my tcpdump. It is a mystery.


From: Walter Underwood 
Sent: Wednesday, September 20, 2017 7:00:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Replicates not recovering after rolling restart


> On Sep 20, 2017, at 6:15 PM, Bill Oconnor  wrote:
>
> I restart using the standard "sudo service solr start/stop"

You might look into what that actually does.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Replicates not recovering after rolling restart

2017-09-20 Thread Bill Oconnor
Thanks everyone for the response.


I do not think we changed anything other than the JVM memory size.


I did leave out one piece of info - one of the hosts is a replicate in another 
shard.


collection1 -> shard1 -> *h1, h2, h3, h4    (where * marks the leader)

collection2 -> shard1 -> *h5, h3


When I restart, *h1 works fine; h2, h3, h4 go into recovery but still respond to
requests. *h1 starts getting the POST from the recovering servers and responds
with the 500 Server Error until the servers quit.


Collection2 with h3 is active and fine even though it is recovering in 
collection1.


This happened before and I resolved it by deleting and then creating a new 
collection.


I restart using the standard "sudo service solr start/stop"


I have to say I am not comfortable with having multiple shards shared on
the same host. The production servers will not be configured this way, but
these servers are for development.


From: Erick Erickson 
Sent: Wednesday, September 20, 2017 3:35:16 PM
To: solr-user
Subject: Re: Replicates not recovering after rolling restart

The numberformatexception is...odd. Clearly that's too big a number
for an integer, did anything in the underlying schema change?

Best,
Erick

On Wed, Sep 20, 2017 at 3:00 PM, Walter Underwood  wrote:
> Rolling restarts work fine for us. I often include installing new configs 
> with that. Here is our script. Pass it any hostname in the cluster. I use the 
> load balancer name. You’ll need to change the domain and the install 
> directory of course.
>
> #!/bin/bash
>
> cluster=$1
>
> hosts=`curl -s 
> "http://${cluster}:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json"; 
> | jq -r '.cluster.live_nodes[]' | sort`
>
> for host in $hosts
> do
> host="${host}.cloud.cheggnet.com"
> echo restarting Solr on $host
> ssh $host 'cd /apps/solr6 ; sudo -u bin bin/solr stop; sudo -u bin 
> bin/solr start -cloud -h `hostname`'
> done
>
>
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Sep 20, 2017, at 1:42 PM, Bill Oconnor  wrote:
>>
>> Hello,
>>
>>
>> Background:
>>
>>
>> We have been successfully using Solr for over 5 years and we recently made 
>> the decision to move into SolrCloud. For the most part that has been easy 
>> but we have repeated problems with our rolling restarts, where servers remain 
>> functional but stay in Recovery until they stop trying. We restarted because 
>> we increased the memory from 12GB to 16GB on the JVM.
>>
>>
>> Does anyone have any insight as to what is going on here?
>>
>> Is there a special procedure I should use for starting and stopping a host?
>>
>> Is it ok to do a rolling restart on all the nodes in a shard?
>>
>>
>> Any insight would be appreciated.
>>
>>
>> Configuration:
>>
>>
>> We have a group of servers with multiple collections. Each collection 
>> consists of one shard and multiple replicates. We are running the latest 
>> stable version of SolrCloud 6.6 on Ubuntu LTS and Oracle Corporation Java 
>> HotSpot(TM) 64-Bit Server VM 1.8.0_66 25.66-b17
>>
>>
>> (collection)  (shard)  (replicates)
>>
>> journals_stage   ->  shard1  ->  solr-220 (leader) , solr-223, solr-221, 
>> solr-222 (replicates)
>>
>>
>> Problem:
>>
>>
>> Restarting the system puts the replicates in a recovery state they never 
>> exit from. They eventually give up after 500 tries.  If I go to the 
>> individual replicates and execute a query the data is still available.
>>
>>
>> Using tcpdump I find the replicates sending this request to the leader (the 
>> leader appears to be active).
>>
>>
>> The exchange goes  like this - :
>>
>>
>> solr-220 is the leader.
>>
>> Solr-221 to Solr-220
>>
>>
>> 10:18:42.426823 IP solr-221:54341 > solr-220:8983:
>>
>>
>> POST /solr/journals_stage_shard1_replica1/update HTTP/1.1
>> Content-Type: application/x-www-form-urlencoded; charset=UTF-8
>> User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0
>> Content-Length: 108
>> Host: solr-220:8983
>> Connection: Keep-Alive
>>
>>
>> commit_end_point=true&openSearcher=false&commit=true&softCommit=false&waitSearcher=true&wt=javabin&version=2
>>
>>
>> Solr-220 back to Solr-221
>>
>>
>> IP solr-220:8983 > solr

Replicates not recovering after rolling restart

2017-09-20 Thread Bill Oconnor
Hello,


Background:


We have been successfully using Solr for over 5 years and we recently made the 
decision to move into SolrCloud. For the most part that has been easy but we 
have repeated problems with our rolling restarts, where servers remain functional 
but stay in Recovery until they stop trying. We restarted because we increased 
the memory from 12GB to 16GB on the JVM.


Does anyone have any insight as to what is going on here?

Is there a special procedure I should use for starting and stopping a host?

Is it ok to do a rolling restart on all the nodes in a shard?


Any insight would be appreciated.


Configuration:


We have a group of servers with multiple collections. Each collection consists 
of one shard and multiple replicates. We are running the latest stable version 
of SolrCloud 6.6 on Ubuntu LTS and Oracle Corporation Java HotSpot(TM) 64-Bit 
Server VM 1.8.0_66 25.66-b17


(collection)  (shard)  (replicates)

journals_stage   ->  shard1  ->  solr-220 (leader) , solr-223, solr-221, 
solr-222 (replicates)


Problem:


Restarting the system puts the replicates in a recovery state they never exit 
from. They eventually give up after 500 tries.  If I go to the individual 
replicates and execute a query the data is still available.


Using tcpdump I find the replicates sending this request to the leader (the 
leader appears to be active).


The exchange goes  like this - :


solr-220 is the leader.

Solr-221 to Solr-220


10:18:42.426823 IP solr-221:54341 > solr-220:8983:


POST /solr/journals_stage_shard1_replica1/update HTTP/1.1
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0
Content-Length: 108
Host: solr-220:8983
Connection: Keep-Alive


commit_end_point=true&openSearcher=false&commit=true&softCommit=false&waitSearcher=true&wt=javabin&version=2


Solr-220 back to Solr-221


IP solr-220:8983 > solr-221:54341: Flags [P.], seq 1:5152, ack 385, win 235, 
options [nop,nop,
TS val 85813 ecr 858107069], length 5151
..HTTP/1.1 500 Server Error
Content-Type: application/octet-stream
Content-Length: 5060


.responseHeader..&statusT..%QTimeC.%error..#msg?.For input string: 
"1578578283947098112".%trace?.&java.lang.NumberFormatException: For
input string: "1578578283947098112"
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:583)
at java.lang.Integer.parseInt(Integer.java:615)
at 
org.apache.lucene.queries.function.docvalues.IntDocValues.getRangeScorer(IntDocValues.java:89)
at 
org.apache.solr.search.function.ValueSourceRangeFilter$1.iterator(ValueSourceRangeFilter.java:83)
at 
org.apache.solr.search.SolrConstantScoreQuery$ConstantWeight.scorer(SolrConstantScoreQuery.java:100)
at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
at 
org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
at 
org.apache.solr.update.DeleteByQueryWrapper$1.scorer(DeleteByQueryWrapper.java:90)
at 
org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:709)

at 
org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:267)




VPAT / 508 compliance information?

2016-10-11 Thread Bill Yosmanovich
Would anyone happen to know if SOLR has a VPAT or where I could obtain any 
Section 508 compliance information?

Thanks!
Bill Yosmanovich



VPAT?

2016-10-11 Thread Bill Yosmanovich
Would anyone happen to know if SOLR has a VPAT or where I could obtain any 
Section 508 compliance information?

Thanks!
Bill Yosmanovich


Re: admin-extra

2015-10-11 Thread Bill Au
admin-extra allows one to include additional links and/or information in
the Solr admin main page:

https://cwiki.apache.org/confluence/display/solr/Core-Specific+Tools

Bill

On Wed, Oct 7, 2015 at 5:40 PM, Upayavira  wrote:

> Do you use admin-extra within the admin UI?
>
> If so, please go to [1] and document your use case. The feature
> currently isn't implemented in the new admin UI, and without use-cases,
> it likely won't be - so if you want it in there, please help us
> understand how you use it!
>
> Thanks!
>
> Upayavira
>
> [1] https://issues.apache.org/jira/browse/SOLR-8140
>


Re: Best Indexing Approaches - To max the throughput

2015-10-06 Thread Bill Dueber
Just to add...my informal tests show that batching has way more effect
than solrj vs json.

I haven't looked at CUSC (ConcurrentUpdateSolrClient) in a while; last time I
looked it was impossible to do anything smart about error handling, so check
that out before you get too deeply into it. We use a strategy of sending a
batch of JSON documents and, if it returns an error, sending each record one
at a time until we find the bad one and can log something useful.
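
Roughly, in SolrJ terms (hypothetical names, assuming a SolrJ 5.x SolrClient --
we actually post JSON, but the pattern is the same):

    import java.util.List;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.common.SolrInputDocument;

    // Send the whole batch first; if it fails, retry one document at a time
    // so the offending record can be identified and logged.
    void indexBatch(SolrClient client, String collection,
                    List<SolrInputDocument> batch) {
        try {
            client.add(collection, batch);
        } catch (Exception batchFailure) {
            for (SolrInputDocument doc : batch) {
                try {
                    client.add(collection, doc);
                } catch (Exception docFailure) {
                    System.err.println("Bad document " + doc.getFieldValue("id")
                            + ": " + docFailure.getMessage());
                }
            }
        }
    }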



On Mon, Oct 5, 2015 at 12:07 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Thanks Erick,
> you confirmed my impressions!
> Thank you very much for the insights, an other opinion is welcome :)
>
> Cheers
>
> 2015-10-05 14:55 GMT+01:00 Erick Erickson :
>
> > SolrJ tends to be faster for several reasons, not the least of which
> > is that it sends packets to Solr in a more efficient binary format.
> >
> > Batching is critical. I did some rough tests using SolrJ and sending
> > docs one at a time gave a throughput of < 400 docs/second.
> > Sending 10 gave 2,300 or so. Sending 100 at a time gave
> > over 5,300 docs/second. Curiously, 1,000 at a time gave only
> > marginal improvement over 100. This was with a single thread.
> > YMMV of course.
> >
> > CloudSolrClient is definitely the better way to go with SolrCloud,
> > it routes the docs to the correct leader instead of having the
> > node you send the docs to do the routing.
> >
> > Best,
> > Erick
> >
> > On Mon, Oct 5, 2015 at 4:57 AM, Alessandro Benedetti
> >  wrote:
> > > I was doing some studies and analysis, just wondering in your opinion
> > which
> > > one is the best approach to use to index in Solr to reach the best
> > > throughput possible.
> > > I know that a lot of factor are affecting Indexing time, so let's only
> > > focus in the feeding approach.
> > > Let's isolate different scenarios :
> > >
> > > *Single Solr Infrastructure*
> > >
> > > 1) Xml/Json batch request to /update IndexHandler (xml/json)
> > >
> > > 2) SolrJ ConcurrentUpdateSolrClient ( javabin)
> > > I was thinking this to be the fastest approach for a multi threaded
> > > indexing application.
> > > Posting batch of docs if possible per request.
> > >
> > > *Solr Cloud*
> > >
> > > 1) Xml/Json batch request to /update IndexHandler(xml/json)
> > >
> > > 2) SolrJ ConcurrentUpdateSolrClient ( javabin)
> > >
> > > 3) CloudSolrClient ( javabin)
> > > it seems the best approach accordingly to this improvements [1]
> > >
> > > What are your opinions ?
> > >
> > > A bonus observation should be for using some Map/Reduce big data
> indexer,
> > > but let's assume we don't have a big cluster of cpus, but the average
> > > Indexer server.
> > >
> > >
> > > [1]
> > >
> >
> https://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
> > >
> > >
> > > Cheers
> > >
> > >
> > > --
> > > --
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Way to determine (via analyzer) what fields/types will be created for a given field name?

2015-09-30 Thread Bill Dueber
Let's say I have [a couple of dynamicField/copyField definitions -- stripped by
the list archive].

[I started thinking this sort of thing through a while back:
<http://robotlibrarian.billdueber.com/2014/10/schemaless-solr-with-dynamicfield-and-copyfield/>]

If I index a field named lastname_st, I end up with:

   - field lastname_t of type text
   - field lastname of type string

Is there any way for me to query Solr to find out what fields and
field types it's going to produce, in the way the analysis handlers can
show me transformations and so on?

—
Bill Dueber
Library Systems Programmer
University of Michigan Library
​


solrcloud and core swapping

2015-08-28 Thread Bill Au
Is core swapping supported in SolrCloud?  If I have a 5 nodes SolrCloud
cluster and I do a core swap on the leader, will the core be swapped on the
other 4 nodes as well?  Or do I need to do a core swap on each node?

Bill


TimeAllowed bug

2015-08-24 Thread Bill Bell
Weird fq caching bug when using timeAllowed

1. Find a pwid (in this case YLGVQ).
2. Run a query with an fq on the pwid and timeAllowed=1:
   http://hgsolr2devsl.healthgrades.com:8983/solr/providersearch/select/?q=*:*&wt=json&fl=pwid&fq=pwid:YLGVQ&timeAllowed=1
3. Ensure #2 returns 0 results.
4. Rerun the query without the timeAllowed param:
   http://hgsolr2devsl.healthgrades.com:8983/solr/providersearch/select/?q=*:*&wt=json&fl=pwid&fq=pwid:YLGVQ
5. Note that after removing the timeAllowed parameter the query is still
   returning 0 results.

 Solr seems to be caching the FQ when the timeAllowed parameter is present.


Bill Bell
Sent from mobile



Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Bill Bell
We use 8GB to 10GB for indexes of that size all the time.


Bill Bell
Sent from mobile


> On Aug 23, 2015, at 8:52 AM, Shawn Heisey  wrote:
> 
>> On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote:
>> Hi Shawn,
>> 
>> Yes, I've increased the heap size to 4GB already, and I'm using a machine
>> with 32GB RAM.
>> 
>> Is it recommended to further increase the heap size to like 8GB or 16GB?
> 
> Probably not, but I know nothing about your data.  How many Solr docs
> were created by indexing 1GB of data?  How much disk space is used by
> your Solr index(es)?
> 
> I know very little about clustering, but it looks like you've gotten a
> reply from Toke, who knows a lot more about that part of the code than I do.
> 
> Thanks,
> Shawn
> 


Re: solr multicore vs sharding vs 1 big collection

2015-08-03 Thread Bill Bell
Yeah, separating by month or year is good and can really help in this case.

Bill Bell
Sent from mobile


> On Aug 2, 2015, at 5:29 PM, Jay Potharaju  wrote:
> 
> Shawn,
> Thanks for the feedback. I agree that increasing timeout might alleviate
> the timeout issue. The main problem with increasing timeout is the
> detrimental effect it will have on the user experience, therefore can't
> increase it.
> I have looked at the queries that threw errors, next time I try it
> everything seems to work fine. Not sure how to reproduce the error.
> My concern with increasing the memory to 32GB is what happens when the
> index size grows over the next few months.
> One of the other solutions I have been thinking about is to rebuild
> index(weekly) and create a new collection and use it. Are there any good
> references for doing that?
> Thanks
> Jay
> 
>> On Sun, Aug 2, 2015 at 10:19 AM, Shawn Heisey  wrote:
>> 
>>> On 8/2/2015 8:29 AM, Jay Potharaju wrote:
>>> The document contains around 30 fields and have stored set to true for
>>> almost 15 of them. And these stored fields are queried and updated all
>> the
>>> time. You will notice that the deleted documents is almost 30% of the
>>> docs.  And it has stayed around that percent and has not come down.
>>> I did try optimize but that was disruptive as it caused search errors.
>>> I have been playing with merge factor to see if that helps with deleted
>>> documents or not. It is currently set to 5.
>>> 
>>> The server has 24 GB of memory out of which memory consumption is around
>> 23
>>> GB normally and the jvm is set to 6 GB. And have noticed that the
>> available
>>> memory on the server goes to 100 MB at times during a day.
>>> All the updates are run through DIH.
>> 
>> Using all availble memory is completely normal operation for ANY
>> operating system.  If you hold up Windows as an example of one that
>> doesn't ... it lies to you about "available" memory.  All modern
>> operating systems will utilize memory that is not explicitly allocated
>> for the OS disk cache.
>> 
>> The disk cache will instantly give up any of the memory it is using for
>> programs that request it.  Linux doesn't try to hide the disk cache from
>> you, but older versions of Windows do.  In the newer versions of Windows
>> that have the Resource Monitor, you can go there to see the actual
>> memory usage including the cache.
>> 
>>> Every day at least once i see the following error, which result in search
>>> errors on the front end of the site.
>>> 
>>> ERROR org.apache.solr.servlet.SolrDispatchFilter -
>>> null:org.eclipse.jetty.io.EofException
>>> 
>>> From what I have read these are mainly due to timeout and my timeout is
>> set
>>> to 30 seconds and cant set it to a higher number. I was thinking maybe
>> due
>>> to high memory usage, sometimes it leads to bad performance/errors.
>> 
>> Although this error can be caused by timeouts, it has a specific
>> meaning.  It means that the client disconnected before Solr responded to
>> the request, so when Solr tried to respond (through jetty), it found a
>> closed TCP connection.
>> 
>> Client timeouts need to either be completely removed, or set to a value
>> much longer than any request will take.  Five minutes is a good starting
>> value.
>> 
>> If all your client timeout is set to 30 seconds and you are seeing
>> EofExceptions, that means that your requests are taking longer than 30
>> seconds, and you likely have some performance issues.  It's also
>> possible that some of your client timeouts are set a lot shorter than 30
>> seconds.
>> 
>>> My objective is to stop the errors, adding more memory to the server is
>> not
>>> a good scaling strategy. That is why i was thinking maybe there is a
>> issue
>>> with the way things are set up and need to be revisited.
>> 
>> You're right that adding more memory to the servers is not a good
>> scaling strategy for the general case ... but in this situation, I think
>> it might be prudent.  For your index and heap sizes, I would want the
>> company to pay for at least 32GB of RAM.
>> 
>> Having said that ... I've seen Solr installs work well with a LOT less
>> memory than the ideal.  I don't know that adding more memory is
>> necessary, unless your system (CPU, storage, and memory speeds) is
>> particularly slow.  Based on your document count and index size, your
>> documents are quite small

Re: Nested objects in Solr

2015-07-24 Thread Bill Au
What exactly do you mean by nested objects in Solr? It would help if you
gave an example. The Solr schema is flat as far as I know.

Bill

On Fri, Jul 24, 2015 at 9:24 AM, Rajesh 
wrote:

> You can use nested entities like below.
>
> 
>  query="SELECT * FROM User">
>  
> 
>
>  query="select * from subject" >
>
> 
> 
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nested-objects-in-Solr-tp4213212p4219039.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


DIH question: importing string containing comma-delimited list into a multiValued field

2015-07-17 Thread Bill Au
One of my database columns is a varchar containing a comma-delimited list of
values. I would like to import these values into a multiValued field. I
figure that I will need to write a ScriptTransformer to do that. Is there
a better way?

Bill


Re: Division with Stats Component when Grouping in Solr

2015-06-13 Thread Bill Bell
It would be cool to be able to set 2 group-by fields with facets.

>> GROUP BY
>>site_id, keyword


Bill Bell
Sent from mobile


On Jun 13, 2015, at 2:28 PM, Yonik Seeley  wrote:

>> GROUP BY
>>site_id, keyword


Added As Editor

2015-05-18 Thread Bill Trembley
As large users of Solr/Lucene for many of our existing sites
(BoatTrader.com, GetAuto.com, ForRent.com, etc.) we would like the
ability to contribute to the wiki as we come across items. My current
wiki login is BillTrembley (bill.tremb...@gmail.com).

Thanks,

Bill Trembley

Director of Product Development and Technology

Dominion Performance Network
150 Granby Street

Norfolk, VA 23510

p(757) 351-7648

c(757) 575-0582

bill.tremb...@dominionenterprises.com

Skype: bill.trembley


Re: SolrCloud indexing

2015-05-12 Thread Bill Au
Thanks for the reply.

Actually in our case we want the timestamp to be populated locally on each
node in the SolrCloud cluster.  We want to see if there is any delay in the
document being distributed within the cluster.  Just want to confirm that
the timestamp can be used for that purpose.

Bill

On Sat, May 9, 2015 at 11:37 PM, Shawn Heisey  wrote:

> On 5/9/2015 8:41 PM, Bill Au wrote:
> > Is the behavior of document being indexed independently on each node in a
> > SolrCloud cluster new in 5.x or is that true in 4.x also?
> >
> > If the document is indexed independently on each node, then if I query
> the
> > document from each node directly, a timestamp could hold different values
> > since the document is indexed independently, right?
> >
> >  > default="NOW" />
>
> SolrCloud has had that behavior from day one, when it was released in
> version 4.0.  You are correct that it can result in a different
> timestamp on each replica if the default comes from schema.xml.
>
> I am pretty sure that the solution for this problem is to set up an
> update processor chain that includes TimestampUpdateProcessorFactory to
> populate the timestamp field before the document is distributed to each
> replica.
>
> https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors
>
> Thanks,
> Shawn
>
>


Re: SolrCloud indexing

2015-05-09 Thread Bill Au
Is the behavior of document being indexed independently on each node in a
SolrCloud cluster new in 5.x or is that true in 4.x also?

If the document is indexed independently on each node, then if I query the
document from each node directly, a timestamp could hold different values
since the document is indexed independently, right?



Bill

On Fri, May 8, 2015 at 6:39 PM, Vincenzo D'Amore  wrote:

> I have just added a comment to the CWiki.
> Thanks again for your prompt answer Erick.
>
> Best,
> Vincenzo
>
> On Fri, May 8, 2015 at 12:39 AM, Erick Erickson 
> wrote:
>
> > bq: ...forwards the index notation to itself and any replicas...
> >
> > That's just odd phrasing.
> >
> > All that means is that the document sent through the indexing process
> > on the leader and all followers for a shard and
> > is indexed independently on each.
> >
> > This is as opposed to the old master/slave situation where the master
> > indexed the doc, but the slave got the indexed
> > version as part of a segment when it replicated.
> >
> > Could you add a comment to the CWiki calling the phrasing out? It
> > really is a bit mysterious.
> >
> > Best,
> > Erick
> >
> > On Thu, May 7, 2015 at 2:18 PM, Vincenzo D'Amore 
> > wrote:
> > > Thanks Shawn.
> > >
> > > Just to make the picture more clear, I'm trying to understand why a 3
> > node
> > > solrcloud cluster and a old style solr server take same time to index
> > same
> > > documents.
> > >
> > > But in the wiki is written:
> > >
> > > If the machine is a leader, SolrCloud determines which shard the
> document
> > >> should go to, forwards the document the leader for that shard, indexes
> > the
> > >> document for this shard, and *forwards the index notation to itself
> and
> > >> any replicas*.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
> > >
> > >
> > > Could you please explain what does it mean "forwards the index
> notation"
> > ?
> > >
> > > On the other hand, on solrcloud I have 3 shards and 2 replicas for each
> > > shard. So, every node is indexing all the documents and this explains
> why
> > > solrcloud consumes same time compared to an old-style solr server.
> > >
> > >
> > >
> > > On Thu, May 7, 2015 at 3:08 PM, Shawn Heisey 
> > wrote:
> > >
> > >> On 5/7/2015 3:04 AM, Vincenzo D'Amore wrote:
> > >> > Thanks Erick. I'm not sure I got your answer.
> > >> >
> > >> > I try to recap, when the raw document has to be indexed, it will be
> > >> > forwarded to shard leader. Shard leader indexes the document for
> that
> > >> > shard, and then forwards the indexed document to any replicas.
> > >> >
> > >> > I want just be sure that when the raw document is forwarded from the
> > >> leader
> > >> > to the replicas it will be indexed only one time on the shard
> leader.
> > >> From
> > >> > what I understand replicas do not indexes, only the leader indexes.
> > >>
> > >> The document is indexed by all replicas.  There is no way to forward
> the
> > >> indexed document, it can only forward the source document ... so each
> > >> replica must index it independently.
> > >>
> > >> The old-style master-slave replication (which existed long before
> > >> SolrCloud) copies the finished Lucene segments, so only the master
> > >> actually does indexing.
> > >>
> > >> SolrCloud doesn't have a master, only multiple replicas, one of which
> is
> > >> elected leader, and replication only comes into the picture if
> there's a
> > >> serious problem and Solr determines that it can't use the transaction
> > >> log to recover the index.
> > >>
> > >> Thanks,
> > >> Shawn
> > >>
> > >>
> > >
> > >
> > > --
> > > Vincenzo D'Amore
> > > email: v.dam...@gmail.com
> > > skype: free.dev
> > > mobile: +39 349 8513251
> >
>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


solr-user@lucene.apache.org

2015-04-22 Thread Bill Tsay


On 4/22/15, 7:36 AM, "Martin Keller" 
wrote:

>OK, I found the problem and as so often it was sitting in front of the
>display. 
>
>Now the next problem:
>The suggestions returned consist always of a complete text block where
>the match was found. I would have expected a single word or a small
>phrase.
>
>Thanks in advance
>Martin
>
>
>> Am 22.04.2015 um 12:50 schrieb Martin Keller
>>:
>> 
>> Unfortunately, setting suggestAnalyzerFieldType to "text_suggest"
>>didn’t change anything.
>> The suggest dictionary is freshly built.
>> As I mentioned before, only words or phrases of the source field
>>„content“ are not matched.
>> When querying the index, the response only contains „suggestions“ field
>>data not coming from the „content“ field.
>> The complete schema is a slightly modified techproducts schema.
>> „Normal“ searching for words which I would expect coming from „content“
>>works.
>> 
>> Any more ideas?
>> 
>> Thanks 
>> Martin
>> 
>> 
>>> Am 21.04.2015 um 17:39 schrieb Erick Erickson
>>>:
>>> 
>>> Did you build your suggest dictionary after indexing? Kind of a shot
>>>in the
>>> dark but worth a try.
>>> 
>>> Note that the suggest field of your suggester isn't using your
>>>"text_suggest"
>>> field type to make suggestions, it's using "text_general". IOW, the
>>>text may
>>> not be analyzed as you expect.
>>> 
>>> Best,
>>> Erick
>>> 
>>> On Tue, Apr 21, 2015 at 7:16 AM, Martin Keller
>>>  wrote:
 Hello together,
 
 I have some problems with the Solr 5.1.0 suggester.
 I followed the instructions in
https://cwiki.apache.org/confluence/display/solr/Suggester and also
tried the techproducts example delivered with the binary package,
which is working well.
 
 I added a field suggestions-Field to the schema:
 
 >>>stored="true" multiValued="true“/>
 
 
 And added some copies to the field:
 
 
 
 
 
 
 
 
 The field type definition for „text_suggest“ is pretty simple:
 
 >>>positionIncrementGap="100">
   
   
   >>>words="stopwords.txt" />
   
   
 
 
 
 I also changed the solrconfig.xml to use the suggestions field:

 <searchComponent name="suggest" class="solr.SuggestComponent">
   <lst name="suggester">
     <str name="name">mySuggester</str>
     <str name="lookupImpl">FuzzyLookupFactory</str>
     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
     <str name="field">suggestions</str>
     <str name="suggestAnalyzerFieldType">text_general</str>
     <str name="buildOnStartup">false</str>
   </lst>
 </searchComponent>
 
 
 
 For tokens originally coming from „title" or „author“, I get
suggestions, but not any from the content field.
 So, what do I have to do?
 
 Any help is appreciated.
 
 
 Martin
 
>> 
>



Re: Facet

2015-04-05 Thread Bill Bell
Ok

Clarification

The limit is set to -1. But the average result is 300. 

The amount of strings stored in the field increased a lot. Like 250k to 350k. 
But the amount coming out is limited by facet.prefix. 

Would creating 900 fields be better? Then I could just put the prefix in the 
field name, like this: proc_ps122

Thoughts?

So far I have heard SolrCloud and docValues as viable solutions. Stay away from enum.

Bill Bell
Sent from mobile


> On Apr 5, 2015, at 2:56 AM, Toke Eskildsen  wrote:
> 
> William Bell  wrote:
> Sent: 05 April 2015 06:20
> To: solr-user@lucene.apache.org
> Subject: Facet
> 
>> We increased our number of terms (String) in a facet by 50,000.
> 
> Do you mean facet.limit=50000?
> 
>> Now we are getting an error when we facet by this field - so we switched it 
>> to
>> facet.method=enum, and now the results come back. However, when we put
>> it into production we literally hit a wall (CPU went to 100% for 16 cores)
>> after about 30 minutes live.
> 
> It was strange that enum worked. Internally, the difference between 
> facet.limit=100 and facet.limit=50000 is quite small. The real hits are for 
> fine-counting within SolrCloud and serializing the result in order to deliver 
> it to the client. I thought enum behaved the same as fc with regard to those 
> two.
> 
>> We tried adding more machines to reduce the CPU, but it did not help.
> 
> Sounds like SolrCloud. More machines does not help here, it might even be 
> worse. What happens is that distributed faceting is two-phase, where the 
> second phase is fine-counting. The fine-counting essentially makes all shards 
> perform micro-searches for a large part of the terms returned: Your shards 
> are bogged down by tens of thousands of small searches.
> 
> If you are feeling adventurous, you can try putting
> http://tokee.github.io/lucene-solr/
> on a test-installation (I am the author). It changes the way the 
> fine-counting is done.
> 
> 
> Depending on your container, you might need to raise the internal limits for 
> GET-communication. Tomcat has a default of 2MB somewhere (sorry, don't 
> remember the details), which is not a lot for 50,000 values.
> 
>> What are some ideas? We are going to try docValues on the field. Does
>> anyone know if method=fc or method=enum works for docValue? I cannot find
>> any documentation on that.
> 
> If DocValues are enabled, fc will use them. It does not change anything for 
> enum. But I would argue against enum for anything in the thousands anyway.
> 
>> We are thinking of splitting the field into 2 fields (fielda, fieldb). At
>> least the number will be less, but not sure if it will help memory?
> 
> The killer is the number of terms requested/returned.
> 
>> The weird thing is for the first 30 minutes things are performing great.
>> Literally at like 10% CPU across 16 cores, not much memory and normal GC.
> 
> It might be because you have just been lucky. Take a look at
> https://twitter.com/anjacks0n/status/509284768035262464
> for how different performance can be for different result set sizes.
> 
>> Originally the facet was a method=fc. Is there an issue with enum? We have
>> facet.threads=20 set, and not sure this is wise for a enum ?
> 
> Facet threading does not thread within each field, it just means that 
> multiple fields are processed in parallel.
> 
> - Toke Eskildsen


Re: ZFS File System for SOLR 3.6 and SOLR 4

2015-03-28 Thread Bill Bell
Is there an advantage to XFS over ext4 for Solr? Has anyone done testing?

Bill Bell
Sent from mobile


> On Mar 27, 2015, at 8:14 AM, Shawn Heisey  wrote:
> 
>> On 3/27/2015 12:30 AM, abhi Abhishek wrote:
>> i am trying to use ZFS as filesystem for my Linux Environment. are
>> there any performance implications of using any filesystem other than
>> ext-3/ext-4 with SOLR?
> 
> That should work with no problem.
> 
> The only time Solr tends to have problems is if you try to use a network
> filesystem.  As long as it's a local filesystem and it implements
> everything a program can typically expect from a local filesystem, Solr
> should work perfectly.
> 
> Because of the compatibility problems that the license for ZFS has with
> the GPL, ZFS on Linux is probably not as well tested as other
> filesystems like ext4, xfs, or btrfs, but I have not heard about any big
> problems, so it's probably safe.
> 
> Thanks,
> Shawn
> 


Re: How to boost documents at index time?

2015-03-28 Thread Bill Bell
Issue a Jira ticket?

Did you try debugQuery?
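
For reference, index-time boosts in the 4.x/5.x SolrJ API are attached when
adding the field -- a minimal, hypothetical sketch:

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc1");
    // third argument is the index-time boost; it gets folded into the field
    // norm, so in debugQuery it only shows up inside the fieldNorm(...)
    // factor, not as a separate boost line
    doc.addField("title", "some title", 2.0f);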

Bill Bell
Sent from mobile


> On Mar 28, 2015, at 1:49 AM, CKReddy Bhimavarapu  wrote:
> 
> I want to boost docs at index time. I am doing this using the boost
> parameter on the doc field,
> but I can't see a direct impact on the doc by using debugQuery.
> 
> My question is: is there any other way to boost a doc at index time and
> see the reflected changes, i.e. a direct impact?
> 
> -- 
> ckreddybh. 


Re: Sort on multivalued attributes

2015-02-09 Thread Bill Bell
Definitely needed !!

Bill Bell
Sent from mobile


> On Feb 9, 2015, at 5:51 AM, Jan Høydahl  wrote:
> 
> Sure, vote for it. Number of votes do not directly make prioritized sooner.
> So you better also add a comment to the JIRA, it will raise committer's 
> attention.
> Even better of course is if you are able to help bring the issue forward by 
> submitting patches.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
>> 9. feb. 2015 kl. 12.15 skrev Flavio Pompermaier :
>> 
>> Do I have to vote for it..?
>> 
>>> On Mon, Feb 9, 2015 at 11:50 AM, Jan Høydahl  wrote:
>>> 
>>> See https://issues.apache.org/jira/browse/SOLR-2522
>>> 
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>> 
>>>> 9. feb. 2015 kl. 10.30 skrev Flavio Pompermaier :
>>>> 
>>>> In my use case it could be very helpful because I use the SIREn plugin to
>>>> index arbitrary JSON-LD and this plugin automatically index also all
>>> nested
>>>> attributes as a Solr field.
>>>> Thus I need for example to gather all entries with a certain value of the
>>>> "type" attribute, ordered by "name" (but name could be a multivalued
>>>> attribute in my use case :( )
>>>> I'd like to avoid to switch to Elasticsearch just to have this single
>>>> feature.
>>>> 
>>>> Thanks for the support,
>>>> Flavio
>>>> 
>>>> On Mon, Feb 9, 2015 at 10:02 AM, Anshum Gupta 
>>>> wrote:
>>>> 
>>>>> Sure, that's correct and makes sense in some use cases. I'll need to
>>> check
>>>>> if Solr functions support such a thing.
>>>>> 
>>>>> On Mon, Feb 9, 2015 at 12:47 AM, Flavio Pompermaier <
>>> pomperma...@okkam.it>
>>>>> wrote:
>>>>> 
>>>>>> I saw that this is possible in Lucene (
>>>>>> https://issues.apache.org/jira/browse/LUCENE-5454) and also in
>>>>>> Elasticsearch. Or am I wrong?
>>>>>> 
>>>>>> On Mon, Feb 9, 2015 at 9:05 AM, Anshum Gupta 
>>>>>> wrote:
>>>>>> 
>>>>>>> Unless I'm missing something here, sorting on a multi-valued field
>>>>> would
>>>>>> be
>>>>>>> non-deterministic in nature.
>>>>>>> 
>>>>>>> On Sun, Feb 8, 2015 at 11:59 PM, Flavio Pompermaier <
>>>>>> pomperma...@okkam.it>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi to all,
>>>>>>>> 
>>>>>>>> Is there any possibility that in the near future Solr could support
>>>>>>> sorting
>>>>>>>> on multivalued fields?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Flavio
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Anshum Gupta
>>>>>>> http://about.me/anshumgupta
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Anshum Gupta
>>>>> http://about.me/anshumgupta
> 


Re: Collations are not working fine.

2015-02-09 Thread Bill Bell
Can you order the collations by highest to lowest hits?

Bill Bell
Sent from mobile


> On Feb 9, 2015, at 6:47 AM, Nitin Solanki  wrote:
> 
> I am working on spell checking in Solr. I have implemented Suggestions and
> collations in my spell checker component.
> 
> Most of the time collations work fine, but in a few cases they fail.
> 
> *Working*:
> I tried the query *gone wthh thes wnd*: here "wnd" doesn't give the suggestion
> "wind", but the collation comes out right = "gone with the wind", hits = 117
> 
> 
> *Not working:*
> But when I tried the query *gone wthh thes wint*: here "wint" does give the
> suggestion "wind", but the collation does not come out right. Instead of "gone with
> the wind" it gives "gone with the west", hits = 1.
> 
> And I also want to know what *hits* means in collations.


timestamp field and atomic updates

2015-01-30 Thread Bill Au
I have a timestamp field in my schema to track when each doc was indexed:



Recently, we have switched over to use atomic update instead of re-indexing
when we need to update a doc in the index.  It looks to me that the
timestamp field is not updated during an atomic update.  I have also looked
into TimestampUpdateProcessorFactory, and it looks to me like that won't help in
my case.

Is there anything within Solr that I can use to update the timestamp during
atomic update, or do I have to explicitly include the timestamp field as
part of the atomic update?
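
For completeness, the explicit approach would look something like this,
assuming the field is literally named "timestamp" (the "price" field and
values here are made up):

    import java.time.Instant;
    import java.time.format.DateTimeFormatter;
    import java.util.Collections;
    import org.apache.solr.common.SolrInputDocument;

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc1");
    // the actual atomic update
    doc.addField("price", Collections.singletonMap("set", 99));
    // refresh the timestamp ourselves; the schema default only fires when the
    // field is missing, and an atomic update carries the old stored value along
    doc.addField("timestamp", Collections.singletonMap("set",
        DateTimeFormatter.ISO_INSTANT.format(Instant.now())));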

Bill


Re: How large is your solr index?

2015-01-03 Thread Bill Bell
For Solr 5, why don't we switch it to 64-bit?

Bill Bell
Sent from mobile


> On Dec 29, 2014, at 1:53 PM, Jack Krupansky  wrote:
> 
> And that Lucene index document limit includes deleted and updated
> documents, so even if your actual document count stays under 2^31-1,
> deleting and updating documents can push the apparent document count over
> the limit unless you very aggressively merge segments to expunge deleted
> documents.
> 
> -- Jack Krupansky
> 
> -- Jack Krupansky
> 
> On Mon, Dec 29, 2014 at 12:54 PM, Erick Erickson 
> wrote:
> 
>> When you say 2B docs on a single Solr instance, are you talking only one
>> shard?
>> Because if you are, you're very close to the absolute upper limit of a
>> shard, internally
>> the doc id is an int or 2^31. 2^31 + 1 will cause all sorts of problems.
>> 
>> But yeah, your 100B documents are going to use up a lot of servers...
>> 
>> Best,
>> Erick
>> 
>> On Mon, Dec 29, 2014 at 7:24 AM, Bram Van Dam 
>> wrote:
>>> Hi folks,
>>> 
>>> I'm trying to get a feel of how large Solr can grow without slowing down
>> too
>>> much. We're looking into a use-case with up to 100 billion documents
>>> (SolrCloud), and we're a little afraid that we'll end up requiring 100
>>> servers to pull it off.
>>> 
>>> The largest index we currently have is ~2billion documents in a single
>> Solr
>>> instance. Documents are smallish (5k each) and we have ~50 fields in the
>>> schema, with an index size of about 2TB. Performance is mostly OK. Cold
>>> searchers take a while, but most queries are alright after warming up. I
>>> wish I could provide more statistics, but I only have very limited
>> access to
>>> the data (...banks...).
>>> 
>>> I'd very grateful to anyone sharing statistics, especially on the larger
>> end
>>> of the spectrum -- with or without SolrCloud.
>>> 
>>> Thanks,
>>> 
>>> - Bram
>> 


Re: Old facet value doesn't go away after index update

2014-12-19 Thread Bill Bell
Set mincount=1
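
e.g. the same facet query with the extra parameter:

q=*:*&facet=true&facet.field=collection_facet&facet.mincount=1

facet.mincount=1 just hides zero-count values in the response; the stale term
itself only disappears from the index once the segments holding the deleted
docs get merged away.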

Bill Bell
Sent from mobile


> On Dec 19, 2014, at 12:22 PM, Tang, Rebecca  wrote:
> 
> Hi there,
> 
> I have an index that has a field called collection_facet.
> 
> There was a value 'Ness Motley Law Firm Documents' that we wanted to update 
> to 'Ness Motley Law Firm'.  There were 36,132 records with this value.  So I 
> re-indexed just the 36,132 records.  After the update, I ran a facet query 
> (q=*:*&facet=true&facet.field=collection_facet) to see if the value got 
> updated and I saw
> Ness Motley Law Firm 36,132  -- as expected
> Ness Motley Law Firm Documents 0 — Why is this value still here even though 
> clearly there are no records with this value anymore?  I thought maybe it was 
> cached, so I restarted solr, but I still got the same results.
> 
> "facet_fields": { "collection_facet": [
> … "Ness Motley Law Firm", 36132,
> … "Ness Motley Law Firm Documents", 0 ]
> 
> 
> 
> Rebecca Tang
> Applications Developer, UCSF CKM
> Legacy Tobacco Document Library
> E: rebecca.t...@ucsf.edu


Too much Lucene code to refactor but I like SolrCloud

2014-12-03 Thread Bill Drake
I have an existing application that includes Lucene code. I want to add
high availability. From what I have read SolrCloud looks like an effective
approach. My problem is that there is a lot of Lucene code; out of 100+
java files in the application more than 20 of them are focused on Lucene
code.  Refactoring this much code seems very risky.

My thought was to migrate the index from Lucene 3.5 to Solr/Lucene 4.10, then,
after making sure everything still works, I would add in the HA. I looked at
SolrJ with the hope that it would look like Lucene, but it did not look like
it would simplify the transition.

So the question is: can I leave most of the existing Lucene code and just
make small changes to get the benefit of HA from SolrCloud? Is there a
better approach?


Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
Yes, that was it, thank you!

On Sat, Sep 27, 2014 at 5:28 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> http://wiki.apache.org/solr/SchemaXml#Default_query_parser_operator ?
> once again, debugQuery=true perfectly explains what's going on with q.
>
> On Sun, Sep 28, 2014 at 1:24 AM, White, Bill  wrote:
>
> > It worked for me once I changed to
> >
> > -color:({* TO red} OR {red TO *})
> >
> > I'm not sure why the OR is needed, maybe it's my version? (4.6.1)
> >
> > On Sat, Sep 27, 2014 at 5:22 PM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> > > hm. try to convert it to query q=-color:({* TO red} {red TO *}) and
> check
> > > the explanation from debugQuery=true
> > >
> > > I tried to play with my index
> > >
> > >   "q": "*:*",
> > >   "facet.field": "swatchColors_string_mv",
> > >   "fq": "-swatchColors_string_mv:({* TO RED} {RED TO *})",
> > >
> > > I got the following facets:
> > >
> > >   "facet_fields": {
> > >   "swatchColors_string_mv": [
> > >     "RED",
> > > 122,
> > > "BLACK",
> > > 0,
> > > "BLUE",
> > > 0,
> > > "BROWN",
> > > 0,
> > > "GREEN",
> > > 0,
> > >
> > > so, it works for me at least...
> > >
> > >
> > >
> > > On Sun, Sep 28, 2014 at 12:54 AM, White, Bill  wrote:
> > >
> > > > Hmm.  If I understand correctly this builds a set out of open
> intervals
> > > > (exclusive ranges), that's a great idea!
> > > >
> > > > It doesn't seem to work for me, though;  fq=-color:({* TO red} {red
> TO
> > > *})
> > > > is giving me results with color="burnt sienna"
> > > >
> > > > The field is defined as  > indexed="true"
> > > > stored="true" multiValued="true" />
> > > >
> > > > On Sat, Sep 27, 2014 at 4:43 PM, Mikhail Khludnev <
> > > > mkhlud...@griddynamics.com> wrote:
> > > >
> > > > > indeed!
> > > > > the exclusive range {green TO red} matches to the "lemon yellow"
> > > > > hence, the negation suppresses it from appearing
> > > > > fq=-color:{green TO red}
> > > > > then you need to suppress eg black and white also
> > > > > fq=-color:({* TO green} {green TO red} {red TO *})
> > > > >
> > > > > I have no control over the
> > > > > > possible values of 'color',
> > > > >
> > > > > You don't need to control possible values, you just suppressing any
> > > > values
> > > > > beside of the given green and red.
> > > > > Mind that either green or red passes that negation of exclusive
> > ranges
> > > > > disjunction.
> > > > >
> > > > >
> > > > > On Sun, Sep 28, 2014 at 12:15 AM, White, Bill 
> > wrote:
> > > > >
> > > > > > OK, let me try phrasing it better.
> > > > > >
> > > > > > How do I exclude from search, any result which contains any value
> > for
> > > > > > multivalued field 'color' which is not within a given "constraint
> > > set"
> > > > > > (e.g., "red", "green", "yellow", "burnt sienna"), given that I do
> > not
> > > > > what
> > > > > > any of the other possible values of 'color' are?
> > > > > >
> > > > > > In pseudocode:
> > > > > >
> > > > > > for all x in result.color
> > > > > > if x not in ("red","green","yellow", "burnt sienna")
> > > > > > filter out result
> > > > > >
> > > > > > I don't see how range queries would work since I have no control
> > over
> > > > the
> > > > > > possible values of 'color', e.g., there could be a valid color
> > "lemon
> > > > > > yellow" between "green" and "red", and I d

Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
It worked for me once I changed to

-color:({* TO red} OR {red TO *})

I'm not sure why the OR is needed, maybe it's my version? (4.6.1)

On Sat, Sep 27, 2014 at 5:22 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> hm. try to convert it to query q=-color:({* TO red} {red TO *}) and check
> the explanation from debugQuery=true
>
> I tried to play with my index
>
>   "q": "*:*",
>   "facet.field": "swatchColors_string_mv",
>   "fq": "-swatchColors_string_mv:({* TO RED} {RED TO *})",
>
> I got the following facets:
>
>   "facet_fields": {
>   "swatchColors_string_mv": [
> "RED",
> 122,
> "BLACK",
> 0,
> "BLUE",
> 0,
> "BROWN",
> 0,
> "GREEN",
> 0,
>
> so, it works for me at least...
>
>
>
> On Sun, Sep 28, 2014 at 12:54 AM, White, Bill  wrote:
>
> > Hmm.  If I understand correctly this builds a set out of open intervals
> > (exclusive ranges), that's a great idea!
> >
> > It doesn't seem to work for me, though;  fq=-color:({* TO red} {red TO
> *})
> > is giving me results with color="burnt sienna"
> >
> > The field is defined as  > stored="true" multiValued="true" />
> >
> > On Sat, Sep 27, 2014 at 4:43 PM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> > > indeed!
> > > the exclusive range {green TO red} matches to the "lemon yellow"
> > > hence, the negation suppresses it from appearing
> > > fq=-color:{green TO red}
> > > then you need to suppress eg black and white also
> > > fq=-color:({* TO green} {green TO red} {red TO *})
> > >
> > > I have no control over the
> > > > possible values of 'color',
> > >
> > > You don't need to control possible values, you just suppressing any
> > values
> > > beside of the given green and red.
> > > Mind that either green or red passes that negation of exclusive ranges
> > > disjunction.
> > >
> > >
> > > On Sun, Sep 28, 2014 at 12:15 AM, White, Bill  wrote:
> > >
> > > > OK, let me try phrasing it better.
> > > >
> > > > How do I exclude from search, any result which contains any value for
> > > > multivalued field 'color' which is not within a given "constraint
> set"
> > > > (e.g., "red", "green", "yellow", "burnt sienna"), given that I do not
> > > what
> > > > any of the other possible values of 'color' are?
> > > >
> > > > In pseudocode:
> > > >
> > > > for all x in result.color
> > > > if x not in ("red","green","yellow", "burnt sienna")
> > > > filter out result
> > > >
> > > > I don't see how range queries would work since I have no control over
> > the
> > > > possible values of 'color', e.g., there could be a valid color "lemon
> > > > yellow" between "green" and "red", and I don't want a result which
> has
> > > > (color: red, color: "lemon yellow")
> > > >
> > > > On Sat, Sep 27, 2014 at 4:02 PM, Mikhail Khludnev <
> > > > mkhlud...@griddynamics.com> wrote:
> > > >
> > > > > On Sat, Sep 27, 2014 at 11:36 PM, White, Bill 
> > wrote:
> > > > >
> > > > > > but do NOT match ANY other color.
> > > > >
> > > > >
> > > > > Bill, I miss the whole picture, it's worth to rephrase the problem
> in
> > > one
> > > > > sentence.
> > > > > But regarding the quote above, you can try to use exclusive ranges
> > > > >
> > > > >
> > > >
> > >
> >
> https://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Range_Searches
> > > > > fq=-color:({* TO green} {green TO red} {red TO *})
> > > > > just don't forget to build ranges alphabetically
> > > > >
> > > > > --
> > > > > Sincerely yours
> > > > > Mikhail Khludnev
> > > > > Principal Engineer,
> > > > > Grid Dynamics
> > > > >
> > > > > <http://www.griddynamics.com>
> > > > > 
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > Principal Engineer,
> > > Grid Dynamics
> > >
> > > <http://www.griddynamics.com>
> > > 
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> 
>


Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
Hmm.  If I understand correctly this builds a set out of open intervals
(exclusive ranges), that's a great idea!

It doesn't seem to work for me, though;  fq=-color:({* TO red} {red TO *})
is giving me results with color="burnt sienna"

The field is defined as <field name="color" ... indexed="true" stored="true" multiValued="true" />

On Sat, Sep 27, 2014 at 4:43 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> indeed!
> the exclusive range {green TO red} matches to the "lemon yellow"
> hence, the negation suppresses it from appearing
> fq=-color:{green TO red}
> then you need to suppress eg black and white also
> fq=-color:({* TO green} {green TO red} {red TO *})
>
> I have no control over the
> > possible values of 'color',
>
> You don't need to control possible values, you just suppressing any values
> beside of the given green and red.
> Mind that either green or red passes that negation of exclusive ranges
> disjunction.
>
>
> On Sun, Sep 28, 2014 at 12:15 AM, White, Bill  wrote:
>
> > OK, let me try phrasing it better.
> >
> > How do I exclude from search, any result which contains any value for
> > multivalued field 'color' which is not within a given "constraint set"
> > (e.g., "red", "green", "yellow", "burnt sienna"), given that I do not
> what
> > any of the other possible values of 'color' are?
> >
> > In pseudocode:
> >
> > for all x in result.color
> > if x not in ("red","green","yellow", "burnt sienna")
> > filter out result
> >
> > I don't see how range queries would work since I have no control over the
> > possible values of 'color', e.g., there could be a valid color "lemon
> > yellow" between "green" and "red", and I don't want a result which has
> > (color: red, color: "lemon yellow")
> >
> > On Sat, Sep 27, 2014 at 4:02 PM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> > > On Sat, Sep 27, 2014 at 11:36 PM, White, Bill  wrote:
> > >
> > > > but do NOT match ANY other color.
> > >
> > >
> > > Bill, I miss the whole picture, it's worth to rephrase the problem in
> one
> > > sentence.
> > > But regarding the quote above, you can try to use exclusive ranges
> > >
> > >
> >
> https://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Range_Searches
> > > fq=-color:({* TO green} {green TO red} {red TO *})
> > > just don't forget to build ranges alphabetically
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > Principal Engineer,
> > > Grid Dynamics
> > >
> > > <http://www.griddynamics.com>
> > > 
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> 
>


Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
Thanks!

On Sat, Sep 27, 2014 at 4:18 PM, Yonik Seeley  wrote:

> On Sat, Sep 27, 2014 at 3:46 PM, White, Bill  wrote:
> > Hmm, that won't work since color is free-form.
> >
> > Is there a way to invoke (via fq) a user-defined function (hopefully
> > defined as part of the fq syntax, but alternatively, written in Java) and
> > have it applied to the resultset?
>
> https://wiki.apache.org/solr/SolrPlugins#QParserPlugin
>
> -Yonik
> http://heliosearch.org - native code faceting, facet functions,
> sub-facets, off-heap data
>


Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
OK, let me try phrasing it better.

How do I exclude from search, any result which contains any value for
multivalued field 'color' which is not within a given "constraint set"
(e.g., "red", "green", "yellow", "burnt sienna"), given that I do not what
any of the other possible values of 'color' are?

In pseudocode:

for all x in result.color
    if x not in ("red", "green", "yellow", "burnt sienna")
        filter out result

I don't see how range queries would work since I have no control over the
possible values of 'color', e.g., there could be a valid color "lemon
yellow" between "green" and "red", and I don't want a result which has
(color: red, color: "lemon yellow")

On Sat, Sep 27, 2014 at 4:02 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> On Sat, Sep 27, 2014 at 11:36 PM, White, Bill  wrote:
>
> > but do NOT match ANY other color.
>
>
> Bill, I miss the whole picture, it's worth to rephrase the problem in one
> sentence.
> But regarding the quote above, you can try to use exclusive ranges
>
> https://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Range_Searches
> fq=-color:({* TO green} {green TO red} {red TO *})
> just don't forget to build ranges alphabetically
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> 
>


Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
Hmm, that won't work since color is free-form.

Is there a way to invoke (via fq) a user-defined function (hopefully
defined as part of the fq syntax, but alternatively, written in Java) and
have it applied to the resultset?

On Sat, Sep 27, 2014 at 3:41 PM, Yonik Seeley  wrote:

> On Sat, Sep 27, 2014 at 3:36 PM, White, Bill  wrote:
> > Sorry, color is multivalued, so a given record might be both blue and
> red.
> > I don't want those to show up in the results.
>
> I think the only way currently (out of the box) is to enumerate the
> other possible colors to exclude them.
>
> color:(red yellow green)  -color:(blue cyan xxx)
>
> -Yonik
> http://heliosearch.org - native code faceting, facet functions,
> sub-facets, off-heap data
>
>
>
> > On Sat, Sep 27, 2014 at 3:36 PM, White, Bill  wrote:
> >
> >> Not just that.  I'm looking for things which match either red or yellow
> or
> >> green, but do NOT match ANY other color.  I can probably drop the
> >> requirement related to having no color.
> >>
> >> On Sat, Sep 27, 2014 at 3:28 PM, Yonik Seeley 
> >> wrote:
> >>
> >>> On Sat, Sep 27, 2014 at 2:52 PM, White, Bill  wrote:
> >>> > Hello,
> >>> >
> >>> > I've attempted to figure this out from reading the documentation but
> >>> > without much luck.  I looked for a comprehensive query syntax
> >>> specification
> >>> > (e.g., with BNF and a list of operator semantics) but I'm unable to
> find
> >>> > such a document (does such a thing exist? or is the syntax too much
> of a
> >>> > moving target?)
> >>> >
> >>> > I'm using 4.6.1, if that makes a difference, though upgrading is an
> >>> option
> >>> > if it necessary to make this work.
> >>> >
> >>> > I've got a multiValued field "color", which describes the colors of
> >>> item in
> >>> > the database.  Items can have zero or more colors.  What I want is
> to be
> >>> > able to filter out all hits that contain colors not within a
> >>> constraining
> >>> > list, i.e., something like
> >>> >
> >>> > NOT (color NOT IN ("red","yellow","green")).
> >>> >
> >>> > So the following would be passed by the filter:
> >>> > (no value for 'color')
> >>> > color: red
> >>> > color: red, color: green
> >>> >
> >>> > whereas these would be excluded:
> >>> > color: red, color: blue
> >>> > color: magenta
> >>>
> >>> You're looking for things that either match red, yellow, or green, or
> >>> have no color:
> >>>
> >>> color:(red yellow green) OR (*:* -color:*)
> >>>
> >>> -Yonik
> >>> http://heliosearch.org - native code faceting, facet functions,
> >>> sub-facets, off-heap data
> >>>
> >>
> >>
>


Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
Sorry, color is multivalued, so a given record might be both blue and red.
I don't want those to show up in the results.

On Sat, Sep 27, 2014 at 3:36 PM, White, Bill  wrote:

> Not just that.  I'm looking for things which match either red or yellow or
> green, but do NOT match ANY other color.  I can probably drop the
> requirement related to having no color.
>
>
> On Sat, Sep 27, 2014 at 3:28 PM, Yonik Seeley 
> wrote:
>
>> On Sat, Sep 27, 2014 at 2:52 PM, White, Bill  wrote:
>> > Hello,
>> >
>> > I've attempted to figure this out from reading the documentation but
>> > without much luck.  I looked for a comprehensive query syntax
>> specification
>> > (e.g., with BNF and a list of operator semantics) but I'm unable to find
>> > such a document (does such a thing exist? or is the syntax too much of a
>> > moving target?)
>> >
>> > I'm using 4.6.1, if that makes a difference, though upgrading is an
>> option
>> > if it necessary to make this work.
>> >
>> > I've got a multiValued field "color", which describes the colors of
>> item in
>> > the database.  Items can have zero or more colors.  What I want is to be
>> > able to filter out all hits that contain colors not within a
>> constraining
>> > list, i.e., something like
>> >
>> > NOT (color NOT IN ("red","yellow","green")).
>> >
>> > So the following would be passed by the filter:
>> > (no value for 'color')
>> > color: red
>> > color: red, color: green
>> >
>> > whereas these would be excluded:
>> > color: red, color: blue
>> > color: magenta
>>
>> You're looking for things that either match red, yellow, or green, or
>> have no color:
>>
>> color:(red yellow green) OR (*:* -color:*)
>>
>> -Yonik
>> http://heliosearch.org - native code faceting, facet functions,
>> sub-facets, off-heap data
>>
>
>


Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
Not just that.  I'm looking for things which match either red or yellow or
green, but do NOT match ANY other color.  I can probably drop the
requirement related to having no color.


On Sat, Sep 27, 2014 at 3:28 PM, Yonik Seeley  wrote:

> On Sat, Sep 27, 2014 at 2:52 PM, White, Bill  wrote:
> > Hello,
> >
> > I've attempted to figure this out from reading the documentation but
> > without much luck.  I looked for a comprehensive query syntax
> specification
> > (e.g., with BNF and a list of operator semantics) but I'm unable to find
> > such a document (does such a thing exist? or is the syntax too much of a
> > moving target?)
> >
> > I'm using 4.6.1, if that makes a difference, though upgrading is an
> option
> > if it necessary to make this work.
> >
> > I've got a multiValued field "color", which describes the colors of item
> in
> > the database.  Items can have zero or more colors.  What I want is to be
> > able to filter out all hits that contain colors not within a constraining
> > list, i.e., something like
> >
> > NOT (color NOT IN ("red","yellow","green")).
> >
> > So the following would be passed by the filter:
> > (no value for 'color')
> > color: red
> > color: red, color: green
> >
> > whereas these would be excluded:
> > color: red, color: blue
> > color: magenta
>
> You're looking for things that either match red, yellow, or green, or
> have no color:
>
> color:(red yellow green) OR (*:* -color:*)
>
> -Yonik
> http://heliosearch.org - native code faceting, facet functions,
> sub-facets, off-heap data
>


fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
Hello,

I've attempted to figure this out from reading the documentation but
without much luck.  I looked for a comprehensive query syntax specification
(e.g., with BNF and a list of operator semantics) but I'm unable to find
such a document (does such a thing exist? or is the syntax too much of a
moving target?)

I'm using 4.6.1, if that makes a difference, though upgrading is an option
if it is necessary to make this work.

I've got a multiValued field "color", which describes the colors of each item in
the database.  Items can have zero or more colors.  What I want is to be
able to filter out all hits that contain colors not within a constraining
list, i.e., something like

NOT (color NOT IN ("red","yellow","green")).

So the following would be passed by the filter:
(no value for 'color')
color: red
color: red, color: green

whereas these would be excluded:
color: red, color: blue
color: magenta


Nothing I've come up with so far, e.g. -(-color: "red" -color: "green"),
seems to work.

I've also looked into using a function query but it seems to lack operators
for dealing with string multivalued fields.

Ideas?

Thanks,
Bill


Re: Solr Dynamic Field Performance

2014-09-14 Thread Bill Bell
How about perf if you dynamically create 5000 fields ?

Bill Bell
Sent from mobile


> On Sep 14, 2014, at 10:06 AM, Erick Erickson  wrote:
> 
> Dynamic fields, once they are actually _in_ a document, aren't any
> different than statically defined fields. Literally, there's no place
> in the search code that I know of that _ever_ has to check
> whether a field was dynamically or statically defined.
> 
> AFAIK, the only additional cost would be figuring out which pattern
> matched at index time, which is such a tiny portion of the cost of
> indexing that I doubt you could measure it.
> 
> Best,
> Erick
> 
> On Sun, Sep 14, 2014 at 7:58 AM, Saumitra Srivastav
>  wrote:
>> I have a collection with 200 fields and >300M docs running in cloud mode.
>> Each doc have around 20 fields. I now have a use case where I need to
>> replace these explicit fields with 6 dynamic fields. Each of these 200
>> fields will match one of the 6 dynamic field.
>> 
>> I am evaluating performance implications of switching to dynamicFields. I
>> have tested with a smaller dataset(5M docs) but didn't noticed any indexing
>> or query performance degradation.
>> 
>> Query on dynamic fields will either be faceting, range query or full text
>> search.
>> 
>> Are there any known performance issues with using dynamicFields instead of
>> explicit ones?
>> 
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Solr-Dynamic-Field-Performance-tp4158737.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
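
To make the dynamic-field mechanics concrete: the pattern is declared once in
schema.xml and any concrete field name matching it is created on the fly at index
time. A minimal sketch (field and type names are illustrative; text_general is from
the stock example schema):

  <dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>

A document can then supply title_txt, body_txt, etc. without further schema changes;
once indexed, those fields behave exactly like statically declared ones, as described
above.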


Re: How to solve?

2014-09-06 Thread Bill Bell
Yeah, we already use it. I will try to create a custom function; if I get it
to work I will post it.

The challenge for me is how to dynamically match and add them based on the
faceting.

Here is a better example.

The doctor core has payloads stored as name:val, where the "name" values are doctor
specialties. I need to pull the value back by name since the user faceted on a
specialty. So far payloads work. But the user now wants to facet on another
specialty. For example, they are looking for a cardiologist and an internal medicine
doctor, and if the doctor practices at the same hospital I need to take the two
values and add them; else take the max value for the 2 specialties.

Make sense now ?

Seems like I need to create a payload and my own custom function.
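
For the indexing side, one way to carry specialty:value pairs as term payloads is a
delimited-payload field type, roughly like this (a sketch only; field and type names
are made up, and the custom function that sums or maxes the payloads still has to be
written):

  <fieldType name="payloads" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- e.g. "cardiology:2 internal_medicine:1" becomes terms carrying float payloads -->
      <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float" delimiter=":"/>
    </analyzer>
  </fieldType>
  <field name="specialty_payloads" type="payloads" indexed="true" stored="false"/>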

Bill Bell
Sent from mobile


> On Sep 6, 2014, at 12:57 PM, Erick Erickson  wrote:
> 
> Here's a blog with an end-to-end example. Jack's right, it takes some
> configuration and having first-class support in Solr would be a good
> thing...
> 
> http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/
> 
> Best,
> Erick
> 
>> On Sat, Sep 6, 2014 at 10:24 AM, Jack Krupansky  
>> wrote:
>> Payload really don't have first class support in Solr. It's a solid feature
>> of Lucene, but never expressed well in Solr. Any thoughts or proposals are
>> welcome!
>> 
>> (Hmmm... I wonder what the good folks at Heliosearch have up their sleeves
>> in this area?!)
>> 
>> -- Jack Krupansky
>> 
>> -Original Message- From: William Bell
>> Sent: Friday, September 5, 2014 10:03 PM
>> To: solr-user@lucene.apache.org
>> Subject: How to solve?
>> 
>> 
>> We have a core with each document as a person.
>> 
>> We want to boost based on the sweater color, but if the person has sweaters
>> in their closet which are the same manufactuer we want to boost even more
>> by adding them together.
>> 
>> Peter Smit - Sweater: Blue = 1 : Nike, Sweater: Red = 2: Nike, Sweater:
>> Blue=1 : Polo
>> Tony S - Sweater: Red =2: Nike
>> Bill O - Sweater:Red = 2: Polo, Blue=1: Polo
>> 
>> Scores:
>> 
>> Peter Smit - 1+2 = 3.
>> Tony S - 2
>> Bill O - 2 + 1
>> 
>> I thought about using payloads.
>> 
>> sweaters_payload
>> Blue: Nike: 1
>> Red: Nike: 2
>> Blue: Polo: 1
>> 
>> How do I query this?
>> 
>> http://localhost:8983/solr/persons?q=*:*&sort=??
>> 
>> Ideas?
>> 
>> 
>> 
>> 
>> --
>> Bill Bell
>> billnb...@gmail.com
>> cell 720-256-8076


Re: embedded documents

2014-08-24 Thread Bill Bell
See my Jira. It supports it via json.fsuffix=_json&wt=json

http://mail-archives.apache.org/mod_mbox/lucene-dev/201304.mbox/%3CJIRA.12641293.1365394604231.125944.1365397875874@arcas%3E

Bill Bell
Sent from mobile


> On Aug 24, 2014, at 6:43 AM, "Jack Krupansky"  wrote:
> 
> Indexing and query of raw JSON would be a valuable addition to Solr, so maybe 
> you could simply explain more precisely your data model and transformation 
> rules. For example, when multi-level nesting occurs, what does your loader do?
> 
> Maybe if the fielld names were derived by concatenating the full path of JSON 
> key names, like titles_json.FR, field_naming nesting could be handled in a 
> fully automated manner.
> 
> I had been thinking of filing a Jira proposing exactly that, so that even the 
> most deeply nested JSON maps could be supported, although combinations of 
> arrays and maps would be problematic.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Michael Pitsounis
> Sent: Wednesday, August 20, 2014 7:14 PM
> To: solr-user@lucene.apache.org
> Subject: embedded documents
> 
> Hello everybody,
> 
> I had a requirement to store complicated json documents in solr.
> 
> i have modified the JsonLoader to accept complicated json documents with
> arrays/objects as values.
> 
> It stores the object/array and then flatten it and  indexes the fields.
> 
> e.g  basic example document
> 
> {
>   "titles_json":{"FR":"This is the FR title" , "EN":"This is the EN
> title"} ,
>   "id": 103,
>   "guid": "3b2f2998-85ac-4a4e-8867-beb551c0b3c6"
>  }
> 
> It will store titles_json:{"FR":"This is the FR title" , "EN":"This is the
> EN title"}
> and then index fields
> 
> titles.FR:"This is the FR title"
> titles.EN:"This is the EN title"
> 
> 
> Do you see any problems with this approach?
> 
> 
> 
> Regards,
> Michael Pitsounis 


Re: SolrCloud Scale Struggle

2014-08-02 Thread Bill Bell
Auto correct not good

Corrected below 

Bill Bell
Sent from mobile


> On Aug 2, 2014, at 11:11 AM, Bill Bell  wrote:
> 
> Seems way overkill. Are you using /get at all ? If you need the docs avail 
> right away - why ? How about after 30 seconds ? How many docs do you get 
> added per second during peak ? Even Google has a delay when you do Adwords. 
> 
> One idea is to have an empty core that you insert into and then shard into 
> the queries. So one core would be called newdocs and then you would add this 
> core into your query. There are a couple issues with this with scoring but it 
> works nicely. I would not even use Solrcloud for that core.
> 
> Try to reduce number of Java instances running. Reduce memory and use one 
> java per machine. 
> 
> Then if you need faster avail of docs you really need to ask why. Why not 
> later? Do you need search or just showing the user the info ? If for showing 
> maybe query a indexed table for the few not yet indexed ?? Or just store in a 
> db to show the user the info and index later?
> 
> Bill Bell
> Sent from mobile
> 
> 
>> On Aug 1, 2014, at 4:19 AM, "anand.mahajan"  wrote:
>> 
>> Hello all,
>> 
>> Struggling to get this going with SolrCloud - 
>> 
>> Requirement in brief :
>> - Ingest about 4M Used Cars listings a day and track all unique cars for
>> changes
>> - 4M automated searches a day (during the ingestion phase to check if a doc
>> exists in the index (based on values of 4-5 key fields) or it is a new one
>> or an updated version)
>> - Of the 4 M - About 3M Updates to existing docs (for every non-key value
>> change)
>> - About 1M inserts a day (I'm assuming these many new listings come in
>> every day)
>> - Daily Bulk CSV exports of inserts / updates in last 24 hours of various
>> snapshots of the data to various clients
>> 
>> My current deployment : 
>> i) I'm using Solr 4.8 and have set up a SolrCloud with 6 dedicated machines
>> - 24 Core + 96 GB RAM each.
>> ii)There are over 190M docs in the SolrCloud at the moment (for all
>> replicas its consuming overall disk 2340GB which implies - each doc is at
>> about 5-8kb in size.)
>> iii) The docs are split into 36 Shards - and 3 replica per shard (in all
>> 108 Solr Jetty processes split over 6 Servers leaving about 18 Jetty JVMs
>> running on each host)
>> iv) There are 60 fields per doc and all fields are stored at the moment  :( 
>> (The backend is only Solr at the moment)
>> v) The current shard/routing key is a combination of Car Year, Make and
>> some other car level attributes that help classify the cars
>> vi) We are mostly using the default Solr config as of now - no heavy caching
>> as the search is pretty random in nature 
>> vii) Autocommit is on - with maxDocs = 1
>> 
>> Current throughput & Issues :
>> With the above mentioned deployment the daily throughout is only at about
>> 1.5M on average (Inserts + Updates) - falling way short of what is required.
>> Search is slow - Some queries take about 15 seconds to return - and since
>> insert is dependent on at least one Search that degrades the write
>> throughput too. (This is not a Solr issue - but the app demands it so)
>> 
>> Questions :
>> 
>> 1. Autocommit with maxDocs = 1 - is that a goof up and could that be slowing
>> down indexing? Its a requirement that all docs are available as soon as
>> indexed.
>> 
>> 2. Should I have been better served had I deployed a Single Jetty Solr
>> instance per server with multiple cores running inside? The servers do start
>> to swap out after a couple of days of Solr uptime - right now we reboot the
>> entire cluster every 4 days.
>> 
>> 3. The routing key is not able to effectively balance the docs on available
>> shards - There are a few shards with just about 2M docs - and others over
>> 11M docs. Shall I split the larger shards? But I do not have more nodes /
>> hardware to allocate to this deployment. In such case would splitting up the
>> large shards give better read-write throughput? 
>> 
>> 4. To remain with the current hardware - would it help if I remove 1 replica
>> each from a shard? But that would mean even when just 1 node goes down for a
>> shard there would be only 1 live node left that would not serve the write
>> requests.
>> 
>> 5. Also, is there a way to control where the Split Shard replicas would go?
>> Is there a pattern / rule that Solr follows when it creates replicas for
>> split shards?
>> 
>> 6. I read somewhere that creating a Core would cost the OS on

Re: SolrCloud Scale Struggle

2014-08-02 Thread Bill Bell
Seems way overkill. Are you using /get at all ? If you need the docs avail 
right away - why ? How about after 30 seconds ? How many docs do you get added 
per second during peak ? Even Google has a delay when you do Adwords. 

One idea is to have an empty core that you insert into and then shard into the
queries. So one core would be called newdocs and then you would add this core
into your query. There are a couple issues with this with scoring but it works
nicely. I would not even use SolrCloud for that core.
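
A rough sketch of what adding the newdocs core into the query could look like with
the shards parameter (host, port, core and field names are made up):

  http://host1:8983/solr/maincore/select?q=vin:12345
      &shards=host1:8983/solr/maincore,host1:8983/solr/newdocs

The request fans out to both cores and merges the results; the scoring caveat comes
from relevance statistics being computed per core.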

Try to reduce the number of Java instances running. Reduce memory and use one
java per machine.

Then if you need faster availability of docs you really need to ask why. Why not
later? Do you need search, or just showing the user the info? If just for showing,
maybe query an indexed table for the few not yet indexed? Or just store in a db to
show the user the info and index later?

Bill Bell
Sent from mobile


> On Aug 1, 2014, at 4:19 AM, "anand.mahajan"  wrote:
> 
> Hello all,
> 
> Struggling to get this going with SolrCloud - 
> 
> Requirement in brief :
> - Ingest about 4M Used Cars listings a day and track all unique cars for
> changes
> - 4M automated searches a day (during the ingestion phase to check if a doc
> exists in the index (based on values of 4-5 key fields) or it is a new one
> or an updated version)
> - Of the 4 M - About 3M Updates to existing docs (for every non-key value
> change)
> - About 1M inserts a day (I'm assuming these many new listings come in
> every day)
> - Daily Bulk CSV exports of inserts / updates in last 24 hours of various
> snapshots of the data to various clients
> 
> My current deployment : 
> i) I'm using Solr 4.8 and have set up a SolrCloud with 6 dedicated machines
> - 24 Core + 96 GB RAM each.
> ii)There are over 190M docs in the SolrCloud at the moment (for all
> replicas its consuming overall disk 2340GB which implies - each doc is at
> about 5-8kb in size.)
> iii) The docs are split into 36 Shards - and 3 replica per shard (in all
> 108 Solr Jetty processes split over 6 Servers leaving about 18 Jetty JVMs
> running on each host)
> iv) There are 60 fields per doc and all fields are stored at the moment  :( 
> (The backend is only Solr at the moment)
> v) The current shard/routing key is a combination of Car Year, Make and
> some other car level attributes that help classify the cars
> vi) We are mostly using the default Solr config as of now - no heavy caching
> as the search is pretty random in nature 
> vii) Autocommit is on - with maxDocs = 1
> 
> Current throughput & Issues :
> With the above mentioned deployment the daily throughout is only at about
> 1.5M on average (Inserts + Updates) - falling way short of what is required.
> Search is slow - Some queries take about 15 seconds to return - and since
> insert is dependent on at least one Search that degrades the write
> throughput too. (This is not a Solr issue - but the app demands it so)
> 
> Questions :
> 
> 1. Autocommit with maxDocs = 1 - is that a goof up and could that be slowing
> down indexing? Its a requirement that all docs are available as soon as
> indexed.
> 
> 2. Should I have been better served had I deployed a Single Jetty Solr
> instance per server with multiple cores running inside? The servers do start
> to swap out after a couple of days of Solr uptime - right now we reboot the
> entire cluster every 4 days.
> 
> 3. The routing key is not able to effectively balance the docs on available
> shards - There are a few shards with just about 2M docs - and others over
> 11M docs. Shall I split the larger shards? But I do not have more nodes /
> hardware to allocate to this deployment. In such case would splitting up the
> large shards give better read-write throughput? 
> 
> 4. To remain with the current hardware - would it help if I remove 1 replica
> each from a shard? But that would mean even when just 1 node goes down for a
> shard there would be only 1 live node left that would not serve the write
> requests.
> 
> 5. Also, is there a way to control where the Split Shard replicas would go?
> Is there a pattern / rule that Solr follows when it creates replicas for
> split shards?
> 
> 6. I read somewhere that creating a Core would cost the OS one thread and a
> file handle. Since a core repsents an index in its entirty would it not be
> allocated the configured number of write threads? (The dafault that is 8)
> 
> 7. The Zookeeper cluster is deployed on the same boxes as the Solr instance
> - Would separating the ZK cluster out help?
> 
> Sorry for the long thread _ I thought of asking these all at once rather
> than posting separate ones.
> 
> Thanks,
> Anand
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-Scale-Struggle-tp4150592.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Latest jetty

2014-07-26 Thread Bill Bell
Since we are now on latest Java JDK can we move to Jetty 9?

Thoughts ?

Bill Bell
Sent from mobile



Re: Solr atomic updates question

2014-07-08 Thread Bill Au
I see what you mean now.  Thanks for the example.  It makes things very
clear.

I have been thinking about the explanation in the original response more.
According to that, both a regular update with the entire doc and an atomic update
involve a delete by id followed by an add.  But the Solr reference doc
(
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents)
says:

"The first is *atomic updates*. This approach allows changing only one or
more fields of a document without having to re-index the entire document."

But since Solr is doing a delete by id followed by an add, does "without
having to re-index the entire document" apply to the client side only?  On
the server side the add means that the entire document is re-indexed, right?

Bill


On Tue, Jul 8, 2014 at 7:32 PM, Steve McKay  wrote:

> Take a look at this update XML:
>
> 
>   
> 05991
> Steve McKay
> Walla Walla
> Python
>   
> 
>
> Let's say employeeId is the key. If there's a fourth field, salary, on the
> existing doc, should it be deleted or retained? With this update it will
> obviously be deleted:
>
> 
>   
> 05991
> Steve McKay
>   
> 
>
> With this XML it will be retained:
>
> 
>   
> 05991
> Walla Walla
> Python
>   
> 
>
> I'm not willing to guess what will happen in the case where non-atomic and
> atomic updates are present on the same add because I haven't looked at that
> code since 4.0, but I think I could make a case for retaining salary or for
> discarding it. That by itself reeks--and it's also not well documented.
> Relying on iffy, poorly-documented behavior is asking for pain at upgrade
> time.
>
> Steve
>
> On Jul 8, 2014, at 7:02 PM, Bill Au  wrote:
>
> > Thanks for that under-the-cover explanation.
> >
> > I am not sure what you mean by "mix atomic updates with regular field
> > values".  Can you give an example?
> >
> > Thanks.
> >
> > Bill
> >
> >
> > On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay  wrote:
> >
> >> Atomic updates fetch the doc with RealTimeGet, apply the updates to the
> >> fetched doc, then reindex. Whether you use atomic updates or send the
> >> entire doc to Solr, it has to deleteById then add. The perf difference
> >> between the atomic updates and "normal" updates is likely minimal.
> >>
> >> Atomic updates are for when you have changes and want to apply them to a
> >> document without affecting the other fields. A regular add will replace
> an
> >> existing document completely. AFAIK Solr will let you mix atomic updates
> >> with regular field values, but I don't think it's a good idea.
> >>
> >> Steve
> >>
> >> On Jul 8, 2014, at 5:30 PM, Bill Au  wrote:
> >>
> >>> Solr atomic update allows for changing only one or more fields of a
> >>> document without having to re-index the entire document.  But what
> about
> >>> the case where I am sending in the entire document?  In that case the
> >> whole
> >>> document will be re-indexed anyway, right?  So I assume that there will
> >> be
> >>> no saving.  I am actually thinking that there will be a performance
> >> penalty
> >>> since atomic update requires Solr to first retrieve all the fields
> first
> >>> before updating.
> >>>
> >>> Bill
> >>
> >>
>
>


Re: Solr atomic updates question

2014-07-08 Thread Bill Au
Thanks for that under-the-cover explanation.

I am not sure what you mean by "mix atomic updates with regular field
values".  Can you give an example?

Thanks.

Bill


On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay  wrote:

> Atomic updates fetch the doc with RealTimeGet, apply the updates to the
> fetched doc, then reindex. Whether you use atomic updates or send the
> entire doc to Solr, it has to deleteById then add. The perf difference
> between the atomic updates and "normal" updates is likely minimal.
>
> Atomic updates are for when you have changes and want to apply them to a
> document without affecting the other fields. A regular add will replace an
> existing document completely. AFAIK Solr will let you mix atomic updates
> with regular field values, but I don't think it's a good idea.
>
> Steve
>
> On Jul 8, 2014, at 5:30 PM, Bill Au  wrote:
>
> > Solr atomic update allows for changing only one or more fields of a
> > document without having to re-index the entire document.  But what about
> > the case where I am sending in the entire document?  In that case the
> whole
> > document will be re-indexed anyway, right?  So I assume that there will
> be
> > no saving.  I am actually thinking that there will be a performance
> penalty
> > since atomic update requires Solr to first retrieve all the fields first
> > before updating.
> >
> > Bill
>
>


Solr atomic updates question

2014-07-08 Thread Bill Au
Solr atomic update allows for changing only one or more fields of a
document without having to re-index the entire document.  But what about
the case where I am sending in the entire document?  In that case the whole
document will be re-indexed anyway, right?  So I assume that there will be
no saving.  I am actually thinking that there will be a performance penalty
since atomic update requires Solr to first retrieve all the fields
before updating.
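
For contrast, an atomic update request only names the fields being changed; a
minimal sketch with made-up field names:

  curl 'http://localhost:8983/solr/collection1/update?commit=true' \
    -H 'Content-type:application/json' \
    -d '[{"id":"doc1", "price":{"set":99.0}, "tags":{"add":"sale"}}]'

Sending the full document without the set/add modifiers is just a normal add that
replaces the previous version of the document.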

Bill


Re: stucked with log4j configuration

2014-04-12 Thread Bill Bell
Well, I hope log4j2 is something Solr supports when it goes GA.

Bill Bell
Sent from mobile


> On Apr 12, 2014, at 7:26 AM, Aman Tandon  wrote:
> 
> I have upgraded my solr4.2 to solr 4.7.1 but in my logs there is an error
> for log4j
> 
> log4j: Could not find resource
> 
> Please find the attachment of the screenshot of the error console
> https://drive.google.com/file/d/0B5GzwVkR3aDzdjE1b2tXazdxcGs/edit?usp=sharing
> -- 
> With Regards
> Aman Tandon


Re: boost results within 250km

2014-04-09 Thread Bill Bell
Just take geodist() and use the map function and send it to bf or boost.
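
i.e. something along these lines with edismax (query, field name, point and boost
weight are made up):

  q=cardiologist&defType=edismax
    &sfield=store&pt=45.15,-93.85
    &bf=map(geodist(),0,250,5,0)

map() returns 5 when the distance is between 0 and 250 km and 0 otherwise, so
matches inside the radius get the additive boost; a multiplicative variant would be
boost=map(geodist(),0,250,2,1).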

Bill Bell
Sent from mobile


> On Apr 9, 2014, at 8:26 AM, Erick Erickson  wrote:
> 
> Why do you want to do this? This sounds like an XY problem, you're
> asking how to do something specific without explaining why you care,
> perhaps there are other ways to do this.
> 
> Best,
> Erick
> 
>> On Tue, Apr 8, 2014 at 11:30 PM, Aman Tandon  wrote:
>> How can i gave the more boost to the results within 250km than others
>> without using result filtering.


Re: Luke 4.6.1 released

2014-02-16 Thread Bill Bell
Yes it works with Solr 

Bill Bell
Sent from mobile


> On Feb 16, 2014, at 3:38 PM, Alexandre Rafalovitch  wrote:
> 
> Does it work with Solr? I couldn't tell what the description was from
> this repo and it's Solr relevance.
> 
> I am sure all the long timers know, but for more recent Solr people,
> the additional information would be useful.
> 
> Regards,
>   Alex.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
> 
> 
>> On Mon, Feb 17, 2014 at 3:02 AM, Dmitry Kan  wrote:
>> Hello!
>> 
>> Luke 4.6.1 has been just released. Grab it here:
>> 
>> https://github.com/DmitryKey/luke/releases/tag/4.6.1
>> 
>> fixes:
>> loading the jar from command line is now working fine.
>> 
>> --
>> Dmitry Kan
>> Blog: http://dmitrykan.blogspot.com
>> Twitter: twitter.com/dmitrykan


Status of 4.6.1?

2014-01-18 Thread Bill Bell
We just need the bug fix for Solr.xml 

https://issues.apache.org/jira/browse/SOLR-5543

Bill Bell
Sent from mobile



Re: question about DIH solr-data-config.xml and XML include

2014-01-14 Thread Bill Au
The problem is with the admin UI not following the XML include to find the
entities, so it found none.  DIH itself does support XML include, as I can
issue the DIH commands via HTTP on the included entities successfully.

Bill


On Mon, Jan 13, 2014 at 8:03 PM, Shawn Heisey  wrote:

> On 1/13/2014 3:31 PM, Bill Au wrote:
>
>> But when I use XML include, the Entity pull-down in the Dataimport section
>> of the Solr admin UI is empty.  I know that happens when there is a syntax
>> error in solr-data-config.xml.  Does DIH supports XML include?  Also I am
>> not seeing any error message in the log even if I set log level to ALL.
>>  Is
>> there any way to get DIH to log what it thinks is wrong
>> solr-data-cofig.xml?
>>
>
> Paying it forward.  Someone on this mailing list helped me with this.  I
> have tested this DIH config and found that it works:
>
> 
> http://www.w3.org/2001/XInclude";>
>driver="com.mysql.jdbc.Driver"
> encoding="UTF-8"
> url="jdbc:mysql://${dih.request.dbHost}:3306/${dih.request.dbSchema}?
> zeroDateTimeBehavior=convertToNull"
> batchSize="-1"
> user="REDACTED"
> password="REDACTED"/>
>   
>   
>   
> 
>
> The xmlns:xi attribute in the outer tag makes it possible to use the
> xi:include syntax later.
>
> I make extensive use of this in my solrconfig.xml file. There's almost no
> actual config in that file, everything is included from other files.
>
> When you look at the config in the admin UI, you will not see the included
> text, you'll only see the xi:include tag.
>
> Thanks,
> Shawn
>
>
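
The general shape of the xi:include approach, as a sketch (file names are
illustrative; the dataSource attributes follow the ones shown above):

  <dataConfig xmlns:xi="http://www.w3.org/2001/XInclude">
    <dataSource driver="com.mysql.jdbc.Driver" encoding="UTF-8"
                url="jdbc:mysql://${dih.request.dbHost}:3306/${dih.request.dbSchema}?zeroDateTimeBehavior=convertToNull"
                batchSize="-1" user="REDACTED" password="REDACTED"/>
    <xi:include href="dih-entities-1.xml"/>
    <xi:include href="dih-entities-2.xml"/>
    <xi:include href="dih-entities-3.xml"/>
  </dataConfig>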


question about DIH solr-data-config.xml and XML include

2014-01-13 Thread Bill Au
I am trying to simplify my Solr DIH configuration by using XML schema
include element.  Here is an example:





]>

&dataSource;

&entity1;
&entity2;




I know my included XML files are good because if I put them all into a
single XML file, DIH works as expected.

But when I use XML include, the Entity pull-down in the Dataimport section
of the Solr admin UI is empty.  I know that happens when there is a syntax
error in solr-data-config.xml.  Does DIH supports XML include?  Also I am
not seeing any error message in the log even if I set log level to ALL.  Is
there any way to get DIH to log what it thinks is wrong solr-data-cofig.xml?

BTW, the admin UI show the DIH config as shown above.  So I suspecting that
DIH isn't actually doing the XML include.

Bill


Re: Call to Solr via TCP

2013-12-10 Thread Bill Bell
Yeah, open a socket to the port and send the correct GET syntax and Solr will respond with
results...
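
i.e. the bytes written to the socket are just a plain HTTP request, roughly (host,
port and core name assumed):

  GET /solr/collection1/select?q=*:*&wt=json HTTP/1.1
  Host: localhost:8983
  Connection: close

followed by a blank line; the response comes back on the same connection.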



Bill Bell
Sent from mobile


> On Dec 10, 2013, at 2:50 PM, Doug Turnbull 
>  wrote:
> 
> Zwer, is there a reason you need to do this? Its probably very hard to
> get solr to speak TCP. But if you're having a performance or
> infrastructure problem, the group might be able to help you with a far
> simpler solution.
> 
> Sent from my Windows Phone From: Zwer
> Sent: 12/10/2013 12:15 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Call to Solr via TCP
> Maybe I asked incorrectly.
> 
> 
> Solr is Web Application, hosted by some servlet container and is reachable
> via HTTP.
> 
> HTTP is an extension of TCP and I would like to know whether exists some
> lower way to communicate with application (i.e. Solr) hosted by Jetty?
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Call-to-Solr-via-TCP-tp4105932p4105935.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Reverse mm(min-should-match)

2013-11-22 Thread Bill Bell
This is an awesome idea!

Sent from my iPad

> On Nov 22, 2013, at 12:54 PM, Doug Turnbull 
>  wrote:
> 
> Instead of specifying a percentage or number of query terms must match
> tokens in a field, I'd like to do the opposite -- specify how much of a
> field must match a query.
> 
> The problem I'm trying to solve is to boost document titles that closely
> match the query string. If a title looks something like
> 
> *Title: *[solr] [the] [worlds] [greatest] [search] [engine]
> 
> I want to be able to specify how much of the field must match the query
> string. This differs from normal mm. Normal mm specifies a how much of the
> query must match a field.
> 
> As an example, with this title, if I use normal mm=100% and perform the
> following query:
> 
> mm=100%
> q=solr
> 
> This will match the title above, as 100% of [solr] matches the field
> 
> What I really want to get at is a reverse mm:
> 
> Rmm=100%
> q=solr
> 
> The title above will not match in this case. Only 1/6 of the tokens in the
> field match the query.
> 
> However an exact search would match:
> 
> Rmm=100%
> q=solr the worlds greatest search engine
> 
> Here 100% of the query matches the title, so I'm good.
> 
> Is there any way to achieve this in Solr?
> 
> -- 
> Doug Turnbull
> Search & Big Data Architect
> OpenSource Connections 


Re: NullPointerException

2013-11-22 Thread Bill Bell
It seems to be a modified row and referenced in EvaluatorBag.

I am not familiar with either.

Sent from my iPad

> On Nov 22, 2013, at 3:05 AM, Adrien RUFFIE  wrote:
> 
> Hello all,
> 
> I have performed a full indexation with Solr, but when I try to perform an
> incremental indexation I get the following exception (cf. attachment).
> 
> Does anyone have an idea of the problem?
> 
> Many thanks
> 


Re: useColdSearcher in SolrCloud config

2013-11-22 Thread Bill Bell
Wouldn't true mean use the cold searcher? It seems backwards to me...

Sent from my iPad

> On Nov 22, 2013, at 2:44 AM, ade-b  wrote:
> 
> Hi
> 
> The definition of useColdSearcher config element in solrconfig.xml is
> 
> "If a search request comes in and there is no current registered searcher,
> then immediately register the still warming searcher and use it.  If "false"
> then all requests will block until the first searcher is done warming".
> 
> By the term 'block', I assume SOLR returns a non 200 response to requests.
> Does anybody know the exact response code returned when the server is
> blocking requests?
> 
> If a new SOLR server is introduced into an existing array of SOLR servers
> (in SOLR Cloud setup), it will sync it's index from the leader. To save you
> having to specify warm-up queries in the solrconfig.xml file for first
> searchers, would/could the new server not auto warm it's caches from the
> caches of an existing server?
> 
> Thanks
> Ade 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/useColdSearcher-in-SolrCloud-config-tp4102569.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to work with remote solr savely?

2013-11-22 Thread Bill Bell
Do you have a sample Jetty XML to set up basic auth for updates in Solr?

Sent from my iPad

> On Nov 22, 2013, at 7:34 AM, "michael.boom"  wrote:
> 
> Use HTTP basic authentication, setup in your servlet container
> (jetty/tomcat).
> 
> That should work fine if you are *not* using SolrCloud.
> 
> 
> 
> -
> Thanks,
> Michael
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-work-with-remote-solr-savely-tp4102612p4102613.html
> Sent from the Solr - User mailing list archive at Nabble.com.
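
Not Jetty-specific XML, but the servlet-spec half of that setup usually looks
roughly like this in the webapp's web.xml (core path, realm and role names are made
up; the matching users and roles still have to be defined in the container's login
service, e.g. a Jetty realm):

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr updates</web-resource-name>
      <url-pattern>/collection1/update</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>solr-updater</role-name>
    </auth-constraint>
  </security-constraint>
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>solr-realm</realm-name>
  </login-config>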


Re: Jetty 9?

2013-11-07 Thread Bill Bell
So no Jetty 9 until Solr 5? Java 7 is at release 40. Is that our commitment to
not require Java 7 until Solr 5?

Most people are probably already on Java 7...

Bill Bell
Sent from mobile


> On Nov 7, 2013, at 1:29 AM, Furkan KAMACI  wrote:
> 
> Here is an issue points to that:
> https://issues.apache.org/jira/browse/SOLR-4839
> 
> 
> 2013/11/7 William Bell 
> 
>> When are we moving Solr to Jetty 9?
>> 
>> --
>> Bill Bell
>> billnb...@gmail.com
>> cell 720-256-8076
>> 


Re: Performance of "rows" and "start" parameters

2013-11-04 Thread Bill Bell
Do you want to look through them all? Have you considered the Lucene API? Not sure if
that is better, but it might be.

Bill Bell
Sent from mobile


> On Nov 4, 2013, at 6:43 AM, "michael.boom"  wrote:
> 
> I saw that some time ago there was a JIRA ticket dicussing this, but still i
> found no relevant information on how to deal with it.
> 
> When working with big nr of docs (e.g. 70M) in my case, I'm using
> start=0&rows=30 in my requests.
> For the first req the query time is ok, the next one is visibily slower, the
> third even more slow and so on until i get some huge query times of up
> 140secs, after a few hundreds requests. My test were done with SolrMeter at
> a rate of 1000qpm. Same thing happens at 100qpm, tough.
> 
> Is there a best practice on how to do in this situation, or maybe an
> explanation why is the query time increasing, from request to request ?
> 
> Thanks!
> 
> 
> 
> -
> Thanks,
> Michael
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Performance-of-rows-and-start-parameters-tp4099194.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Core admin: create new core

2013-11-04 Thread Bill Bell
You could pre-create a bunch of directories and base configs, and create cores as
needed. Then use the schemaless API to set it up... Or make changes in a script and
reload the core.
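
A sketch of that flow (paths and names are made up): once a pre-created instanceDir
with its conf/ exists on disk, a single CoreAdmin call brings the core up, and
RELOAD picks up later config edits:

  http://localhost:8983/solr/admin/cores?action=CREATE&name=newcore&instanceDir=/var/solr/precreated/newcore
  http://localhost:8983/solr/admin/cores?action=RELOAD&core=newcore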

Bill Bell
Sent from mobile


> On Nov 4, 2013, at 6:06 AM, Erick Erickson  wrote:
> 
> Right, this has been an issue for a while, there's no current
> way to do this.
> 
> Someday, I'll be able to work on SOLR-4779 which should
> go some toward making this work more easily. It's still not
> exactly what you're looking for, but it might work.
> 
> Of course with SolrCloud you can specify a configuration
> set that is used for multiple collections.
> 
> People are using Puppet or similar to automate this over
> large numbers of nodes, but that's not entirely satisfactory
> either in our case I suspect.
> 
> FWIW,
> Erick
> 
> 
>> On Mon, Nov 4, 2013 at 4:00 AM, Bram Van Dam  wrote:
>> 
>> The core admin CREATE function requires that the new instance dir and
>> schema/config exist already. Is there a particular reason for this? It
>> would be incredible convenient if I could create a core with a new schema
>> and new config simply by calling CREATE (maybe providing the contents of
>> config.xml and schema.xml as base64 encoded strings in HTTP POST or
>> something?).
>> 
>> I'm guessing this isn't currently possible?
>> 
>> Ta,
>> 
>> - bram
>> 


Re: Proposal for new feature, cold replicas, brainstorming

2013-10-27 Thread Bill Bell
Yeah, replicating to a DR site would be good too.

Bill Bell
Sent from mobile


> On Oct 24, 2013, at 6:27 AM, yriveiro  wrote:
> 
> I'm wondering some time ago if it's possible have replicas of a shard
> synchronized but in an state that they can't accept queries only updates. 
> 
> This replica in "replication" mode only awake to accept queries if it's the
> last alive replica and goes to replication mode when other replica becomes
> alive and synchronized.
> 
> The motivation of this is simple, I want have replication but I don't want
> have n replicas actives with full resources allocated (cache and so on).
> This is usefull in enviroments where replication is needed but a high query
> throughput is not fundamental and the resources are limited.
> 
> I know that right now is not possible, but I think that it's a feature that
> can be implemented in a easy way creating a new status for shards.
> 
> The bottom line question is, I'm the only one with this kind of
> requeriments? Does it make sense one functionality like this?
> 
> 
> 
> -
> Best regards
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Proposal-for-new-feature-cold-replicas-brainstorming-tp4097501.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr - what's the next big thing?

2013-10-26 Thread Bill Bell
Full JSON support: deep complex object indexing and search. Game changer.

Bill Bell
Sent from mobile


> On Oct 26, 2013, at 1:04 PM, Otis Gospodnetic  
> wrote:
> 
> Hi,
> 
>> On Sat, Oct 26, 2013 at 5:58 AM, Saar Carmi  wrote:
>> LOL,  Jack.  I can imagine Otis saying that.
> 
> Funny indeed, but not really.
> 
>> Otis,  with these marriage,  are we going to see map reduce based queries?
> 
> Can you please describe what you mean by that?  Maybe with an example.
> 
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 
> 
>>> On Oct 25, 2013 10:03 PM, "Jack Krupansky"  wrote:
>>> 
>>> But a lot of that big yellow elephant stuff is in 4.x anyway.
>>> 
>>> (Otis: I was afraid that you were going to say that the next big thing in
>>> Solr is... Elasticsearch!)
>>> 
>>> -- Jack Krupansky
>>> 
>>> -Original Message- From: Otis Gospodnetic
>>> Sent: Friday, October 25, 2013 2:43 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Solr - what's the next big thing?
>>> 
>>> Saar,
>>> 
>>> The marriage with the big yellow elephant is a big deal. It changes the
>>> scale.
>>> 
>>> Otis
>>> Solr & ElasticSearch Support
>>> http://sematext.com/
>>> On Oct 25, 2013 5:32 AM, "Saar Carmi"  wrote:
>>> 
>>> If I am not mistaken the most impressive improvement of Solr 4.0 compared
>>>> to previous versions was the Solr Cloud architecture.
>>>> 
>>>> What would be the next big thing in Solr 5.0 ?
>>>> 
>>>> Saar
>>> 


Re: Spatial Distance Range

2013-10-22 Thread Bill Bell
Yes, frange works.
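
i.e. something like this, with sfield/pt as in the example below and the bounds in km:

  fq={!frange l=10 u=20}geodist()&sfield=store&pt=45.15,-93.85

which keeps only documents whose distance from the point is between 10 and 20 km.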

Bill Bell
Sent from mobile


> On Oct 22, 2013, at 8:17 AM, Eric Grobler  wrote:
> 
> Hi Everyone,
> 
> Normally one would search for documents where the location is within a
> specified distance, for example widthin 5 km:
> fq={!geofilt pt=45.15,-93.85 sfield=store
> d=5}<http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&fq=%7B!geofilt%20pt=45.15,-93.85%20sfield=store%20d=5%7D>
> 
> It there a way to specify a range between 10 and 20 km?
> Something like:
> fq={!geofilt pt=45.15,-93.85 sfield=store distancefrom=10
> distanceupto=20}<http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&fq=%7B!geofilt%20pt=45.15,-93.85%20sfield=store%20d=5%7D>
> 
> Thanks
> Ericz


Re: Skipping caches on a /select

2013-10-17 Thread Bill Bell
But global on a qt would be awesome !!!
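
For reference, the per-parameter form Yonik describes below is a local param on the
individual q or fq, e.g. (field name made up):

  fq={!cache=false}price:[10 TO 100]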

Bill Bell
Sent from mobile


> On Oct 17, 2013, at 2:43 PM, Yonik Seeley  wrote:
> 
> There isn't a global  "cache=false"... it's a local param that can be
> applied to any "fq" or "q" parameter independently.
> 
> -Yonik
> 
> 
>> On Thu, Oct 17, 2013 at 4:39 PM, Tim Vaillancourt  
>> wrote:
>> Thanks Yonik,
>> 
>> Does "cache=false" apply to all caches? The docs make it sound like it is
>> for filterCache only, but I could be misunderstanding.
>> 
>> When I force a commit and perform a /select a query many times with
>> "cache=false", I notice my query gets cached still, my guess is in the
>> queryResultCache. At first the query takes 500ms+, then all subsequent
>> requests take 0-1ms. I'll confirm this queryResultCache assumption today.
>> 
>> Cheers,
>> 
>> Tim
>> 
>> 
>>> On 16/10/13 06:33 PM, Yonik Seeley wrote:
>>> 
>>> On Wed, Oct 16, 2013 at 6:18 PM, Tim Vaillancourt
>>> wrote:
>>>> 
>>>> I am debugging some /select queries on my Solr tier and would like to see
>>>> if there is a way to tell Solr to skip the caches on a given /select
>>>> query
>>>> if it happens to ALREADY be in the cache. Live queries are being inserted
>>>> and read from the caches, but I want my debug queries to bypass the cache
>>>> entirely.
>>>> 
>>>> I do know about the "cache=false" param (that causes the results of a
>>>> select to not be INSERTED in to the cache), but what I am looking for
>>>> instead is a way to tell Solr to not read the cache at all, even if there
>>>> actually is a cached result for my query.
>>> 
>>> Yeah, cache=false for "q" or "fq" should already not use the cache at
>>> all (read or write).
>>> 
>>> -Yonik


Re: DIH

2013-10-15 Thread Bill Bell
We are NOW CPU bound. Thoughts???

Bill Bell
Sent from mobile


> On Oct 15, 2013, at 8:49 PM, Bill Bell  wrote:
> 
> We have a custom Field processor in DIH and we are now CPU bound on one 
> core... How do we thread it?? We need to use more cores.
> 
> The box has 32 cores and 1 is 100% CPU bound.
> 
> Ideas ?
> 
> Bill Bell
> Sent from mobile
> 


DIH

2013-10-15 Thread Bill Bell
We have a custom Field processor in DIH and we are now CPU bound on one core...
How do we thread it?? We need to use more cores.

The box has 32 cores and 1 is 100% CPU bound.

Ideas ?

Bill Bell
Sent from mobile



Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-10 Thread Bill Bell
Does this work ?
I can suggest -XX:-UseLoopPredicate to switch off predicates.

???

Which version of 7 is recommended ?

Bill Bell
Sent from mobile


> On Oct 10, 2013, at 11:29 AM, "Smiley, David W."  wrote:
> 
> *Don't* use JDK 7u40, it's been known to cause index corruption and
> SIGSEGV faults with Lucene: LUCENE-5212   This has not been unnoticed by
> Oracle.
> 
> ~ David
> 
>> On 10/10/13 12:34 PM, "Guido Medina"  wrote:
>> 
>> 2. Java version: There are huges performance winning between Java 5, 6
>>   and 7; we use Oracle JDK 7u40.
> 


Re: Field with default value and stored=false, will be reset back to the default value in case of updating other fields

2013-10-09 Thread Bill Bell
You have to update the whole record including all fields...

Bill Bell
Sent from mobile


> On Oct 9, 2013, at 7:50 PM, deniz  wrote:
> 
> hi all,
> 
> I have encountered some problems and post it on stackoverflow here:
> http://stackoverflow.com/questions/19285251/solr-field-with-default-value-resets-itself-if-it-is-stored-false
>  
> 
> as you can see from the response, does it make sense to open a bug ticket
> for this? because, although i can workaround this by setting everything back
> to stored=true, it does not make sense to keep every field stored while i
> dont need to return them in the search result.. or will anyone can make more
> detailed explanations that this is expected and normal? 
> 
> 
> 
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Field-with-default-value-and-stored-false-will-be-reset-back-to-the-default-value-in-case-of-updatins-tp4094508.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: problem with data import handler delta import due to use of multiple datasource

2013-10-08 Thread Bill Au
Thanks for the suggestion but that won't work as I have a last_modified field
in both the parent entity and the child entity, and I want delta import to kick
in when either changes.  That other approach has the same problem since the
parent and child entity use different datasources.

Bill


On Tue, Oct 8, 2013 at 10:18 AM, Dyer, James
wrote:

> Bill,
>
> I do not believe there is any way to tell it to use a different datasource
> for the parent delta query.
>
> If you used this approach, would it solve your problem:
> http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport ?
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Bill Au [mailto:bill.w...@gmail.com]
> Sent: Tuesday, October 08, 2013 8:50 AM
> To: solr-user@lucene.apache.org
> Subject: Re: problem with data import handler delta import due to use of
> multiple datasource
>
> I am using 4.3.  It is not related to bugs related to last_index_time.  The
> problem is caused by the fact that the parent entity and child entity use
> different data source (different databases on different hosts).
>
> From the log output, I do see the the delta query of the child entity being
> executed correctly and found all the rows that have been modified for the
> child entity.  But it fails when it executed the parentDeltaQuery because
> it is still using the database connection from the child entity (ie
> datasource ds2 in my example above).
>
> Is there a way to tell DIH to use a different datasource in the
> parentDeltaQuery?
>
> Bill
>
>
> On Sat, Oct 5, 2013 at 10:28 PM, Alexandre Rafalovitch
> wrote:
>
> > Which version of Solr and what kind of SQL errors? There were some bugs
> in
> > 4.x related to last_index_time, but it does not sound related.
> >
> > Regards,
> >Alex.
> >
> > Personal website: http://www.outerthoughts.com/
> > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > - Time is the quality of nature that keeps events from happening all at
> > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> >
> >
> > On Sun, Oct 6, 2013 at 8:51 AM, Bill Au  wrote:
> >
> > > Here is my DIH config:
> > >
> > > 
> > >  > driver="com.mysql.jdbc.Driver"
> > > url="jdbc:mysql://localhost1/dbname1" user="db_username1"
> > > password="db_password1"/>
> > >  > driver="com.mysql.jdbc.Driver"
> > > url="jdbc:mysql://localhost2/dbname2" user="db_username2"
> > > password="db_password2"/>
> > > 
> > > 
> > > 
> > > 
> > >
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > >
> > > I am having trouble with delta import.  I think it is because the main
> > > entity and the sub-entity use different data source.  I have tried
> using
> > > both a delta query:
> > >
> > > deltaQuery="select id from item where id in (select item_id as id from
> > > feature where last_modified > '${dih.last_index_time}') or
> last_modified
> > > > '${dih.last_index_time}'"
> > >
> > > and a parentDeltaQuery:
> > >
> > >  > > parentDeltaQuery="select ID from item where ID=${feature.ITEM_ID}"/>
> > >
> > > I ended up with an SQL error for both.  Is there any way to make delta
> > > import work in my case?
> > >
> > > Bill
> > >
> >
>
>
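
For reference, the wiki page James links to replaces deltaQuery/parentDeltaQuery with a single parameterized full-import query; a rough sketch of that pattern (entity, datasource and column names here are made up, and as Bill notes it does not by itself address the two-datasource problem):

<entity name="item" dataSource="ds1"
        query="select * from item
               where '${dataimporter.request.clean}' != 'false'
                  or last_modified > '${dataimporter.last_index_time}'">
  <!-- field mappings and child entities as before -->
</entity>

The delta run is then triggered with command=full-import&clean=false instead of command=delta-import.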


Re: problem with data import handler delta import due to use of multiple datasource

2013-10-08 Thread Bill Au
I am using 4.3.  It is not related to bugs related to last_index_time.  The
problem is caused by the fact that the parent entity and child entity use
different data sources (different databases on different hosts).

From the log output, I do see the delta query of the child entity being
executed correctly and found all the rows that have been modified for the
child entity.  But it fails when it executed the parentDeltaQuery because
it is still using the database connection from the child entity (ie
datasource ds2 in my example above).

Is there a way to tell DIH to use a different datasource in the
parentDeltaQuery?

Bill


On Sat, Oct 5, 2013 at 10:28 PM, Alexandre Rafalovitch
wrote:

> Which version of Solr and what kind of SQL errors? There were some bugs in
> 4.x related to last_index_time, but it does not sound related.
>
> Regards,
>Alex.
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Sun, Oct 6, 2013 at 8:51 AM, Bill Au  wrote:
>
> > Here is my DIH config:
> >
> > 
> >  driver="com.mysql.jdbc.Driver"
> > url="jdbc:mysql://localhost1/dbname1" user="db_username1"
> > password="db_password1"/>
> >  driver="com.mysql.jdbc.Driver"
> > url="jdbc:mysql://localhost2/dbname2" user="db_username2"
> > password="db_password2"/>
> > 
> > 
> > 
> > 
> >
> > 
> > 
> > 
> > 
> > 
> > 
> >
> > I am having trouble with delta import.  I think it is because the main
> > entity and the sub-entity use different data source.  I have tried using
> > both a delta query:
> >
> > deltaQuery="select id from item where id in (select item_id as id from
> > feature where last_modified > '${dih.last_index_time}') or last_modified
> > > '${dih.last_index_time}'"
> >
> > and a parentDeltaQuery:
> >
> >  > parentDeltaQuery="select ID from item where ID=${feature.ITEM_ID}"/>
> >
> > I ended up with an SQL error for both.  Is there any way to make delta
> > import work in my case?
> >
> > Bill
> >
>


problem with data import handler delta import due to use of multiple datasource

2013-10-05 Thread Bill Au
Here is my DIH config:
















I am having trouble with delta import.  I think it is because the main
entity and the sub-entity use different data source.  I have tried using
both a delta query:

deltaQuery="select id from item where id in (select item_id as id from
feature where last_modified > '${dih.last_index_time}') or last_modified
> '${dih.last_index_time}'"

and a parentDeltaQuery:



I ended up with an SQL error for both.  Is there any way to make delta
import work in my case?

Bill


Re: Solr 4.5 spatial search - distance and score

2013-09-13 Thread Bill Bell
You can apply his 4.5 patches to 4.4 or take trunk and it is there

Bill Bell
Sent from mobile


On Sep 12, 2013, at 6:23 PM, Weber  wrote:

> I'm trying to get score by using a custom boost and also get the distance. I
> found David's code* to get it using "Intersects", which I want to replace by
> {!geofilt} or geodist()
> 
> *David's code: https://issues.apache.org/jira/browse/SOLR-4255
> 
> He told me geodist() will be available again for this kind of field, which
> is a geohash type.
> 
> Then, I'd like to know how it can be done today on 4.4 with {!geofilt} and
> how it will be done on 4.5 using geodist()
> 
> Thanks in advance.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-5-spatial-search-distance-and-score-tp4089706.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Some highlighted snippets aren't being returned

2013-09-08 Thread Bill Bell
Zip up all your configs 

Bill Bell
Sent from mobile


On Sep 8, 2013, at 3:00 PM, "Eric O'Hanlon"  wrote:

> Hi again Everyone,
> 
> I didn't get any replies to this, so I thought I'd re-send in case anyone 
> missed it and has any thoughts.
> 
> Thanks,
> Eric
> 
> On Aug 7, 2013, at 1:51 PM, Eric O'Hanlon  wrote:
> 
>> Hi Everyone,
>> 
>> I'm facing an issue in which my solr query is returning highlighted snippets 
>> for some, but not all results.  For reference, I'm searching through an 
>> index that contains web crawls of human-rights-related websites.  I'm 
>> running solr as a webapp under Tomcat and I've included the query's solr 
>> params from the Tomcat log:
>> 
>> ...
>> webapp=/solr-4.2
>> path=/select
>> params={facet=true&sort=score+desc&group.limit=10&spellcheck.q=Unangan&f.mimetype_code.facet.limit=7&hl.simple.pre=&q.alt=*:*&f.organization_type__facet.facet.limit=6&f.language__facet.facet.limit=6&hl=true&f.date_of_capture_.facet.limit=6&group.field=original_url&hl.simple.post=&facet.field=domain&facet.field=date_of_capture_&facet.field=mimetype_code&facet.field=geographic_focus__facet&facet.field=organization_based_in__facet&facet.field=organization_type__facet&facet.field=language__facet&facet.field=creator_name__facet&hl.fragsize=600&f.creator_name__facet.facet.limit=6&facet.mincount=1&qf=text^1&hl.fl=contents&hl.fl=title&hl.fl=original_url&wt=ruby&f.geographic_focus__facet.facet.limit=6&defType=edismax&rows=10&f.domain.facet.limit=6&q=Unangan&f.organization_based_in__facet.facet.limit=6&q.op=AND&group=true&hl.usePhraseHighlighter=true}
>>  hits=8 status=0 QTime=108
>> ...
>> 
>> For the query above (which can be simplified to say: find all documents that 
>> contain the word "unangan" and return facets, highlights, etc.), I get five 
>> search results.  Only three of these are returning highlighted snippets.  
>> Here's the "highlighting" portion of the solr response (note: printed in 
>> ruby notation because I'm receiving this response in a Rails app):
>> 
>> 
>> "highlighting"=>
>> {"20100602195444/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>>   {},
>>  
>> "20100902203939/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>>   {},
>>  
>> "20111202233029/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>>   {},
>>  "20100618201646/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
>>   {"contents"=>
>> ["...actual snippet is returned here..."]},
>>  "20100902235358/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
>>   {"contents"=>
>> ["...actual snippet is returned here..."]},
>>  
>> "20110302213056/http://www.komnasham.go.id/publikasi/doc_download/2-uu-no-39-tahun-1999"=>
>>   {"contents"=>
>> ["...actual snippet is returned here..."]},
>>  
>> "20110302213102/http://www.komnasham.go.id/publikasi/doc_view/2-uu-no-39-tahun-1999?tmpl=component&format=raw"=>
>>   {"contents"=>
>> ["...actual snippet is returned here..."]},
>>  
>> "20120303113654/http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf"=>
>>   {}}
>> 
>> 
>> I have eight (as opposed to five) results above because I'm also doing a 
>> grouped query, grouping by a field called "original_url", and this leads to 
>> five grouped results.
>> 
>> I've confirmed that my highlight-lacking results DO contain the word 
>> "unangan", as expected, and this term is appearing in a text field that's 
>> indexed and stored, and being searched for all text searches.  For example, 
>> one of the search results is for a crawl of this document: 
>> http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf
>> 
>> And if you view that document on the web, you'll see that it does contain 
>> "unangan".
>> 
>> Has anyone seen this before?  And does anyone have any good suggestions for 
>> troubleshooting/fixing the problem?
>> 
>> Thanks!
>> 
>> - Eric
> 


Re: Solr 4.2.1 update to 4.3/4.4 problem

2013-08-27 Thread Bill Bell
Index and query

<analyzer type="index">

Bill Bell
Sent from mobile


On Aug 26, 2013, at 5:42 AM, skorrapa  wrote:

> I have also re-indexed the data and tried. And also tried with the below
>   sortMissingLast="true" omitNorms="true">
>  
>
>
>  
>
>
>
>  
>
>
>
>  
>
> This didnt work as well...
> 
> 
> 
> On Mon, Aug 26, 2013 at 4:03 PM, skorrapa [via Lucene] <
> ml-node+s472066n4086601...@n3.nabble.com> wrote:
> 
>> Hello All,
>> 
>> I am still facing the same issue. Case insensitive search is not working on
>> Solr 4.3
>> I am using the below configurations in schema.xml
>> > sortMissingLast="true" omitNorms="true">
>>  
>>
>>
>>  
>>
>>
>>
>>  
>>
>>
>>
>>  
>>
>> Basically I want my string which could have spaces or characters like '-'
>> or \ to be searched upon case insensitively.
>> Please help.
>> 
>> 
>> --
>> If you reply to this email, your message will be added to the discussion
>> below:
>> 
>> http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086601.html
>> To unsubscribe from Solr 4.2.1 update to 4.3/4.4 problem, click 
>> here<http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=4081896&code=a29ycmFwYXRpLnN1c2htYUBnbWFpbC5jb218NDA4MTg5Nnw0MjEwNTY0Mzc=>
>> .
>> NAML<http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086606.html
> Sent from the Solr - User mailing list archive at Nabble.com.
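
A common way to get the case-insensitive matching skorrapa is after (and what Bill's "Index and query" reply points at) is a TextField that keeps the whole value as one token and lower-cases it at both index and query time. A sketch (the type name is made up and this may differ from the poster's actual config):

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>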


Re: Concat 2 fields in another field

2013-08-27 Thread Bill Bell
If for search just copyField into a multivalued field

Or do it on indexing using DIH or code. A rhino script works too.

Bill Bell
Sent from mobile


On Aug 27, 2013, at 7:15 AM, "Jack Krupansky"  wrote:

> I have additional examples in the two most recent early access releases of my 
> book - variations on using the existing update processors.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Federico Chiacchiaretta
> Sent: Tuesday, August 27, 2013 8:39 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Concat 2 fields in another field
> 
> Hi,
> we do the same thing using an update request processor chain, this is the
> snippet from solrconfig.xml
> 
> 
> <updateRequestProcessorChain>
>   <processor class="solr.CloneFieldUpdateProcessorFactory">
>     <str name="source">firstname</str>
>     <str name="dest">concatfield</str>
>   </processor>
>   <processor class="solr.CloneFieldUpdateProcessorFactory">
>     <str name="source">lastname</str>
>     <str name="dest">concatfield</str>
>   </processor>
>   <processor class="solr.ConcatFieldUpdateProcessorFactory">
>     <str name="fieldName">concatfield</str>
>     <str name="delimiter">_</str>
>   </processor>
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> 
> 
> 
> Regards,
> Federico Chiacchiaretta
> 
> 
> 
> 2013/8/27 Markus Jelsma 
> 
>> You may be more interested in the ConcatFieldUpdateProcessorFactory:
>> 
>> http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html
>> 
>> 
>> 
>> -Original message-
>> > From:Alok Bhandari 
>> > Sent: Tuesday 27th August 2013 14:05
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: Concat 2 fields in another field
>> >
>> > Thanks for reply.
>> >
>> > But I don't want to introduce any scripting in my code so want to know > is
>> > there any Java component available for the same.
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-tp4086786p4086791.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >
> 
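
For the search-only case Bill mentions at the top of this thread, a schema.xml sketch (field names are hypothetical): both sources are copied into one multiValued destination at index time instead of physically concatenating them.

<field name="firstname" type="string" indexed="true" stored="true"/>
<field name="lastname"  type="string" indexed="true" stored="true"/>
<field name="fullname"  type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="firstname" dest="fullname"/>
<copyField source="lastname"  dest="fullname"/>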


Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Bill Bell
This seems like a fairly large issue. Can you create a Jira issue ?

Bill Bell
Sent from mobile


On Jul 30, 2013, at 12:34 PM, Dotan Cohen  wrote:

> On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal  wrote:
>> Does adding facet.mincount=2 help?
>> 
>> 
> 
> In fact, when adding facet.mincount=20 (I know that some dupes are in
> the hundreds) I got the OutOfMemoryError in seconds instead of
> minutes.
> 
> -- 
> Dotan Cohen
> 
> http://gibberish.co.il
> http://what-is-what.com


Re: Performance question on Spatial Search

2013-07-29 Thread Bill Bell
Can you compare with the old geo handler as a baseline. ?

Bill Bell
Sent from mobile


On Jul 29, 2013, at 4:25 PM, Erick Erickson  wrote:

> This is very strange. I'd expect slow queries on
> the first few queries while these caches were
> warmed, but after that I'd expect things to
> be quite fast.
> 
> For a 12G index and 256G RAM, you have on the
> surface a LOT of hardware to throw at this problem.
> You can _try_ giving the JVM, say, 18G but that
> really shouldn't be a big issue, your index files
> should be MMaped.
> 
> Let's try the crude thing first and give the JVM
> more memory.
> 
> FWIW
> Erick
> 
> On Mon, Jul 29, 2013 at 4:45 PM, Steven Bower  wrote:
>> I've been doing some performance analysis of a spacial search use case I'm
>> implementing in Solr 4.3.0. Basically I'm seeing search times alot higher
>> than I'd like them to be and I'm hoping people may have some suggestions
>> for how to optimize further.
>> 
>> Here are the specs of what I'm doing now:
>> 
>> Machine:
>> - 16 cores @ 2.8ghz
>> - 256gb RAM
>> - 1TB (RAID 1+0 on 10 SSD)
>> 
>> Content:
>> - 45M docs (not very big only a few fields with no large textual content)
>> - 1 geo field (using config below)
>> - index is 12gb
>> - 1 shard
>> - Using MMapDirectory
>> 
>> Field config:
>> 
>> > distErrPct="0.025" maxDistErr="0.00045"
>> spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
>> units="degrees"/>
>> 
>> > required="false" stored="true" type="geo"/>
>> 
>> 
>> What I've figured out so far:
>> 
>> - Most of my time (98%) is being spent in
>> java.nio.Bits.copyToByteArray(long,Object,long,long) which is being
>> driven by BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
>> which from what I gather is basically reading terms from the .tim file
>> in blocks
>> 
>> - I moved from Java 1.6 to 1.7 based upon what I read here:
>> http://blog.vlad1.com/2011/10/05/looking-at-java-nio-buffer-performance/
>> and it definitely had some positive impact (i haven't been able to
>> measure this independantly yet)
>> 
>> - I changed maxDistErr from 0.000009 (which is 1m precision per the docs)
>> to 0.00045 (50m precision) ..
>> 
>> - It looks to me that the .tim files are being memory mapped fully (ie
>> they show up in pmap output) the virtual size of the jvm is ~18gb
>> (heap is 6gb)
>> 
>> - I've optimized the index but this doesn't have a dramatic impact on
>> performance
>> 
>> Changing the precision and the JVM upgrade yielded a drop from ~18s
>> avg query time to ~9s avg query time.. This is fantastic but I want to
>> get this down into the 1-2 second range.
>> 
>> At this point it seems that basically i am bottle-necked on basically
>> copying memory out of the mapped .tim file which leads me to think
>> that the only solution to my problem would be to read less data or
>> somehow read it more efficiently..
>> 
>> If anyone has any suggestions of where to go with this I'd love to know
>> 
>> 
>> thanks,
>> 
>> steve


Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-29 Thread Bill Au
https://issues.apache.org/jira/browse/SOLR-4978


On Sat, Jun 29, 2013 at 2:33 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Yes we need to use getTimestamp instead of getDate. Please create an issue.
>
> On Sat, Jun 29, 2013 at 11:48 PM, Bill Au  wrote:
> > So disabling convertType does provide a workaround for my problem with
> > datetime column.  But the problem still exists when convertType is
> enabled
> > because DIH is not doing the conversion correctly for a solr date field.
> >  Solr date field does have a time portion but java.sql.Date does not.  So
> > DIH should not be calling ResultSet.getDate() for a solr date field.  It
> > should really be calling ResultSet.getTimestamp() instead.  Is the fix
> this
> > simple?  Am I missing anything?
> >
> > If the fix is this simple I can submit and commit a patch to DIH.
> >
> > Bill
> >
> >
> > On Sat, Jun 29, 2013 at 12:13 PM, Bill Au  wrote:
> >
> >> Setting convertType=false does solve the datetime issue.  But there are
> >> now other columns that were working before but not working now.  Since I
> >> have already done some research into the datetime to date issue and not
> >> been able to find a solution, I think I will have to keep convertType
> set
> >> to false and deal with the other column type that are not working now.
> >>
> >> Thanks for your help.
> >>
> >> Bill
> >>
> >>
> >> On Sat, Jun 29, 2013 at 10:24 AM, Bill Au  wrote:
> >>
> >>> I just double check my config.  We are using convertType=true.  Someone
> >>> else came up with the config so I am not sure why we are using it.  I
> will
> >>> try with it set to false to see if something else will break.  Thanks
> for
> >>> pointing that out.
> >>>
> >>> This is my first time using DIH.  I really like what I have seen so
> far.
> >>>
> >>> Bill
> >>>
> >>>
> >>> On Sat, Jun 29, 2013 at 1:45 AM, Shalin Shekhar Mangar <
> >>> shalinman...@gmail.com> wrote:
> >>>
> >>>> The default in JdbcDataSource is to use ResultSet.getObject which
> >>>> returns the underlying database's type. The type specific methods in
> >>>> ResultSet are not invoked unless you are using convertType="true".
> >>>>
> >>>> Is MySQL actually returning java.sql.Timestamp objects?
> >>>>
> >>>> On Sat, Jun 29, 2013 at 5:22 AM, Bill Au  wrote:
> >>>> > I am running Solr 4.3.0, using DIH to import data from MySQL.  I am
> >>>> running
> >>>> > into a very strange problem where data from a datetime column being
> >>>> > imported with the right date but the time is 00:00:00.  I tried
> using
> >>>> SQL
> >>>> > DATE_FORMAT() and also DIH DateFormatTransformer but nothing works.
> >>>>  The
> >>>> > raw debug response of DIH, it looks like the time porting of the
> >>>> datetime
> >>>> > data is already 00:00:00 in Solr jdbc query result.
> >>>> >
> >>>> > So I looked at the source code of DIH JdbcDataSource class.  It is
> >>>> using
> >>>> > java.sql.ResultSet and its getDate() method to handle date column.
>  The
> >>>> > getDate() method returns java.sql.Date.  The java api doc for
> >>>> java.sql.Date
> >>>> >
> >>>> > http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
> >>>> >
> >>>> > states that:
> >>>> >
> >>>> > "To conform with the definition of SQL DATE, the millisecond values
> >>>> wrapped
> >>>> > by a java.sql.Date instance must be 'normalized' by setting the
> hours,
> >>>> > minutes, seconds, and milliseconds to zero in the particular time
> zone
> >>>> with
> >>>> > which the instance is associated."
> >>>> >
> >>>> > This seems to be describing exactly my problem.  Has anyone else
> notice
> >>>> > this problem?  Has anyone use DIH to index SQL datetime
> successfully?
> >>>>  If
> >>>> > so can you send me the relevant portion of the DIH config?
> >>>> >
> >>>> > Bill
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Regards,
> >>>> Shalin Shekhar Mangar.
> >>>>
> >>>
> >>>
> >>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-29 Thread Bill Au
So disabling convertType does provide a workaround for my problem with
datetime column.  But the problem still exists when convertType is enabled
because DIH is not doing the conversion correctly for a solr date field.
 Solr date field does have a time portion but java.sql.Date does not.  So
DIH should not be calling ResultSet.getDate() for a solr date field.  It
should really be calling ResultSet.getTimestamp() instead.  Is the fix this
simple?  Am I missing anything?

If the fix is this simple I can submit and commit a patch to DIH.

Bill


On Sat, Jun 29, 2013 at 12:13 PM, Bill Au  wrote:

> Setting convertType=false does solve the datetime issue.  But there are
> now other columns that were working before but not working now.  Since I
> have already done some research into the datetime to date issue and not
> been able to find a solution, I think I will have to keep convertType set
> to false and deal with the other column type that are not working now.
>
> Thanks for your help.
>
> Bill
>
>
> On Sat, Jun 29, 2013 at 10:24 AM, Bill Au  wrote:
>
>> I just double check my config.  We are using convertType=true.  Someone
>> else came up with the config so I am not sure why we are using it.  I will
>> try with it set to false to see if something else will break.  Thanks for
>> pointing that out.
>>
>> This is my first time using DIH.  I really like what I have seen so far.
>>
>> Bill
>>
>>
>> On Sat, Jun 29, 2013 at 1:45 AM, Shalin Shekhar Mangar <
>> shalinman...@gmail.com> wrote:
>>
>>> The default in JdbcDataSource is to use ResultSet.getObject which
>>> returns the underlying database's type. The type specific methods in
>>> ResultSet are not invoked unless you are using convertType="true".
>>>
>>> Is MySQL actually returning java.sql.Timestamp objects?
>>>
>>> On Sat, Jun 29, 2013 at 5:22 AM, Bill Au  wrote:
>>> > I am running Solr 4.3.0, using DIH to import data from MySQL.  I am
>>> running
>>> > into a very strange problem where data from a datetime column being
>>> > imported with the right date but the time is 00:00:00.  I tried using
>>> SQL
>>> > DATE_FORMAT() and also DIH DateFormatTransformer but nothing works.
>>>  The
>>> > raw debug response of DIH, it looks like the time porting of the
>>> datetime
>>> > data is already 00:00:00 in Solr jdbc query result.
>>> >
>>> > So I looked at the source code of DIH JdbcDataSource class.  It is
>>> using
>>> > java.sql.ResultSet and its getDate() method to handle date column.  The
>>> > getDate() method returns java.sql.Date.  The java api doc for
>>> java.sql.Date
>>> >
>>> > http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
>>> >
>>> > states that:
>>> >
>>> > "To conform with the definition of SQL DATE, the millisecond values
>>> wrapped
>>> > by a java.sql.Date instance must be 'normalized' by setting the hours,
>>> > minutes, seconds, and milliseconds to zero in the particular time zone
>>> with
>>> > which the instance is associated."
>>> >
>>> > This seems to be describing exactly my problem.  Has anyone else notice
>>> > this problem?  Has anyone use DIH to index SQL datetime successfully?
>>>  If
>>> > so can you send me the relevant portion of the DIH config?
>>> >
>>> > Bill
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>>
>>
>>
>


Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-29 Thread Bill Au
Setting convertType=false does solve the datetime issue.  But there are now
other columns that were working before but not working now.  Since I have
already done some research into the datetime to date issue and not been
able to find a solution, I think I will have to keep convertType set to
false and deal with the other column type that are not working now.

Thanks for your help.

Bill


On Sat, Jun 29, 2013 at 10:24 AM, Bill Au  wrote:

> I just double check my config.  We are using convertType=true.  Someone
> else came up with the config so I am not sure why we are using it.  I will
> try with it set to false to see if something else will break.  Thanks for
> pointing that out.
>
> This is my first time using DIH.  I really like what I have seen so far.
>
> Bill
>
>
> On Sat, Jun 29, 2013 at 1:45 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> The default in JdbcDataSource is to use ResultSet.getObject which
>> returns the underlying database's type. The type specific methods in
>> ResultSet are not invoked unless you are using convertType="true".
>>
>> Is MySQL actually returning java.sql.Timestamp objects?
>>
>> On Sat, Jun 29, 2013 at 5:22 AM, Bill Au  wrote:
>> > I am running Solr 4.3.0, using DIH to import data from MySQL.  I am
>> running
>> > into a very strange problem where data from a datetime column being
>> > imported with the right date but the time is 00:00:00.  I tried using
>> SQL
>> > DATE_FORMAT() and also DIH DateFormatTransformer but nothing works.  The
>> > raw debug response of DIH, it looks like the time porting of the
>> datetime
>> > data is already 00:00:00 in Solr jdbc query result.
>> >
>> > So I looked at the source code of DIH JdbcDataSource class.  It is using
>> > java.sql.ResultSet and its getDate() method to handle date column.  The
>> > getDate() method returns java.sql.Date.  The java api doc for
>> java.sql.Date
>> >
>> > http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
>> >
>> > states that:
>> >
>> > "To conform with the definition of SQL DATE, the millisecond values
>> wrapped
>> > by a java.sql.Date instance must be 'normalized' by setting the hours,
>> > minutes, seconds, and milliseconds to zero in the particular time zone
>> with
>> > which the instance is associated."
>> >
>> > This seems to be describing exactly my problem.  Has anyone else notice
>> > this problem?  Has anyone use DIH to index SQL datetime successfully?
>>  If
>> > so can you send me the relevant portion of the DIH config?
>> >
>> > Bill
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>


Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-29 Thread Bill Au
I just double check my config.  We are using convertType=true.  Someone
else came up with the config so I am not sure why we are using it.  I will
try with it set to false to see if something else will break.  Thanks for
pointing that out.

This is my first time using DIH.  I really like what I have seen so far.

Bill


On Sat, Jun 29, 2013 at 1:45 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> The default in JdbcDataSource is to use ResultSet.getObject which
> returns the underlying database's type. The type specific methods in
> ResultSet are not invoked unless you are using convertType="true".
>
> Is MySQL actually returning java.sql.Timestamp objects?
>
> On Sat, Jun 29, 2013 at 5:22 AM, Bill Au  wrote:
> > I am running Solr 4.3.0, using DIH to import data from MySQL.  I am
> running
> > into a very strange problem where data from a datetime column being
> > imported with the right date but the time is 00:00:00.  I tried using SQL
> > DATE_FORMAT() and also DIH DateFormatTransformer but nothing works.  The
> > raw debug response of DIH, it looks like the time porting of the datetime
> > data is already 00:00:00 in Solr jdbc query result.
> >
> > So I looked at the source code of DIH JdbcDataSource class.  It is using
> > java.sql.ResultSet and its getDate() method to handle date column.  The
> > getDate() method returns java.sql.Date.  The java api doc for
> java.sql.Date
> >
> > http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
> >
> > states that:
> >
> > "To conform with the definition of SQL DATE, the millisecond values
> wrapped
> > by a java.sql.Date instance must be 'normalized' by setting the hours,
> > minutes, seconds, and milliseconds to zero in the particular time zone
> with
> > which the instance is associated."
> >
> > This seems to be describing exactly my problem.  Has anyone else notice
> > this problem?  Has anyone use DIH to index SQL datetime successfully?  If
> > so can you send me the relevant portion of the DIH config?
> >
> > Bill
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-28 Thread Bill Au
I am running Solr 4.3.0, using DIH to import data from MySQL.  I am running
into a very strange problem where data from a datetime column is being
imported with the right date but with the time as 00:00:00.  I tried using SQL
DATE_FORMAT() and also the DIH DateFormatTransformer but nothing works.  In the
raw debug response of DIH, it looks like the time portion of the datetime
data is already 00:00:00 in the Solr jdbc query result.

So I looked at the source code of DIH JdbcDataSource class.  It is using
java.sql.ResultSet and its getDate() method to handle date column.  The
getDate() method returns java.sql.Date.  The java api doc for java.sql.Date

http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html

states that:

"To conform with the definition of SQL DATE, the millisecond values wrapped
by a java.sql.Date instance must be 'normalized' by setting the hours,
minutes, seconds, and milliseconds to zero in the particular time zone with
which the instance is associated."

This seems to be describing exactly my problem.  Has anyone else notice
this problem?  Has anyone use DIH to index SQL datetime successfully?  If
so can you send me the relevant portion of the DIH config?

Bill
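
A small standalone JDBC sketch of the difference described above (connection details and column names are made up; this is not DIH code, just a plain ResultSet showing why getDate() drops the time of day while getTimestamp() keeps it):

import java.sql.*;

public class DateVsTimestamp {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost/dbname1", "db_username1", "db_password1");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select last_modified from item limit 1")) {
            if (rs.next()) {
                // java.sql.Date is defined to normalize the time to 00:00:00
                java.sql.Date d = rs.getDate("last_modified");
                // java.sql.Timestamp preserves the full date and time
                Timestamp ts = rs.getTimestamp("last_modified");
                System.out.println("getDate():      " + d);
                System.out.println("getTimestamp(): " + ts);
            }
        }
    }
}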


SolrCloud excluding certain files in conf from zookeeper

2013-06-14 Thread Bill Au
When using SolrCloud, is it possible to exclude certain files in the conf
directory from being loaded into Zookeeper?

We are keeping our own solr related config files in the conf directory that
is actually different for each node.  Right now the copy in Zookeeper is
overriding the local copy.

Bill


Re: question about the file data/index.properties

2013-05-15 Thread Bill Au
Thanks for that info.  So besides the two that I have already seen, are
there any more ways that the index directory can be named?  I am working on
some home-grown administration scripts which need to know the name of the
index directory.

Bill


On Wed, May 15, 2013 at 7:13 PM, Mark Miller  wrote:

> It's fairly meaningless from a user perspective, but it happens when an
> index is replicated that cannot be simply merged with the existing index
> files and needs a new directory.
>
> - Mark
>
> On May 15, 2013, at 5:38 PM, Bill Au  wrote:
>
> > I am running 2 separate 4.3 SolrCloud clusters.  On one of them I noticed
> > the file data/index.properties on the replica nodes where the index
> > directory is named "index.".
> > On the other cluster, the index directory is just named "index".
> >
> > Under what condition is index.properties created?  I am trying to
> > understand why there is a difference between my 2 SolrCloud clusters.
> >
> > Bill
>
>


question about the file data/index.properties

2013-05-15 Thread Bill Au
I am running 2 separate 4.3 SolrCloud clusters.  On one of them I noticed
the file data/index.properties on the replica nodes where the index
directory is named "index.".
 On the other cluster, the index directory is just named "index".

Under what condition is index.properties created?  I am trying to
understand why there is a difference between my 2 SolrCloud clusters.

Bill


Best practice for rebuild index in SolrCloud

2013-04-08 Thread Bill Au
We are using SolrCloud for replication and dynamic scaling but not
distribution so we are only using a single shard.  From time to time we
make changes to the index schema that requires rebuilding of the index.

Should I treat the rebuilding as just any other index operation?  It seems
to me it would be better if I can somehow take a node "offline" and rebuild
the index there, then put it back online and let the new index be
replicated from there.  But I am not sure how to do the latter.

Bill


Re: multiple SolrCloud clusters with one ZooKeeper ensemble?

2013-03-28 Thread Bill Au
Thanks.

Now I have to go back and re-read the entire SolrCloud Wiki to see what
other info I missed and/or forgot.

Bill


On Thu, Mar 28, 2013 at 12:48 PM, Chris Hostetter
wrote:

>
> : Can I use a single ZooKeeper ensemble for multiple SolrCloud clusters or
> : would each SolrCloud cluster requires its own ZooKeeper ensemble?
>
> https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot
>
> (I'm going to FAQ this)
>
>
> -Hoss
>
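
With the chroot Hoss points to, every cluster shares the ensemble but lives under its own ZooKeeper path: each cluster's nodes are started with a zkHost string that ends in the chroot (for example -DzkHost=zk1:2181,zk2:2181,zk3:2181/cluster1), and clients use the same string. A SolrJ sketch with made-up hosts and chroot names:

import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class ChrootClients {
    public static void main(String[] args) throws Exception {
        // Same ensemble, different chroots: each client only ever sees the
        // collections and cluster state of its own cluster.
        CloudSolrServer cluster1 = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/cluster1");
        CloudSolrServer cluster2 = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/cluster2");
        cluster1.setDefaultCollection("collection1");
        cluster2.setDefaultCollection("collection1");
        cluster1.shutdown();
        cluster2.shutdown();
    }
}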


multiple SolrCloud clusters with one ZooKeeper ensemble?

2013-03-28 Thread Bill Au
Can I use a single ZooKeeper ensemble for multiple SolrCloud clusters or
would each SolrCloud cluster requires its own ZooKeeper ensemble?

Bill


Re: Solr 4.1 SolrCloud with 1 shard and 3 replicas

2013-03-27 Thread Bill Au
Thanks for the info, Erik.

I had gone through the tutorial in the SolrCloud Wiki and verified that
queries are load balanced in the two shard cluster with shard replicas
setup.  I was wondering if I need to explicitly specify distrib=false in my
single shard setup.  Glad to see that Solr is doing the right thing by
default in my case.

Bill

ps thanks for a very informative webinar.  I am going to recommend it to my
co-workers once the recording is available


On Wed, Mar 27, 2013 at 3:26 PM, Erik Hatcher wrote:

> Requests to a node in your example would be answered by that node (no need
> to distribute; it's a single shard system) and it would not internally be
> routed otherwise either.  Ultimately it is up to the client to load-balance
> the initial requests into a "SolrCloud" cluster, but internally in a
> multi-shard distributed search request it will be load balanced beyond that
> initial node.
>
> CloudSolrServer does load balance, so if you're using that client it'll
> "randomly" pick a shard to send to from the client-side.  If you're using
> some other mechanism, it'll request directly to whatever node that you've
> specified directly for that initial request.
>
> Erik
>
> p.s. Thanks for attending the webinar, Bill!   I saw your name as one of
> the question askers.  Hopefully all that stuff I made up is close to the
> truth :)
>
>
>
> On Mar 27, 2013, at 14:51 , Bill Au wrote:
>
> > I am running Solr 4.1.  I have set up SolrCloud with 1 leader and 3
> > replicas, 4 nodes total.  Do query requests send to a node only query the
> > replica on that node, or are they load-balanced to the entire cluster?
> >
> > Bill
>
>
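
On the distrib=false question Bill raises above, the parameter can still be set explicitly when a request must be answered only from the local index of the node that received it. A SolrJ sketch (URL and query are made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class LocalOnlyQuery {
    public static void main(String[] args) throws Exception {
        HttpSolrServer node = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        // Answer from this node's own index only; no fan-out to other replicas.
        q.set("distrib", false);
        System.out.println(node.query(q).getResults().getNumFound());
        node.shutdown();
    }
}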


Solr 4.1 SolrCloud with 1 shard and 3 replicas

2013-03-27 Thread Bill Au
I am running Solr 4.1.  I have set up SolrCloud with 1 leader and 3
replicas, 4 nodes total.  Do query requests sent to a node only query the
replica on that node, or are they load-balanced to the entire cluster?

Bill


Re: [ANNOUNCE] Apache Solr 4.2 released

2013-03-17 Thread Bill Au
The "Upgrading from Solr 4.1.0" section of the 4.2.0 CHANGES.txt says:

"(No upgrade instructions yet)"

To me that's not the same as no need to do anything.  I think the doc
should be updated with either specific instructions or a statement that 4.2.0
is backward compatible with 4.1.0, so there is no need to do anything.

Bill


On Sun, Mar 17, 2013 at 6:12 AM, sandeep a  wrote:

> Hi , please let me know how to upgrade solr from 4.1.0 to 4.2.0.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/ANNOUNCE-Apache-Solr-4-2-released-tp4046510p4048201.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: multiple facet.prefix for the same facet.field VS multiple facet.query

2013-02-21 Thread Bill Au
Never mind.  I just realized the difference between the two.  Sorry for the
noise.

Bill


On Thu, Feb 21, 2013 at 8:42 AM, Bill Au  wrote:

> There have been requests for supporting multiple facet.prefix for the same
> facet.field.  There is an open JIRA with a patch:
>
> https://issues.apache.org/jira/browse/SOLR-1351
>
> Wouldn't using multiple facet.query achieve the same result?  I mean
> something like:
>
> facet.query=lastName:A*&facet.query=lastName:B*&facet.query=lastName:C*
>
>
> Bill
>
>
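
For anyone else reading along, the difference is that facet.prefix keeps a count per matching term, while each facet.query collapses to a single count. A sketch with made-up data:

facet.field=lastName&facet.prefix=A  ->  Adams (12), Allen (7), Anderson (3)
facet.query=lastName:A*              ->  lastName:A* (22)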


multiple facet.prefix for the same facet.field VS multiple facet.query

2013-02-21 Thread Bill Au
There have been requests for supporting multiple facet.prefix for the same
facet.field.  There is an open JIRA with a patch:

https://issues.apache.org/jira/browse/SOLR-1351

Wouldn't using multiple facet.query achieve the same result?  I mean
something like:

facet.query=lastName:A*&facet.query=lastName:B*&facet.query=lastName:C*


Bill


Re: Solr 4.0 SolrCloud with AWS Auto Scaling

2013-01-04 Thread Bill Au
thanks for pointing me to Solr's Zookeeper servlet.  I will look at the
source to see how I can use it to fulfill my needs.

Bill


On Thu, Jan 3, 2013 at 6:43 PM, Mark Miller  wrote:

> Technically, you want to make sure zookeeper reports the node as live and
> active.
>
> You could use the same api that the UI uses for that - the
> localhost:port/solr/zookeeper (I think?) servlet.
>
> If you can't reach it for a node, it's obviously down - if you can reach
> it, parse the json and see if it notes the node as active?
>
> Not quite as clean as you'd like prob. Might be worth a JIRA issue to look
> at further options.
>
> - Mark
>
> On Jan 3, 2013, at 5:54 PM, Bill Au  wrote:
>
> > Thanks, Mark.
> >
> > That does remove the node.  And it seems to do so permanently.  Even
> when I
> > restart Solr after unloading, it does not join the SolrCloud cluster.
>  And
> > I can get it to re-join the cluster by creating the core.
> >
> > Anyone know if there is an API to determine the state of a node.  When
> AWS
> > auto scaling add a new node, I need to make sure it has before active
> > before I enable it in the load balancer.
> >
> > Bill
> >
> >
> >
> >
> > On Thu, Jan 3, 2013 at 9:10 AM, Mark Miller 
> wrote:
> >
> >>
> >> http://wiki.apache.org/solr/CoreAdmin#UNLOAD
> >>
> >> - Mark
> >>
> >> On Jan 3, 2013, at 9:06 AM, Bill Au  wrote:
> >>
> >>> Mark,
> >>>What do you mean by "unload them"?
> >>>
> >>> I am using an AWS load balancer with my auto scaling group in stead of
> >>> using Solr's built-in load balancer.  I am no sharding my index.  I am
> >>> using SolrCloud for replication only.  I am doing local search on each
> >>> instance and sending all updates to the shard leader directly because I
> >>> want to minimize traffic between nodes during search and update
> >>>
> >>> Bill
> >>>
> >>>
> >>> On Wed, Jan 2, 2013 at 6:47 PM, Mark Miller 
> >> wrote:
> >>>
> >>>>
> >>>> On Jan 2, 2013, at 5:51 PM, Bill Au  wrote:
> >>>>
> >>>>> Is anyone running Solr 4.0 SolrCloud with AWS auto scaling?
> >>>>>
> >>>>> My concern is that as AWS auto scaling add and remove instances to
> >>>>> SolrCloud, the number of nodes in SolrCloud Zookeeper config will
> grow
> >>>>> indefinitely as removed instances will never be used again.  AWS auto
> >>>>> scaling will keep on adding new instances, and there is no way to
> >> remove
> >>>>> them from Zookeeper, right?
> >>>>
> >>>> You can unload them and that removes them.
> >>>>
> >>>>> What's the effect of have all these phantom
> >>>>> nodes?
> >>>>
> >>>> Unless they are only replicas, they would need to be removed.
> >>>>
> >>>> Also, unless you are using elastic ips,
> >>>> https://issues.apache.org/jira/browse/SOLR-4078 may be of interest.
> >>>>
> >>>> - Mark
> >>
> >>
>
>
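
A SolrJ sketch of the unload call Mark refers to (URL and core name are made up); unloading the core also removes it from the cluster state in ZooKeeper, which is why the node stays out of the cluster until the core is created again:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class UnloadThisNode {
    public static void main(String[] args) throws Exception {
        // Point at the node's CoreAdmin endpoint, not at a specific core.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        CoreAdminRequest.unloadCore("collection1", server);
        server.shutdown();
    }
}

For the "is the new node active yet" check, the JSON that the admin UI reads can be fetched from any node's /solr/zookeeper servlet, as Mark suggests, and parsed for the state of the new core before enabling it in the load balancer.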


Re: Solr 4.0 SolrCloud with AWS Auto Scaling

2013-01-03 Thread Bill Au
Thanks, Mark.

That does remove the node.  And it seems to do so permanently.  Even when I
restart Solr after unloading, it does not join the SolrCloud cluster.  And
I can get it to re-join the cluster by creating the core.

Anyone know if there is an API to determine the state of a node?  When AWS
auto scaling adds a new node, I need to make sure it is active before I
enable it in the load balancer.

Bill




On Thu, Jan 3, 2013 at 9:10 AM, Mark Miller  wrote:

>
> http://wiki.apache.org/solr/CoreAdmin#UNLOAD
>
> - Mark
>
> On Jan 3, 2013, at 9:06 AM, Bill Au  wrote:
>
> > Mark,
> > What do you mean by "unload them"?
> >
> > I am using an AWS load balancer with my auto scaling group in stead of
> > using Solr's built-in load balancer.  I am not sharding my index.  I am
> > using SolrCloud for replication only.  I am doing local search on each
> > instance and sending all updates to the shard leader directly because I
> > want to minimize traffic between nodes during search and update
> >
> > Bill
> >
> >
> > On Wed, Jan 2, 2013 at 6:47 PM, Mark Miller 
> wrote:
> >
> >>
> >> On Jan 2, 2013, at 5:51 PM, Bill Au  wrote:
> >>
> >>> Is anyone running Solr 4.0 SolrCloud with AWS auto scaling?
> >>>
> >>> My concern is that as AWS auto scaling add and remove instances to
> >>> SolrCloud, the number of nodes in SolrCloud Zookeeper config will grow
> >>> indefinitely as removed instances will never be used again.  AWS auto
> >>> scaling will keep on adding new instances, and there is no way to
> remove
> >>> them from Zookeeper, right?
> >>
> >> You can unload them and that removes them.
> >>
> >>> What's the effect of have all these phantom
> >>> nodes?
> >>
> >> Unless they are only replicas, they would need to be removed.
> >>
> >> Also, unless you are using elastic ips,
> >> https://issues.apache.org/jira/browse/SOLR-4078 may be of interest.
> >>
> >> - Mark
>
>

