Re: problems with indexing documents

2019-04-02 Thread Bill Tantzen
Right, as Mark said, this is how the dates were indexed previously.
However, instead of passing in the actual String, we passed a
java.util.Date object which was automagically converted to the correct
string.

Now (the code on our end has not changed), solr throws an exception
because the string it sees is of the form 'Sun Jul 31 19:00:00 CDT
2016' -- (which I believe is the Date.toString() result) instead of
the DatePointField or TrieDateField format.
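
For anyone else hitting this, one workaround sketch is to format the value
yourself as the UTC ISO-8601 string that Solr date fields expect, rather than
relying on Date.toString(). Assuming "doc" is the SolrInputDocument from the
original code, roughly:

import java.time.format.DateTimeFormatter;
import java.util.Date;

// Format the Date as a UTC instant string instead of passing the Date object.
Date d = new Date();
String solrDate = DateTimeFormatter.ISO_INSTANT.format(d.toInstant());
doc.addField("date", solrDate);   // e.g. "2016-08-01T00:00:00Z"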

~~ Bill

On Mon, Apr 1, 2019 at 8:44 PM Zheng Lin Edwin Yeo  wrote:
>
> Hi Bill,
>
> Previously, did you index the date in the same format as you are using now,
> or in the Solr format of "YYYY-MM-DDTHH:MM:SSZ"?
>
> Regards,
> Edwin
>
>
> On Tue, 2 Apr 2019 at 00:32, Bill Tantzen  wrote:
>
> > In a legacy application using Solr 4.1 and solrj, I have always been
> > able to add documents with TrieDateField types using java.util.Date
> > objects, for instance,
> >
> > doc.addField ( "date", new java.util.Date() );
> >
> > Having recently upgraded to Solr 7.7 and updated my schema to
> > leverage DatePointField as my type, that code no longer works; it
> > throws an exception with an error like:
> >
> > Invalid Date String: 'Sun Jul 31 19:00:00 CDT 2016'
> >
> > I understand that this String is not what Solr expects, but short of
> > formatting the correct String myself, is there no longer a way to pass in a
> > simple Date object?  Was there some kind of implicit conversion taking
> > place earlier that is no longer happening?
> >
> > In fact, in some of the example code that comes with the Solr
> > distribution (SolrExampleTests.java), document timestamp fields are
> > added using the same addField call I am attempting to use, so I am
> > very confused.
> >
> > Thanks for any advice!
> >
> > Regards,
> > Bill
> >



-- 
Human wheels spin round and round
While the clock keeps the pace... -- John Mellencamp

Bill Tantzen, University of Minnesota Libraries
612-626-9949 (U of M), 612-325-1777 (cell)


problems with indexing documents

2019-04-01 Thread Bill Tantzen
In a legacy application using Solr 4.1 and solrj, I have always been
able to add documents with TrieDateField types using java.util.Date
objects, for instance,

doc.addField ( "date", new java.util.Date() );

Having recently upgraded to Solr 7.7 and updated my schema to
leverage DatePointField as my type, that code no longer works; it
throws an exception with an error like:

Invalid Date String: 'Sun Jul 31 19:00:00 CDT 2016'

I understand that this String is not what Solr expects, but short of
formatting the correct String myself, is there no longer a way to pass in a
simple Date object?  Was there some kind of implicit conversion taking
place earlier that is no longer happening?

In fact, in some of the example code that comes with the Solr
distribution (SolrExampleTests.java), document timestamp fields are
added using the same addField call I am attempting to use, so I am
very confused.

Thanks for any advice!

Regards,
Bill


Solr 6.6 using swap space causes recovery?

2017-12-15 Thread Bill Oconnor
Hello,


We recently upgraded to SolrCloud 6.6. We are running on Ubuntu LTS 14.x servers 
- VMware on Nutanix boxes. We have 4 nodes with 32GB each and 16GB for the 
JVM with a 12GB minimum. Usually it is only using 4-7GB.


We do nightly indexing of partial fields for all our docs (~200K). This usually 
takes 3 hours using 10 threads. About every other week we have a server go into 
recovery mode during the update. The recovering server has a much larger swap 
usage than the other servers in the cluster. We think this is related to the 
mmap files used for indexes. The server eventually recovers, but it triggers 
alerts for devops which are annoying.


I have found a previous mailing list question from 2014 (which Shawn responded 
to) with an almost identical problem, but there is no suggested remedy. ( 
http://lucene.472066.n3.nabble.com/Solr-4-3-1-memory-swapping-td4126641.html)


Questions:


Has there been any progress regarding this?


Is there some kind of configuration that can mitigate this?


Or maybe this is a Lucene issue?


Thanks,

Bill OConnor (www.plos.org)


Re: Replicates not recovering after rolling restart

2017-09-22 Thread Bill Oconnor

Thanks everyone for the responses.


I believe I have found the problem.


The type of _version_ is incorrect in our schema. This is a required field 
that is primarily used by Solr.


Our schema has typed it as type="int" instead of type="long".


I believe that this number is used by the replication process to figure out 
what needs to be sync'd on an individual replicate. In our case Solr puts the 
value in during indexing. It appears that Solr has chosen a number that cannot 
be represented by "int". As the replicates query the leader to determine if a 
sync is necessary, the leader throws an error as it tries to format the 
response with the large _version_. This process continues until the replicates 
give up.


I finally verified this by doing a simple query, _version_:*, which throws the 
same error but gives more helpful info: "re-index your documents".
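
For reference, a rough SolrJ sketch of that verification query -- the base URL
and collection name are just illustrative values from this thread, so adjust
them for your own cluster:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class VersionCheck {
    public static void main(String[] args) throws Exception {
        // Querying _version_:* surfaces the same "For input string" error when
        // the schema types _version_ as int instead of long.
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://solr-220:8983/solr/journals_stage").build();
        QueryResponse rsp = client.query(new SolrQuery("_version_:*"));
        System.out.println(rsp.getResults().getNumFound());
        client.close();
    }
}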


Thanks.






From: Rick Leir <rl...@leirtech.com>
Sent: Friday, September 22, 2017 12:34:57 AM
To: solr-user@lucene.apache.org
Subject: Re: Replicates not recovering after rolling restart

Wunder, Erick

$ dc
16o
1578578283947098112p
15E83C95E8D0

That is an interesting number. Is it, as a guess, machine instructions
or an address pointer? It does not look like UTF-8 or ASCII. Machine
code looks promising:


Disassembly:

0:  15 e8 3c 95 e8          adc    eax,0xe8953ce8
5:  d0 00                   rol    BYTE PTR [rax],1


ADC dest,src -- Modifies flags: AF CF OF SF PF ZF. Sums two binary operands,
placing the result in the destination.
*ROL - Rotate Left*

Registers: the 64-bit extension of eax is called rax.

Is that code possibly in the JVM executable? Or a random memory page.

cheers -- Rick

On 2017-09-20 07:21 PM, Walter Underwood wrote:
> 1578578283947098112 needs 61 bits. Is it being parsed into a 32 bit target?
>
> That doesn’t explain where it came from, of course.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Sep 20, 2017, at 3:35 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>> The NumberFormatException is...odd. Clearly that's too big a number
>> for an integer, did anything in the underlying schema change?
>>
>> Best,
>> Erick
>>
>> On Wed, Sep 20, 2017 at 3:00 PM, Walter Underwood <wun...@wunderwood.org> 
>> wrote:
>>> Rolling restarts work fine for us. I often include installing new configs 
>>> with that. Here is our script. Pass it any hostname in the cluster. I use 
>>> the load balancer name. You’ll need to change the domain and the install 
>>> directory of course.
>>>
>>> #!/bin/bash
>>>
>>> cluster=$1
>>>
>>> hosts=`curl -s "http://${cluster}:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json" | jq -r '.cluster.live_nodes[]' | sort`
>>>
>>> for host in $hosts
>>> do
>>> host="${host}.cloud.cheggnet.com"
>>> echo restarting Solr on $host
>>> ssh $host 'cd /apps/solr6 ; sudo -u bin bin/solr stop; sudo -u bin 
>>> bin/solr start -cloud -h `hostname`'
>>> done
>>>
>>>
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
>>>> On Sep 20, 2017, at 1:42 PM, Bill Oconnor <bocon...@plos.org> wrote:
>>>>
>>>> Hello,
>>>>
>>>>
>>>> Background:
>>>>
>>>>
>>>> We have been successfully using Solr for over 5 years and we recently made 
>>>> the decision to move into SolrCloud. For the most part that has been easy 
>>>> but we have repeated problems with our rolling restarts, where servers remain 
>>>> functional but stay in recovery until they stop trying. We restarted 
>>>> because we increased the memory from 12GB to 16GB on the JVM.
>>>>
>>>>
>>>> Does anyone have any insight as to what is going on here?
>>>>
>>>> Is there a special procedure I should use for starting and stopping a host?
>>>>
>>>> Is it ok to do a rolling restart on all the nodes in a shard?
>>>>
>>>>
>>>> Any insight would be appreciated.
>>>>
>>>>
>>>> Configuration:
>>>>
>>>>
>>>> We have a group of servers with multiple collections. Each collection 
>>>> consists of one shard and multiple replicates. We are running the latest 
>>>> stable version of SolrCloud 6.6 on Ubuntu LTS and Oracle Corporation Java 
>>>> HotSpot(TM

Re: Replicates not recovering after rolling restart

2017-09-21 Thread Bill Oconnor

  1.  We are moving from 4.x to 6.6.
  2.  Changed the schema - adding the _version_ field etc.; nothing major.
  3.  Full re-index of documents into the cluster - so this is not a migration.
  4.  Changed the JVM parameter from 12GB to 16GB and did a restart.
  5.  Replicates go into recovery which fails to complete after many hours. 
They still respond to queries but the /update POST from the replicates fails 
with the 500 server error and a stack trace because of the number format 
failure.


My other cluster  does not reuse any nodes. The restart went as expected with 
the JVM change. Al


From: Erick Erickson <erickerick...@gmail.com>
Sent: Thursday, September 21, 2017 8:25:32 AM
To: solr-user
Subject: Re: Replicates not recovering after rolling restart

Hmmm, I didn't ask what version you're upgrading _from_. 5 years ago
would be Solr 4. Are you replacing Solr 5 or 4? I'm guessing 5, but
want to check unlikely possibilities.

Next question: I'm assuming all your nodes have been upgraded to Solr 6, right?

Best,
Erick

On Wed, Sep 20, 2017 at 7:18 PM, Bill Oconnor <bocon...@plos.org> wrote:
> I have no clue where that number comes from; it does not seem to be in the 
> actual post to the leader as seen in my tcpdump. It is a mystery.
>
> 
> From: Walter Underwood <wun...@wunderwood.org>
> Sent: Wednesday, September 20, 2017 7:00:53 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replicates not recovering after rolling restart
>
>
>> On Sep 20, 2017, at 6:15 PM, Bill Oconnor <bocon...@plos.org> wrote:
>>
>> I restart using the standard "sudo service solr start/stop"
>
> You might look into what that actually does.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)



>


Re: Replicates not recovering after rolling restart

2017-09-20 Thread Bill Oconnor
I have no clue where that number comes from; it does not seem to be in the 
actual post to the leader as seen in my tcpdump. It is a mystery.


From: Walter Underwood <wun...@wunderwood.org>
Sent: Wednesday, September 20, 2017 7:00:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Replicates not recovering after rolling restart


> On Sep 20, 2017, at 6:15 PM, Bill Oconnor <bocon...@plos.org> wrote:
>
> I restart using the standard "sudo service solr start/stop"

You might look into what that actually does.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Replicates not recovering after rolling restart

2017-09-20 Thread Bill Oconnor
Thanks everyone for the response.


I do not think we changed anything other than the JVM memory size.


I did leave out one piece of info - one of the hosts is a replicate in another 
shard.


collection1 -> shard1 -> *h1, h2, h3, h4    (where the star marks the leader)

collection2 -> shard1 -> *h5, h3


When I restart, *h1 works fine; h2, h3, h4 go into recovery but still respond 
to requests. *h1 starts getting the POST from the recovering servers and 
responds with the 500 Server Error until the servers quit.


Collection2 with h3 is active and fine even though it is recovering in 
collection1.


This happened before and I resolved it by deleting and then creating a new 
collection.


I restart using the standard "sudo service solr start/stop"


I have to say I am not comfortable with having multiple shards shared on 
the same host. The production servers will not be configured this way, but 
these servers are for development.


From: Erick Erickson <erickerick...@gmail.com>
Sent: Wednesday, September 20, 2017 3:35:16 PM
To: solr-user
Subject: Re: Replicates not recovering after rolling restart

The NumberFormatException is...odd. Clearly that's too big a number
for an integer, did anything in the underlying schema change?

Best,
Erick

On Wed, Sep 20, 2017 at 3:00 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> Rolling restarts work fine for us. I often include installing new configs 
> with that. Here is our script. Pass it any hostname in the cluster. I use the 
> load balancer name. You’ll need to change the domain and the install 
> directory of course.
>
> #!/bin/bash
>
> cluster=$1
>
> hosts=`curl -s "http://${cluster}:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json" | jq -r '.cluster.live_nodes[]' | sort`
>
> for host in $hosts
> do
> host="${host}.cloud.cheggnet.com"
> echo restarting Solr on $host
> ssh $host 'cd /apps/solr6 ; sudo -u bin bin/solr stop; sudo -u bin 
> bin/solr start -cloud -h `hostname`'
> done
>
>
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Sep 20, 2017, at 1:42 PM, Bill Oconnor <bocon...@plos.org> wrote:
>>
>> Hello,
>>
>>
>> Background:
>>
>>
>> We have been successfully using Solr for over 5 years and we recently made 
>> the decision to move into SolrCloud. For the most part that has been easy 
>> but we have repeated problems with our rolling restarts, where servers remain 
>> functional but stay in recovery until they stop trying. We restarted because 
>> we increased the memory from 12GB to 16GB on the JVM.
>>
>>
>> Does anyone have any insight as to what is going on here?
>>
>> Is there a special procedure I should use for starting and stopping a host?
>>
>> Is it ok to do a rolling restart on all the nodes in a shard?
>>
>>
>> Any insight would be appreciated.
>>
>>
>> Configuration:
>>
>>
>> We have a group of servers with multiple collections. Each collection 
>> consists of one shard and multiple replicates. We are running the latest 
>> stable version of SolrCloud 6.6 on Ubuntu LTS and Oracle Corporation Java 
>> HotSpot(TM) 64-Bit Server VM 1.8.0_66 25.66-b17
>>
>>
>> (collection)  (shard)  (replicates)
>>
>> journals_stage   ->  shard1  ->  solr-220 (leader) , solr-223, solr-221, 
>> solr-222 (replicates)
>>
>>
>> Problem:
>>
>>
>> Restarting the system puts the replicates in a recovery state they never 
>> exit from. They eventually give up after 500 tries.  If I go to the 
>> individual replicates and execute a query the data is still available.
>>
>>
>> Using tcpdump I find the replicates sending this request to the leader (the 
>> leader appears to be active).
>>
>>
>> The exchange goes  like this - :
>>
>>
>> solr-220 is the leader.
>>
>> Solr-221 to Solr-220
>>
>>
>> 10:18:42.426823 IP solr-221:54341 > solr-220:8983:
>>
>>
>> POST /solr/journals_stage_shard1_replica1/update HTTP/1.1
>> Content-Type: application/x-www-form-urlencoded; charset=UTF-8
>> User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0
>> Content-Length: 108
>> Host: solr-220:8983
>> Connection: Keep-Alive
>>
>>
>> commit_end_point=true=false=true=false=true=javabin=2
>>
>>
>> Solr-220 back to Solr-221
>>
>>
>> IP solr-220:8983 > solr-221:54341: Flags [

Replicates not recovering after rolling restart

2017-09-20 Thread Bill Oconnor
Hello,


Background:


We have been successfully using Solr for over 5 years and we recently made the 
decision to move into SolrCloud. For the most part that has been easy but we 
have repeated problems with our rolling restarts, where servers remain functional 
but stay in recovery until they stop trying. We restarted because we increased 
the memory from 12GB to 16GB on the JVM.


Does anyone have any insight as to what is going on here?

Is there a special procedure I should use for starting and stopping a host?

Is it ok to do a rolling restart on all the nodes in a shard?


Any insight would be appreciated.


Configuration:


We have a group of servers with multiple collections. Each collection consists 
of one shard and multiple replicates. We are running the latest stable version 
of SolrCloud 6.6 on Ubuntu LTS and Oracle Corporation Java HotSpot(TM) 64-Bit 
Server VM 1.8.0_66 25.66-b17


(collection)  (shard)  (replicates)

journals_stage   ->  shard1  ->  solr-220 (leader) , solr-223, solr-221, 
solr-222 (replicates)


Problem:


Restarting the system puts the replicates in a recovery state they never exit 
from. They eventually give up after 500 tries.  If I go to the individual 
replicates and execute a query the data is still available.


Using tcpdump I find the replicates sending this request to the leader (the 
leader appears to be active).


The exchange goes  like this - :


solr-220 is the leader.

Solr-221 to Solr-220


10:18:42.426823 IP solr-221:54341 > solr-220:8983:


POST /solr/journals_stage_shard1_replica1/update HTTP/1.1
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
User-Agent: 
Solr[org.apache.solr.client.solrj.impl.HttpSolrClient]
 1.0
Content-Length: 108
Host: solr-220:8983
Connection: Keep-Alive


commit_end_point=true=false=true=false=true=javabin=2


Solr-220 back to Solr-221


IP solr-220:8983 > solr-221:54341: Flags [P.], seq 1:5152, ack 385, win 235, 
options [nop,nop,
TS val 85813 ecr 858107069], length 5151
..HTTP/1.1 500 Server Error
Content-Type: application/octet-stream
Content-Length: 5060


.responseHeader..%QTimeC.%error..#msg?.For input string: 
"1578578283947098112".%trace?.: For
input string: "1578578283947098112"
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:583)
at java.lang.Integer.parseInt(Integer.java:615)
at 
org.apache.lucene.queries.function.docvalues.IntDocValues.getRangeScorer(IntDocValues.java:89)
at 
org.apache.solr.search.function.ValueSourceRangeFilter$1.iterator(ValueSourceRangeFilter.java:83)
at 
org.apache.solr.search.SolrConstantScoreQuery$ConstantWeight.scorer(SolrConstantScoreQuery.java:100)
at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
at 
org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
at 
org.apache.solr.update.DeleteByQueryWrapper$1.scorer(DeleteByQueryWrapper.java:90)
at 
org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:709)

at 
org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:267)
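
As a minimal illustration of the parse failure in that trace: the value needs
61 bits, so Integer.parseInt throws while Long.parseLong handles it fine, which
is consistent with the _version_ int-vs-long diagnosis elsewhere in this thread.

public class VersionParseDemo {
    public static void main(String[] args) {
        String v = "1578578283947098112";
        System.out.println(Long.parseLong(v));    // fine: fits in a signed 64-bit long
        System.out.println(Integer.parseInt(v));  // throws java.lang.NumberFormatException
    }
}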




VPAT / 508 compliance information?

2016-10-11 Thread Bill Yosmanovich
Would anyone happen to know if SOLR has a VPAT or where I could obtain any 
Section 508 compliance information?

Thanks!
Bill Yosmanovich



VPAT?

2016-10-11 Thread Bill Yosmanovich
Would anyone happen to know if SOLR has a VPAT or where I could obtain any 
Section 508 compliance information?

Thanks!
Bill Yosmanovich


Re: admin-extra

2015-10-11 Thread Bill Au
admin-extra allows one to include additional links and/or information in
the Solr admin main page:

https://cwiki.apache.org/confluence/display/solr/Core-Specific+Tools

Bill

On Wed, Oct 7, 2015 at 5:40 PM, Upayavira <u...@odoko.co.uk> wrote:

> Do you use admin-extra within the admin UI?
>
> If so, please go to [1] and document your use case. The feature
> currently isn't implemented in the new admin UI, and without use-cases,
> it likely won't be - so if you want it in there, please help us
> understand how you use it!
>
> Thanks!
>
> Upayavira
>
> [1] https://issues.apache.org/jira/browse/SOLR-8140
>


Re: Best Indexing Approaches - To max the throughput

2015-10-06 Thread Bill Dueber
Just to add...my informal tests show that batching has way more effect
than solrj vs json.

I haven't look at CUSC in a while, last time I looked it was impossible to
do anything smart about error handling, so check that out before you get
too deeply into it. We use a strategy of sending a batch of json documents,
and if it returns an error sending each record one at a time until we find
the bad one and can log something useful.
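
We do this over JSON/HTTP, but the same batch-then-retry-singly idea looks
roughly like this in SolrJ -- a sketch only, where the collection name and the
"id" field are assumptions:

import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchWithFallback {
    // Send the whole batch; if it fails, fall back to one document at a time so
    // the bad record can be identified and logged.
    public static void addBatch(SolrClient client, List<SolrInputDocument> batch) {
        try {
            client.add("mycollection", batch);
        } catch (Exception batchFailure) {
            for (SolrInputDocument doc : batch) {
                try {
                    client.add("mycollection", doc);
                } catch (Exception docFailure) {
                    System.err.println("Bad doc " + doc.getFieldValue("id") + ": " + docFailure);
                }
            }
        }
    }
}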



On Mon, Oct 5, 2015 at 12:07 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Thanks Erick,
> you confirmed my impressions!
> Thank you very much for the insights, an other opinion is welcome :)
>
> Cheers
>
> 2015-10-05 14:55 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:
>
> > SolrJ tends to be faster for several reasons, not the least of which
> > is that it sends packets to Solr in a more efficient binary format.
> >
> > Batching is critical. I did some rough tests using SolrJ and sending
> > docs one at a time gave a throughput of < 400 docs/second.
> > Sending 10 gave 2,300 or so. Sending 100 at a time gave
> > over 5,300 docs/second. Curiously, 1,000 at a time gave only
> > marginal improvement over 100. This was with a single thread.
> > YMMV of course.
> >
> > CloudSolrClient is definitely the better way to go with SolrCloud,
> > it routes the docs to the correct leader instead of having the
> > node you send the docs to do the routing.
> >
> > Best,
> > Erick
> >
> > On Mon, Oct 5, 2015 at 4:57 AM, Alessandro Benedetti
> > <abenede...@apache.org> wrote:
> > > I was doing some studies and analysis, just wondering in your opinion
> > which
> > > one is the best approach to use to index in Solr to reach the best
> > > throughput possible.
> > > I know that a lot of factor are affecting Indexing time, so let's only
> > > focus in the feeding approach.
> > > Let's isolate different scenarios :
> > >
> > > *Single Solr Infrastructure*
> > >
> > > 1) Xml/Json batch request to /update IndexHandler (xml/json)
> > >
> > > 2) SolrJ ConcurrentUpdateSolrClient ( javabin)
> > > I was thinking this to be the fastest approach for a multi threaded
> > > indexing application.
> > > Posting batch of docs if possible per request.
> > >
> > > *Solr Cloud*
> > >
> > > 1) Xml/Json batch request to /update IndexHandler(xml/json)
> > >
> > > 2) SolrJ ConcurrentUpdateSolrClient ( javabin)
> > >
> > > 3) CloudSolrClient ( javabin)
> > > it seems the best approach accordingly to this improvements [1]
> > >
> > > What are your opinions ?
> > >
> > > A bonus observation should be for using some Map/Reduce big data
> indexer,
> > > but let's assume we don't have a big cluster of cpus, but the average
> > > Indexer server.
> > >
> > >
> > > [1]
> > >
> >
> https://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
> > >
> > >
> > > Cheers
> > >
> > >
> > > --
> > > --
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Way to determine (via analyzer) what fields/types will be created for a given field name?

2015-09-30 Thread Bill Dueber
Let’s say I have





[I started thinking this sort of thing through a while back:
http://robotlibrarian.billdueber.com/2014/10/schemaless-solr-with-dynamicfield-and-copyfield/]

If I index a field named lastname_st, I end up with:

   - field lastname_t of type text
   - field lastname of type string

Is there any way for me to query Solr to find out what fields and
fieldtypes it's going to produce, in the way the analysis handlers can
show me transformations and so on?

—
Bill Dueber
Library Systems Programmer
University of Michigan Library
​


solrcloud and core swapping

2015-08-28 Thread Bill Au
Is core swapping supported in SolrCloud?  If I have a 5 nodes SolrCloud
cluster and I do a core swap on the leader, will the core be swapped on the
other 4 nodes as well?  Or do I need to do a core swap on each node?

Bill


TimeAllowed bug

2015-08-24 Thread Bill Bell
Weird fq caching bug when using timeAllowed

1. Find a pwid (in this case YLGVQ)
2. Run a query w/ a FQ on the pwid and timeAllowed=1.
   http://hgsolr2devsl.healthgrades.com:8983/solr/providersearch/select/?q=*:*&wt=json&fl=pwid&fq=pwid:YLGVQ&timeAllowed=1
3. Ensure #2 returns 0 results
4. Rerun the query without the timeAllowed param.
   http://hgsolr2devsl.healthgrades.com:8983/solr/providersearch/select/?q=*:*&wt=json&fl=pwid&fq=pwid:YLGVQ
5. Note that after removing the timeAllowed parameter the query is still
   returning 0 results.

 Solr seems to be caching the FQ when the timeAllowed parameter is present.


Bill Bell
Sent from mobile



Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Bill Bell
We use 8GB to 10GB for indexes of that size all the time.


Bill Bell
Sent from mobile


 On Aug 23, 2015, at 8:52 AM, Shawn Heisey apa...@elyograg.org wrote:
 
 On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote:
 Hi Shawn,
 
 Yes, I've increased the heap size to 4GB already, and I'm using a machine
 with 32GB RAM.
 
 Is it recommended to further increase the heap size to like 8GB or 16GB?
 
 Probably not, but I know nothing about your data.  How many Solr docs
 were created by indexing 1GB of data?  How much disk space is used by
 your Solr index(es)?
 
 I know very little about clustering, but it looks like you've gotten a
 reply from Toke, who knows a lot more about that part of the code than I do.
 
 Thanks,
 Shawn
 


Re: solr multicore vs sharding vs 1 big collection

2015-08-03 Thread Bill Bell
Yeah a separate by month or year is good and can really help in this case.

Bill Bell
Sent from mobile


 On Aug 2, 2015, at 5:29 PM, Jay Potharaju jspothar...@gmail.com wrote:
 
 Shawn,
 Thanks for the feedback. I agree that increasing timeout might alleviate
 the timeout issue. The main problem with increasing timeout is the
 detrimental effect it will have on the user experience, therefore can't
 increase it.
 I have looked at the queries that threw errors, next time I try it
 everything seems to work fine. Not sure how to reproduce the error.
 My concern with increasing the memory to 32GB is what happens when the
 index size grows over the next few months.
 One of the other solutions I have been thinking about is to rebuild
 index(weekly) and create a new collection and use it. Are there any good
 references for doing that?
 Thanks
 Jay
 
 On Sun, Aug 2, 2015 at 10:19 AM, Shawn Heisey apa...@elyograg.org wrote:
 
 On 8/2/2015 8:29 AM, Jay Potharaju wrote:
 The document contains around 30 fields and have stored set to true for
 almost 15 of them. And these stored fields are queried and updated all
 the
 time. You will notice that the deleted documents is almost 30% of the
 docs.  And it has stayed around that percent and has not come down.
 I did try optimize but that was disruptive as it caused search errors.
 I have been playing with merge factor to see if that helps with deleted
 documents or not. It is currently set to 5.
 
 The server has 24 GB of memory out of which memory consumption is around
 23
 GB normally and the jvm is set to 6 GB. And have noticed that the
 available
 memory on the server goes to 100 MB at times during a day.
 All the updates are run through DIH.
 
 Using all availble memory is completely normal operation for ANY
 operating system.  If you hold up Windows as an example of one that
 doesn't ... it lies to you about available memory.  All modern
 operating systems will utilize memory that is not explicitly allocated
 for the OS disk cache.
 
 The disk cache will instantly give up any of the memory it is using for
 programs that request it.  Linux doesn't try to hide the disk cache from
 you, but older versions of Windows do.  In the newer versions of Windows
 that have the Resource Monitor, you can go there to see the actual
 memory usage including the cache.
 
 Every day at least once i see the following error, which result in search
 errors on the front end of the site.
 
 ERROR org.apache.solr.servlet.SolrDispatchFilter -
 null:org.eclipse.jetty.io.EofException
 
 From what I have read these are mainly due to timeout and my timeout is
 set
 to 30 seconds and cant set it to a higher number. I was thinking maybe
 due
 to high memory usage, sometimes it leads to bad performance/errors.
 
 Although this error can be caused by timeouts, it has a specific
 meaning.  It means that the client disconnected before Solr responded to
 the request, so when Solr tried to respond (through jetty), it found a
 closed TCP connection.
 
 Client timeouts need to either be completely removed, or set to a value
 much longer than any request will take.  Five minutes is a good starting
 value.
 
 If all your client timeout is set to 30 seconds and you are seeing
 EofExceptions, that means that your requests are taking longer than 30
 seconds, and you likely have some performance issues.  It's also
 possible that some of your client timeouts are set a lot shorter than 30
 seconds.
 
 My objective is to stop the errors, adding more memory to the server is
 not
 a good scaling strategy. That is why i was thinking maybe there is a
 issue
 with the way things are set up and need to be revisited.
 
 You're right that adding more memory to the servers is not a good
 scaling strategy for the general case ... but in this situation, I think
 it might be prudent.  For your index and heap sizes, I would want the
 company to pay for at least 32GB of RAM.
 
 Having said that ... I've seen Solr installs work well with a LOT less
 memory than the ideal.  I don't know that adding more memory is
 necessary, unless your system (CPU, storage, and memory speeds) is
 particularly slow.  Based on your document count and index size, your
 documents are quite small, so I think your memory size is probably good
 -- if the CPU, memory bus, and storage are very fast.  If one or more of
 those subsystems aren't fast, then make up the difference with lots of
 memory.
 
 Some light reading, where you will learn why I think 32GB is an ideal
 memory size for your system:
 
 https://wiki.apache.org/solr/SolrPerformanceProblems
 
 It is possible that your 6GB heap is not quite big enough for good
 performance, or that your GC is not well-tuned.  These topics are also
 discussed on that wiki page.  If you increase your heap size, then the
 likelihood of needing more memory in the system becomes greater, because
 there will be less memory available for the disk cache.
 
 Thanks,
 Shawn
 
 
 -- 
 Thanks
 Jay Potharaju


Re: Nested objects in Solr

2015-07-24 Thread Bill Au
What exactly do you mean by nested objects in Solr?  It would help if you
gave an example.  The Solr schema is flat as far as I know.

Bill

On Fri, Jul 24, 2015 at 9:24 AM, Rajesh rajesh.panneersel...@aspiresys.com
wrote:

 You can use nested entities like below.

 <document>
   <entity name="OuterEntity" pk="id"
           query="SELECT * FROM User">
     <field column="id" name="id" />
     <field column="name" name="name" />

     <entity name="InnerEntity" child="true"
             query="select * from subject">
     </entity>
   </entity>
 </document>




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Nested-objects-in-Solr-tp4213212p4219039.html
 Sent from the Solr - User mailing list archive at Nabble.com.



DIH question: importing string containing comma-delimited list into a multiValued field

2015-07-17 Thread Bill Au
One of my database columns is a varchar containing a comma-delimited list of
values.  I would like to import these values into a multiValued field.  I
figure that I will need to write a ScriptTransformer to do that.  Is there
a better way?

Bill


Re: Division with Stats Component when Grouping in Solr

2015-06-13 Thread Bill Bell
It would be cool to be able to set a two-field GROUP BY with facets: 

 GROUP BY
site_id, keyword


Bill Bell
Sent from mobile


On Jun 13, 2015, at 2:28 PM, Yonik Seeley ysee...@gmail.com wrote:

 GROUP BY
site_id, keyword


Added As Editor

2015-05-18 Thread Bill Trembley
As large users of Solr/Lucene for many of our existing sites
(BoatTrader.com, GetAuto.com, ForRent.com, etc.) we would like the
ability to contribute to the wiki as we come across items. My current
wiki login is BillTrembley (bill.tremb...@gmail.com).

Thanks,

Bill Trembley

Director of Product Development and Technology

Dominion Performance Network
150 Granby Street

Norfolk, VA 23510

p(757) 351-7648

c(757) 575-0582

bill.tremb...@dominionenterprises.com

Skype: bill.trembley


Re: SolrCloud indexing

2015-05-12 Thread Bill Au
Thanks for the reply.

Actually in our case we want the timestamp to be populated locally on each
node in the SolrCloud cluster.  We want to see if there is any delay in the
document being distributed within the cluster.  Just want to confirm that
the timestamp can be use for that purpose.

Bill

On Sat, May 9, 2015 at 11:37 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 5/9/2015 8:41 PM, Bill Au wrote:
  Is the behavior of document being indexed independently on each node in a
  SolrCloud cluster new in 5.x or is that true in 4.x also?
 
  If the document is indexed independently on each node, then if I query
 the
  document from each node directly, a timestamp could hold different values
  since the document is indexed independently, right?
 
  <field name="timestamp" type="date" indexed="true" stored="true"
  default="NOW" />

 SolrCloud has had that behavior from day one, when it was released in
 version 4.0.  You are correct that it can result in a different
 timestamp on each replica if the default comes from schema.xml.

 I am pretty sure that the solution for this problem is to set up an
 update processor chain that includes TimestampUpdateProcessorFactory to
 populate the timestamp field before the document is distributed to each
 replica.

 https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors

 Thanks,
 Shawn




Re: SolrCloud indexing

2015-05-09 Thread Bill Au
Is the behavior of document being indexed independently on each node in a
SolrCloud cluster new in 5.x or is that true in 4.x also?

If the document is indexed independently on each node, then if I query the
document from each node directly, a timestamp could hold different values
since the document is indexed independently, right?

<field name="timestamp" type="date" indexed="true" stored="true"
default="NOW" />

Bill

On Fri, May 8, 2015 at 6:39 PM, Vincenzo D'Amore v.dam...@gmail.com wrote:

 I have just added a comment to the CWiki.
 Thanks again for your prompt answer Erick.

 Best,
 Vincenzo

 On Fri, May 8, 2015 at 12:39 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  bq: ...forwards the index notation to itself and any replicas...
 
  That's just odd phrasing.
 
  All that means is that the document sent through the indexing process
  on the leader and all followers for a shard and
  is indexed independently on each.
 
  This is as opposed to the old master/slave situation where the master
  indexed the doc, but the slave got the indexed
  version as part of a segment when it replicated.
 
  Could you add a comment to the CWiki calling the phrasing out? It
  really is a bit mysterious.
 
  Best,
  Erick
 
  On Thu, May 7, 2015 at 2:18 PM, Vincenzo D'Amore v.dam...@gmail.com
  wrote:
   Thanks Shawn.
  
   Just to make the picture more clear, I'm trying to understand why a 3
  node
   solrcloud cluster and a old style solr server take same time to index
  same
   documents.
  
   But in the wiki is written:
  
   If the machine is a leader, SolrCloud determines which shard the
 document
   should go to, forwards the document the leader for that shard, indexes
  the
   document for this shard, and *forwards the index notation to itself
 and
   any replicas*.
  
  
  
 
 https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
  
  
   Could you please explain what does it mean forwards the index
 notation
  ?
  
   On the other hand, on solrcloud I have 3 shards and 2 replicas for each
   shard. So, every node is indexing all the documents and this explains
 why
   solrcloud consumes same time compared to an old-style solr server.
  
  
  
   On Thu, May 7, 2015 at 3:08 PM, Shawn Heisey apa...@elyograg.org
  wrote:
  
   On 5/7/2015 3:04 AM, Vincenzo D'Amore wrote:
Thanks Erick. I'm not sure I got your answer.
   
I try to recap, when the raw document has to be indexed, it will be
forwarded to shard leader. Shard leader indexes the document for
 that
shard, and then forwards the indexed document to any replicas.
   
I want just be sure that when the raw document is forwarded from the
   leader
to the replicas it will be indexed only one time on the shard
 leader.
   From
what I understand replicas do not indexes, only the leader indexes.
  
   The document is indexed by all replicas.  There is no way to forward
 the
   indexed document, it can only forward the source document ... so each
   replica must index it independently.
  
   The old-style master-slave replication (which existed long before
   SolrCloud) copies the finished Lucene segments, so only the master
   actually does indexing.
  
   SolrCloud doesn't have a master, only multiple replicas, one of which
 is
   elected leader, and replication only comes into the picture if
 there's a
   serious problem and Solr determines that it can't use the transaction
   log to recover the index.
  
   Thanks,
   Shawn
  
  
  
  
   --
   Vincenzo D'Amore
   email: v.dam...@gmail.com
   skype: free.dev
   mobile: +39 349 8513251
 



 --
 Vincenzo D'Amore
 email: v.dam...@gmail.com
 skype: free.dev
 mobile: +39 349 8513251



no subject

2015-04-22 Thread Bill Tsay


On 4/22/15, 7:36 AM, Martin Keller martin.kel...@unitedplanet.com
wrote:

OK, I found the problem and as so often it was sitting in front of the
display. 

Now the next problem:
The suggestions returned consist always of a complete text block where
the match was found. I would have expected a single word or a small
phrase.

Thanks in advance
Martin


 Am 22.04.2015 um 12:50 schrieb Martin Keller
martin.kel...@unitedplanet.com:
 
 Unfortunately, setting suggestAnalyzerFieldType to text_suggest
didn’t change anything.
 The suggest dictionary is freshly built.
 As I mentioned before, only words or phrases of the source field
„content“ are not matched.
 When querying the index, the response only contains „suggestions“ field
data not coming from the „content“ field.
 The complete schema is a slightly modified techproducts schema.
 „Normal“ searching for words which I would expect coming from „content“
works.
 
 Any more ideas?
 
 Thanks 
 Martin
 
 
 Am 21.04.2015 um 17:39 schrieb Erick Erickson
erickerick...@gmail.com:
 
 Did you build your suggest dictionary after indexing? Kind of a shot
in the
 dark but worth a try.
 
 Note that the suggest field of your suggester isn't using your
text_suggest
 field type to make suggestions, it's using text_general. IOW, the
text may
 not be analyzed as you expect.
 
 Best,
 Erick
 
 On Tue, Apr 21, 2015 at 7:16 AM, Martin Keller
 martin.kel...@unitedplanet.com wrote:
 Hello together,
 
 I have some problems with the Solr 5.1.0 suggester.
 I followed the instructions in
https://cwiki.apache.org/confluence/display/solr/Suggester and also
tried the techproducts example delivered with the binary package,
which is working well.
 
 I added a field suggestions-Field to the schema:
 
 <field name="suggestions" type="text_suggest" indexed="true"
stored="true" multiValued="true"/>
 
 
 And added some copies to the field:
 
 <copyField source="content" dest="suggestions"/>
 <copyField source="title" dest="suggestions"/>
 <copyField source="author" dest="suggestions"/>
 <copyField source="description" dest="suggestions"/>
 <copyField source="keywords" dest="suggestions"/>
 
 
 The field type definition for „text_suggest“ is pretty simple:
 
 <fieldType name="text_suggest" class="solr.TextField"
 positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
 words="stopwords.txt" />
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 
 
 I Also changed the solrconfig.xml to use the suggestions field:
 
 <searchComponent class="solr.SuggestComponent" name="suggest">
   <lst name="suggester">
     <str name="name">mySuggester</str>
     <str name="lookupImpl">FuzzyLookupFactory</str>
     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
     <str name="field">suggestions</str>
     <str name="suggestAnalyzerFieldType">text_general</str>
     <str name="buildOnStartup">false</str>
   </lst>
 </searchComponent>
 
 
 For tokens originally coming from „title“ or „author“, I get
suggestions, but not any from the content field.
 So, what do I have to do?
 
 Any help is appreciated.
 
 
 Martin
 
 




Re: Facet

2015-04-05 Thread Bill Bell
Ok

Clarification

The limit is set to -1. But the average result is 300. 

The number of strings stored in the field increased a lot, like 250k to 350k. 
But the number coming out is limited by facet.prefix. 
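
For context, the query side of this is roughly the following SolrJ sketch (the
field name and prefix are made up for illustration):

import org.apache.solr.client.solrj.SolrQuery;

// Facet over the big string field but only return terms under one prefix.
SolrQuery q = new SolrQuery("*:*");
q.setFacet(true);
q.addFacetField("proc");        // hypothetical field holding the large term set
q.setFacetPrefix("ps122");      // hypothetical prefix; limits what comes back
q.setFacetLimit(-1);            // limit is -1, as noted above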

Would creating 900 fields be better ? Then I could just put the prefix in the 
field name. Like this: proc_ps122

Thoughts ?

So far I have heard SolrCloud and docValues as viable solutions. Stay away from enum.

Bill Bell
Sent from mobile


 On Apr 5, 2015, at 2:56 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote:
 
 William Bell billnb...@gmail.com wrote:
 Sent: 05 April 2015 06:20
 To: solr-user@lucene.apache.org
 Subject: Facet
 
 We increased our number of terms (String) in a facet by 50,000.
 
 Do you mean facet.limit=5?
 
 Now we are getting an error when we facet by this field - so we switched it 
 to
 facet.method=enum, and now the results come back. However, when we put
 it into production we literally hit a wall (CPU went to 100% for 16 cores)
 after about 30 minutes live.
 
 It was strange that enum worked. Internally, the difference between 
 facet.limit=100 and facet.limit=5 is quite small. The real hits are for 
 fine-counting within SolrCloud and serializing the result in order to deliver 
 it to the client. I thought enum behaved the same as fc with regard to those 
 two.
 
 We tried adding more machines to reduce the CPU, but it did not help.
 
 Sounds like SolrCloud. More machines does not help here, it might even be 
 worse. What happens is that distributed faceting is two-phase, where the 
 second phase is fine-counting. The fine-counting essentially makes all shards 
 perform micro-searches for a large part of the terms returned: Your shards 
 are bogged down by tens of thousands of small searches.
 
 If you are feeling adventurous, you can try putting
 http://tokee.github.io/lucene-solr/
 on a test-installation (I am the author). It changes the way the 
 fine-counting is done.
 
 
 Depending on your container, you might need to raise the internal limits for 
 GET-communication. Tomcat has a default of 2MB somewhere (sorry, don't 
 remember the details), which is not a lot for 50,000 values.
 
 What are some ideas? We are going to try docValues on the field. Does
 anyone know if method=fc or method=enum works for docValue? I cannot find
 any documentation on that.
 
 If DocValues are enabled, fc will use them. It does not change anything for 
 enum. But I would argue against enum for anything in the thousands anyway.
 
 We are thinking of splitting the field into 2 fields (fielda, fieldb). At
 least the number will be less, but not sure if it will help memory?
 
 The killer is the number of terms requested/returned.
 
 The weird thing is for the first 30 minutes things are performing great.
 Literally at like 10% CPU across 16 cores, not much memory and normal GC.
 
 It might be because you have just been lucky. Take a look at
 https://twitter.com/anjacks0n/status/509284768035262464
 for how different performance can be for different result set sizes.
 
 Originally the facet was a method=fc. Is there an issue with enum? We have
 facet.threads=20 set, and not sure this is wise for a enum ?
 
 Facet threading does not thread within each field, it just means that 
 multiple fields are processed in parallel.
 
 - Toke Eskildsen


Re: ZFS File System for SOLR 3.6 and SOLR 4

2015-03-28 Thread Bill Bell
Is there an advantage for XFS over ext4 for Solr? Has anyone done testing?

Bill Bell
Sent from mobile


 On Mar 27, 2015, at 8:14 AM, Shawn Heisey apa...@elyograg.org wrote:
 
 On 3/27/2015 12:30 AM, abhi Abhishek wrote:
 i am trying to use ZFS as filesystem for my Linux Environment. are
 there any performance implications of using any filesystem other than
 ext-3/ext-4 with SOLR?
 
 That should work with no problem.
 
 The only time Solr tends to have problems is if you try to use a network
 filesystem.  As long as it's a local filesystem and it implements
 everything a program can typically expect from a local filesystem, Solr
 should work perfectly.
 
 Because of the compatibility problems that the license for ZFS has with
 the GPL, ZFS on Linux is probably not as well tested as other
 filesystems like ext4, xfs, or btrfs, but I have not heard about any big
 problems, so it's probably safe.
 
 Thanks,
 Shawn
 


Re: How to boost documents at index time?

2015-03-28 Thread Bill Bell
Issue a Jira ticket?

Did you try debugQuery?
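
If it helps, and purely from memory of the SolrJ 4.x/5.x API (treat the method
names as assumptions to verify), index-time boosts could also be set in code
rather than in the update XML:

import org.apache.solr.common.SolrInputDocument;

// Sketch only: index-time boosts were deprecated and later removed in newer
// Solr releases, so verify against your SolrJ version.
SolrInputDocument doc = new SolrInputDocument();
doc.setDocumentBoost(2.0f);                  // boost the whole document
doc.addField("title", "some title", 2.0f);   // boost a single field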

Bill Bell
Sent from mobile


 On Mar 28, 2015, at 1:49 AM, CKReddy Bhimavarapu chaitu...@gmail.com wrote:
 
 I want to boost docs at index time. I am doing this using the boost
 parameter on the doc element, <doc boost="2.0">,
 but I can't see a direct impact on the doc by using debugQuery.
 
 My question is: is there any other way to boost a doc at index time and
 see the reflected changes, i.e. a direct impact?
 
 -- 
 ckreddybh. chaitu...@gmail.com


Re: Sort on multivalued attributes

2015-02-09 Thread Bill Bell
Definitely needed !!

Bill Bell
Sent from mobile


 On Feb 9, 2015, at 5:51 AM, Jan Høydahl jan@cominvent.com wrote:
 
 Sure, vote for it. The number of votes does not directly get it prioritized sooner,
 so you had better also add a comment to the JIRA; it will raise committers' 
 attention.
 Even better of course is if you are able to help bring the issue forward by 
 submitting patches.
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 
 9. feb. 2015 kl. 12.15 skrev Flavio Pompermaier pomperma...@okkam.it:
 
 Do I have to vote for it..?
 
 On Mon, Feb 9, 2015 at 11:50 AM, Jan Høydahl jan@cominvent.com wrote:
 
 See https://issues.apache.org/jira/browse/SOLR-2522
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 
 9. feb. 2015 kl. 10.30 skrev Flavio Pompermaier pomperma...@okkam.it:
 
 In my use case it could be very helpful because I use the SIREn plugin to
 index arbitrary JSON-LD and this plugin automatically index also all
 nested
 attributes as a Solr field.
 Thus I need for example to gather all entries with a certain value of the
 type attribute, ordered by name (but name could be a multivalued
 attribute in my use case :( )
 I'd like to avoid to switch to Elasticsearch just to have this single
 feature.
 
 Thanks for the support,
 Flavio
 
 On Mon, Feb 9, 2015 at 10:02 AM, Anshum Gupta ans...@anshumgupta.net
 wrote:
 
 Sure, that's correct and makes sense in some use cases. I'll need to
 check
 if Solr functions support such a thing.
 
 On Mon, Feb 9, 2015 at 12:47 AM, Flavio Pompermaier 
 pomperma...@okkam.it
 wrote:
 
 I saw that this is possible in Lucene (
 https://issues.apache.org/jira/browse/LUCENE-5454) and also in
 Elasticsearch. Or am I wrong?
 
 On Mon, Feb 9, 2015 at 9:05 AM, Anshum Gupta ans...@anshumgupta.net
 wrote:
 
 Unless I'm missing something here, sorting on a multi-valued field
 would
 be
 non-deterministic in nature.
 
 On Sun, Feb 8, 2015 at 11:59 PM, Flavio Pompermaier 
 pomperma...@okkam.it
 wrote:
 
 Hi to all,
 
 Is there any possibility that in the near future Solr could support
 sorting
 on multivalued fields?
 
 Best,
 Flavio
 
 
 
 --
 Anshum Gupta
 http://about.me/anshumgupta
 
 
 
 --
 Anshum Gupta
 http://about.me/anshumgupta
 


Re: Collations are not working fine.

2015-02-09 Thread Bill Bell
Can you order the collations by highest to lowest hits?

Bill Bell
Sent from mobile


 On Feb 9, 2015, at 6:47 AM, Nitin Solanki nitinml...@gmail.com wrote:
 
 I am working on spell checking in Solr. I have implemented Suggestions and
 collations in my spell checker component.
 
 Most of the time collations work fine but in few case it fails.
 
 *Working*:
 I tried query:*gone wthh thes wnd*: In this wnd doesn't give suggestion
 wind but collation is coming right = gone with the wind, hits = 117
 
 
 *Not working:*
 But when I tried query: *gone wthh thes wint*: In this wint does give
 suggestion wind but collation is not coming right. Instead of gone with
 the wind it gives gone with the west, hits = 1.
 
 And I want to also know what is *hits* in collations.


timestamp field and atomic updates

2015-01-30 Thread Bill Au
I have a timestamp field in my schema to track when each doc was indexed:

<field name="timestamp" type="date" indexed="true" stored="true"
default="NOW" multiValued="false" />

Recently, we have switched over to use atomic update instead of re-indexing
when we need to update a doc in the index.  It looks to me that the
timestamp field is not updated during an atomic update.  I have also looked
into TimestampUpdateProcessorFactory and it looks to me that won't help in
my case.

Is there anything within Solr that I can use to update the timestamp during
atomic update, or do I have to explicitly include the timestamp field as
part of the atomic update?
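
In case it is useful, the explicit route looks roughly like this in SolrJ
("client" is an assumed SolrClient; field names are from my schema):

import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Collections;
import org.apache.solr.common.SolrInputDocument;

// Atomic update that also sets the timestamp explicitly, since the schema
// default=NOW does not fire during an atomic update.
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc-1");
doc.addField("price", Collections.singletonMap("set", 9.99));   // the real change
doc.addField("timestamp", Collections.singletonMap("set",
    DateTimeFormatter.ISO_INSTANT.format(Instant.now())));
client.add(doc);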

Bill


Re: How large is your solr index?

2015-01-03 Thread Bill Bell
For Solr 5 why don't we switch it to 64 bit ??

Bill Bell
Sent from mobile


 On Dec 29, 2014, at 1:53 PM, Jack Krupansky jack.krupan...@gmail.com wrote:
 
 And that Lucene index document limit includes deleted and updated
 documents, so even if your actual document count stays under 2^31-1,
 deleting and updating documents can push the apparent document count over
 the limit unless you very aggressively merge segments to expunge deleted
 documents.
 
 -- Jack Krupansky
 
 -- Jack Krupansky
 
 On Mon, Dec 29, 2014 at 12:54 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
 When you say 2B docs on a single Solr instance, are you talking only one
 shard?
 Because if you are, you're very close to the absolute upper limit of a
 shard, internally
 the doc id is an int or 2^31. 2^31 + 1 will cause all sorts of problems.
 
 But yeah, your 100B documents are going to use up a lot of servers...
 
 Best,
 Erick
 
 On Mon, Dec 29, 2014 at 7:24 AM, Bram Van Dam bram.van...@intix.eu
 wrote:
 Hi folks,
 
 I'm trying to get a feel of how large Solr can grow without slowing down
 too
 much. We're looking into a use-case with up to 100 billion documents
 (SolrCloud), and we're a little afraid that we'll end up requiring 100
 servers to pull it off.
 
 The largest index we currently have is ~2billion documents in a single
 Solr
 instance. Documents are smallish (5k each) and we have ~50 fields in the
 schema, with an index size of about 2TB. Performance is mostly OK. Cold
 searchers take a while, but most queries are alright after warming up. I
 wish I could provide more statistics, but I only have very limited
 access to
 the data (...banks...).
 
 I'd very grateful to anyone sharing statistics, especially on the larger
 end
 of the spectrum -- with or without SolrCloud.
 
 Thanks,
 
 - Bram
 


Re: Old facet value doesn't go away after index update

2014-12-19 Thread Bill Bell
Set mincount=1
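
That is, add facet.mincount to the facet request so zero-count values are
dropped, e.g.:

q=*:*&facet=true&facet.field=collection_facet&facet.mincount=1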

Bill Bell
Sent from mobile


 On Dec 19, 2014, at 12:22 PM, Tang, Rebecca rebecca.t...@ucsf.edu wrote:
 
 Hi there,
 
 I have an index that has a field called collection_facet.
 
 There was a value 'Ness Motley Law Firm Documents' that we wanted to update 
 to 'Ness Motley Law Firm'.  There were 36,132 records with this value.  So I 
 re-indexed just the 36,132 records.  After the update, I ran a facet query 
 (q=*:*&facet=true&facet.field=collection_facet) to see if the value got 
 updated and I saw
 Ness Motley Law Firm 36,132  -- as expected
 Ness Motley Law Firm Documents 0 — Why is this value still here even though 
 clearly there are no records with this value anymore?  I thought maybe it was 
 cached, so I restarted solr, but I still got the same results.
 
 facet_fields: { collection_facet: [
 … Ness Motley Law Firm, 36132,
 … Ness Motley Law Firm Documents, 0 ]
 
 
 
 Rebecca Tang
 Applications Developer, UCSF CKM
 Legacy Tobacco Document Library, legacy.library.ucsf.edu/
 E: rebecca.t...@ucsf.edu


Too much Lucene code to refactor but I like SolrCloud

2014-12-03 Thread Bill Drake
I have an existing application that includes Lucene code. I want to add
high availability. From what I have read SolrCloud looks like an effective
approach. My problem is that there is a lot of Lucene code; out of 100+
java files in the application more than 20 of them are focused on Lucene
code.  Refactoring this much code seems very risky.

My thought was to migrate the index from Lucene 3.5 to Solr/Lucene 4.10, then
after making sure everything still works I would add in the HA. I looked at
solrj with the hope that it would look like Lucene but it did not look like
it would simplify the transition.

So the question is can I leave most of the existing Lucene code and just
make small changes to get the benefit of HA from SolrCloud. Is there a
better approach?


fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
Hello,

I've attempted to figure this out from reading the documentation but
without much luck.  I looked for a comprehensive query syntax specification
(e.g., with BNF and a list of operator semantics) but I'm unable to find
such a document (does such a thing exist? or is the syntax too much of a
moving target?)

I'm using 4.6.1, if that makes a difference, though upgrading is an option
if it necessary to make this work.

I've got a multiValued field color, which describes the colors of item in
the database.  Items can have zero or more colors.  What I want is to be
able to filter out all hits that contain colors not within a constraining
list, i.e., something like

NOT (color NOT IN (red,yellow,green)).

So the following would be passed by the filter:
(no value for 'color')
color: red
color: red, color: green

whereas these would be excluded:
color: red, color: blue
color: magenta


Nothing I've come up with so far, e.g. -(-color: red -color: green),
seems to work.
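
To be precise about the semantics I'm after, here is a plain-Java illustration
(not a Solr query):

import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class ColorFilterSemantics {
    static final Set<String> ALLOWED = new HashSet<>(Arrays.asList("red", "yellow", "green"));

    // A doc passes only if every one of its color values is in the allowed list;
    // a doc with no colors also passes (containsAll on an empty collection is true).
    static boolean passes(Collection<String> colors) {
        return ALLOWED.containsAll(colors);
    }
}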

I've also looked into using a function query but it seems to lack operators
for dealing with string multivalued fields.

Ideas?

Thanks,
Bill


Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
Not just that.  I'm looking for things which match either red or yellow or
green, but do NOT match ANY other color.  I can probably drop the
requirement related to having no color.


On Sat, Sep 27, 2014 at 3:28 PM, Yonik Seeley yo...@heliosearch.com wrote:

 On Sat, Sep 27, 2014 at 2:52 PM, White, Bill bwh...@ptfs.com wrote:
  Hello,
 
  I've attempted to figure this out from reading the documentation but
  without much luck.  I looked for a comprehensive query syntax
 specification
  (e.g., with BNF and a list of operator semantics) but I'm unable to find
  such a document (does such a thing exist? or is the syntax too much of a
  moving target?)
 
  I'm using 4.6.1, if that makes a difference, though upgrading is an
 option
  if it necessary to make this work.
 
  I've got a multiValued field color, which describes the colors of item
 in
  the database.  Items can have zero or more colors.  What I want is to be
  able to filter out all hits that contain colors not within a constraining
  list, i.e., something like
 
  NOT (color NOT IN (red,yellow,green)).
 
  So the following would be passed by the filter:
  (no value for 'color')
  color: red
  color: red, color: green
 
  whereas these would be excluded:
  color: red, color: blue
  color: magenta

 You're looking for things that either match red, yellow, or green, or
 have no color:

 color:(red yellow green) OR (*:* -color:*)

 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data



Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
Sorry, color is multivalued, so a given record might be both blue and red.
I don't want those to show up in the results.

On Sat, Sep 27, 2014 at 3:36 PM, White, Bill bwh...@ptfs.com wrote:

 Not just that.  I'm looking for things which match either red or yellow or
 green, but do NOT match ANY other color.  I can probably drop the
 requirement related to having no color.


 On Sat, Sep 27, 2014 at 3:28 PM, Yonik Seeley yo...@heliosearch.com
 wrote:

 On Sat, Sep 27, 2014 at 2:52 PM, White, Bill bwh...@ptfs.com wrote:
  Hello,
 
  I've attempted to figure this out from reading the documentation but
  without much luck.  I looked for a comprehensive query syntax
 specification
  (e.g., with BNF and a list of operator semantics) but I'm unable to find
  such a document (does such a thing exist? or is the syntax too much of a
  moving target?)
 
  I'm using 4.6.1, if that makes a difference, though upgrading is an
 option
  if it necessary to make this work.
 
  I've got a multiValued field color, which describes the colors of
 item in
  the database.  Items can have zero or more colors.  What I want is to be
  able to filter out all hits that contain colors not within a
 constraining
  list, i.e., something like
 
  NOT (color NOT IN (red,yellow,green)).
 
  So the following would be passed by the filter:
  (no value for 'color')
  color: red
  color: red, color: green
 
  whereas these would be excluded:
  color: red, color: blue
  color: magenta

 You're looking for things that either match red, yellow, or green, or
 have no color:

 color:(red yellow green) OR (*:* -color:*)

 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data





Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
Hmm, that won't work since color is free-form.

Is there a way to invoke (via fq) a user-defined function (hopefully
defined as part of the fq syntax, but alternatively, written in Java) and
have it applied to the resultset?

On Sat, Sep 27, 2014 at 3:41 PM, Yonik Seeley yo...@heliosearch.com wrote:

 On Sat, Sep 27, 2014 at 3:36 PM, White, Bill bwh...@ptfs.com wrote:
  Sorry, color is multivalued, so a given record might be both blue and
 red.
  I don't want those to show up in the results.

 I think the only way currently (out of the box) is to enumerate the
 other possible colors to exclude them.

 color:(red yellow green)  -color:(blue cyan xxx)

 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data



  On Sat, Sep 27, 2014 at 3:36 PM, White, Bill bwh...@ptfs.com wrote:
 
  Not just that.  I'm looking for things which match either red or yellow
 or
  green, but do NOT match ANY other color.  I can probably drop the
  requirement related to having no color.
 
  On Sat, Sep 27, 2014 at 3:28 PM, Yonik Seeley yo...@heliosearch.com
  wrote:
 
  On Sat, Sep 27, 2014 at 2:52 PM, White, Bill bwh...@ptfs.com wrote:
   Hello,
  
   I've attempted to figure this out from reading the documentation but
   without much luck.  I looked for a comprehensive query syntax
  specification
   (e.g., with BNF and a list of operator semantics) but I'm unable to
 find
   such a document (does such a thing exist? or is the syntax too much
 of a
   moving target?)
  
   I'm using 4.6.1, if that makes a difference, though upgrading is an
  option
   if it necessary to make this work.
  
   I've got a multiValued field color, which describes the colors of
  item in
   the database.  Items can have zero or more colors.  What I want is
 to be
   able to filter out all hits that contain colors not within a
  constraining
   list, i.e., something like
  
   NOT (color NOT IN (red,yellow,green)).
  
   So the following would be passed by the filter:
   (no value for 'color')
   color: red
   color: red, color: green
  
   whereas these would be excluded:
   color: red, color: blue
   color: magenta
 
  You're looking for things that either match red, yellow, or green, or
  have no color:
 
  color:(red yellow green) OR (*:* -color:*)
 
  -Yonik
  http://heliosearch.org - native code faceting, facet functions,
  sub-facets, off-heap data
 
 
 



Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
OK, let me try phrasing it better.

How do I exclude from search, any result which contains any value for
multivalued field 'color' which is not within a given constraint set
(e.g., red, green, yellow, burnt sienna), given that I do not know what
any of the other possible values of 'color' are?

In pseudocode:

for all x in result.color
if x not in (red,green,yellow, burnt sienna)
filter out result

I don't see how range queries would work since I have no control over the
possible values of 'color', e.g., there could be a valid color lemon
yellow between green and red, and I don't want a result which has
(color: red, color: lemon yellow)

On Sat, Sep 27, 2014 at 4:02 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 On Sat, Sep 27, 2014 at 11:36 PM, White, Bill bwh...@ptfs.com wrote:

  but do NOT match ANY other color.


 Bill, I miss the whole picture, it's worth to rephrase the problem in one
 sentence.
 But regarding the quote above, you can try to use exclusive ranges

 https://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Range_Searches
 fq=-color:({* TO green} {green TO red} {red TO *})
 just don't forget to build ranges alphabetically

 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com



Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
Thanks!

On Sat, Sep 27, 2014 at 4:18 PM, Yonik Seeley yo...@heliosearch.com wrote:

 On Sat, Sep 27, 2014 at 3:46 PM, White, Bill bwh...@ptfs.com wrote:
  Hmm, that won't work since color is free-form.
 
  Is there a way to invoke (via fq) a user-defined function (hopefully
  defined as part of the fq syntax, but alternatively, written in Java) and
  have it applied to the resultset?

 https://wiki.apache.org/solr/SolrPlugins#QParserPlugin

 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data



Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
Hmm.  If I understand correctly this builds a set out of open intervals
(exclusive ranges), that's a great idea!

It doesn't seem to work for me, though;  fq=-color:({* TO red} {red TO *})
is giving me results with color=burnt sienna

The field is defined as <field name="color" type="string" indexed="true"
stored="true" multiValued="true" />

On Sat, Sep 27, 2014 at 4:43 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 indeed!
 the exclusive range {green TO red} matches to the lemon yellow
 hence, the negation suppresses it from appearing
 fq=-color:{green TO red}
 then you need to suppress eg black and white also
 fq=-color:({* TO green} {green TO red} {red TO *})

 I have no control over the
  possible values of 'color',

 You don't need to control possible values, you just suppressing any values
 beside of the given green and red.
 Mind that either green or red passes that negation of exclusive ranges
 disjunction.


 On Sun, Sep 28, 2014 at 12:15 AM, White, Bill bwh...@ptfs.com wrote:

  OK, let me try phrasing it better.
 
  How do I exclude from search, any result which contains any value for
  multivalued field 'color' which is not within a given constraint set
  (e.g., red, green, yellow, burnt sienna), given that I do not
 what
  any of the other possible values of 'color' are?
 
  In pseudocode:
 
  for all x in result.color
  if x not in (red,green,yellow, burnt sienna)
  filter out result
 
  I don't see how range queries would work since I have no control over the
  possible values of 'color', e.g., there could be a valid color lemon
  yellow between green and red, and I don't want a result which has
  (color: red, color: lemon yellow)
 
  On Sat, Sep 27, 2014 at 4:02 PM, Mikhail Khludnev 
  mkhlud...@griddynamics.com wrote:
 
   On Sat, Sep 27, 2014 at 11:36 PM, White, Bill bwh...@ptfs.com wrote:
  
but do NOT match ANY other color.
  
  
   Bill, I miss the whole picture, it's worth to rephrase the problem in
 one
   sentence.
   But regarding the quote above, you can try to use exclusive ranges
  
  
 
 https://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Range_Searches
   fq=-color:({* TO green} {green TO red} {red TO *})
   just don't forget to build ranges alphabetically
  
   --
   Sincerely yours
   Mikhail Khludnev
   Principal Engineer,
   Grid Dynamics
  
   http://www.griddynamics.com
   mkhlud...@griddynamics.com
  
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com



Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
It worked for me once I changed to

-color:({* TO red} OR {red TO *})

I'm not sure why the OR is needed, maybe it's my version? (4.6.1)

On Sat, Sep 27, 2014 at 5:22 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 hm. try to convert it to query q=-color:({* TO red} {red TO *}) and check
 the explanation from debugQuery=true

 I tried to play with my index

   q: *:*,
   facet.field: swatchColors_string_mv,
   fq: -swatchColors_string_mv:({* TO RED} {RED TO *}),

 I got the following facets:

   facet_fields: {
   swatchColors_string_mv: [
 RED,
 122,
 BLACK,
 0,
 BLUE,
 0,
 BROWN,
 0,
 GREEN,
 0,

 so, it works for me at least...



 On Sun, Sep 28, 2014 at 12:54 AM, White, Bill bwh...@ptfs.com wrote:

  Hmm.  If I understand correctly this builds a set out of open intervals
  (exclusive ranges), that's a great idea!
 
  It doesn't seem to work for me, though;  fq=-color:({* TO red} {red TO
 *})
  is giving me results with color=burnt sienna
 
  The field is defined as field name=color type=string indexed=true
  stored=true multiValued=true /
 
  On Sat, Sep 27, 2014 at 4:43 PM, Mikhail Khludnev 
  mkhlud...@griddynamics.com wrote:
 
   indeed!
   the exclusive range {green TO red} matches to the lemon yellow
   hence, the negation suppresses it from appearing
   fq=-color:{green TO red}
   then you need to suppress eg black and white also
   fq=-color:({* TO green} {green TO red} {red TO *})
  
   I have no control over the
possible values of 'color',
  
   You don't need to control possible values, you just suppressing any
  values
   beside of the given green and red.
   Mind that either green or red passes that negation of exclusive ranges
   disjunction.
  
  
   On Sun, Sep 28, 2014 at 12:15 AM, White, Bill bwh...@ptfs.com wrote:
  
OK, let me try phrasing it better.
   
How do I exclude from search, any result which contains any value for
multivalued field 'color' which is not within a given constraint
 set
(e.g., red, green, yellow, burnt sienna), given that I do not
   what
any of the other possible values of 'color' are?
   
In pseudocode:
   
for all x in result.color
if x not in (red,green,yellow, burnt sienna)
filter out result
   
I don't see how range queries would work since I have no control over
  the
possible values of 'color', e.g., there could be a valid color lemon
yellow between green and red, and I don't want a result which
 has
(color: red, color: lemon yellow)
   
On Sat, Sep 27, 2014 at 4:02 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:
   
 On Sat, Sep 27, 2014 at 11:36 PM, White, Bill bwh...@ptfs.com
  wrote:

  but do NOT match ANY other color.


 Bill, I miss the whole picture, it's worth to rephrase the problem
 in
   one
 sentence.
 But regarding the quote above, you can try to use exclusive ranges


   
  
 
 https://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Range_Searches
 fq=-color:({* TO green} {green TO red} {red TO *})
 just don't forget to build ranges alphabetically

 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com

   
  
  
  
   --
   Sincerely yours
   Mikhail Khludnev
   Principal Engineer,
   Grid Dynamics
  
   http://www.griddynamics.com
   mkhlud...@griddynamics.com
  
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com



Re: fq syntax for requiring all multiValued field values to be within a list?

2014-09-27 Thread White, Bill
Yes, that was it, thank you!

On Sat, Sep 27, 2014 at 5:28 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 http://wiki.apache.org/solr/SchemaXml#Default_query_parser_operator ?
 once again, debugQuery=true perfectly explains what's going on with q.

 On Sun, Sep 28, 2014 at 1:24 AM, White, Bill bwh...@ptfs.com wrote:

  It worked for me once I changed to
 
  -color:({* TO red} OR {red TO *})
 
  I'm not sure why the OR is needed, maybe it's my version? (4.6.1)
 
  On Sat, Sep 27, 2014 at 5:22 PM, Mikhail Khludnev 
  mkhlud...@griddynamics.com wrote:
 
   hm. try to convert it to query q=-color:({* TO red} {red TO *}) and
 check
   the explanation from debugQuery=true
  
   I tried to play with my index
  
 q: *:*,
 facet.field: swatchColors_string_mv,
 fq: -swatchColors_string_mv:({* TO RED} {RED TO *}),
  
   I got the following facets:
  
 facet_fields: {
 swatchColors_string_mv: [
   RED,
   122,
   BLACK,
   0,
   BLUE,
   0,
   BROWN,
   0,
   GREEN,
   0,
  
   so, it works for me at least...
  
  
  
   On Sun, Sep 28, 2014 at 12:54 AM, White, Bill bwh...@ptfs.com wrote:
  
Hmm.  If I understand correctly this builds a set out of open
 intervals
(exclusive ranges), that's a great idea!
   
It doesn't seem to work for me, though;  fq=-color:({* TO red} {red
 TO
   *})
is giving me results with color=burnt sienna
   
The field is defined as field name=color type=string
  indexed=true
stored=true multiValued=true /
   
On Sat, Sep 27, 2014 at 4:43 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:
   
 indeed!
 the exclusive range {green TO red} matches to the lemon yellow
 hence, the negation suppresses it from appearing
 fq=-color:{green TO red}
 then you need to suppress eg black and white also
 fq=-color:({* TO green} {green TO red} {red TO *})

 I have no control over the
  possible values of 'color',

 You don't need to control possible values, you just suppressing any
values
 beside of the given green and red.
 Mind that either green or red passes that negation of exclusive
  ranges
 disjunction.


 On Sun, Sep 28, 2014 at 12:15 AM, White, Bill bwh...@ptfs.com
  wrote:

  OK, let me try phrasing it better.
 
  How do I exclude from search, any result which contains any value
  for
  multivalued field 'color' which is not within a given constraint
   set
  (e.g., red, green, yellow, burnt sienna), given that I do
  not
 what
  any of the other possible values of 'color' are?
 
  In pseudocode:
 
  for all x in result.color
  if x not in (red,green,yellow, burnt sienna)
  filter out result
 
  I don't see how range queries would work since I have no control
  over
the
  possible values of 'color', e.g., there could be a valid color
  lemon
  yellow between green and red, and I don't want a result
 which
   has
  (color: red, color: lemon yellow)
 
  On Sat, Sep 27, 2014 at 4:02 PM, Mikhail Khludnev 
  mkhlud...@griddynamics.com wrote:
 
   On Sat, Sep 27, 2014 at 11:36 PM, White, Bill bwh...@ptfs.com
 
wrote:
  
but do NOT match ANY other color.
  
  
   Bill, I miss the whole picture, it's worth to rephrase the
  problem
   in
 one
   sentence.
   But regarding the quote above, you can try to use exclusive
  ranges
  
  
 

   
  
 
 https://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Range_Searches
   fq=-color:({* TO green} {green TO red} {red TO *})
   just don't forget to build ranges alphabetically
  
   --
   Sincerely yours
   Mikhail Khludnev
   Principal Engineer,
   Grid Dynamics
  
   http://www.griddynamics.com
   mkhlud...@griddynamics.com
  
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com

   
  
  
  
   --
   Sincerely yours
   Mikhail Khludnev
   Principal Engineer,
   Grid Dynamics
  
   http://www.griddynamics.com
   mkhlud...@griddynamics.com
  
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com
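
For reference, putting the two pieces of this thread together (Mikhail's exclusive
ranges plus the explicit OR that turned out to be needed because the default query
operator was AND), a sketch of the working filter for an allowed set of red and
green looks like this:

fq=-color:({* TO green} OR {green TO red} OR {red TO *})

Any value outside the allowed set falls into one of the exclusive ranges and is
suppressed by the negation, while red and green themselves sit exactly on the
range boundaries and survive. Documents with no color value at all should also
pass, since Solr treats a purely negative filter as *:* minus the excluded set.
The ranges have to be built in alphabetical order of the allowed values.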



Re: Solr Dynamic Field Performance

2014-09-14 Thread Bill Bell
How about perf if you dynamically create 5000 fields ?

Bill Bell
Sent from mobile


 On Sep 14, 2014, at 10:06 AM, Erick Erickson erickerick...@gmail.com wrote:
 
 Dynamic fields, once they are actually _in_ a document, aren't any
 different than statically defined fields. Literally, there's no place
 in the search code that I know of that _ever_ has to check
 whether a field was dynamically or statically defined.
 
 AFAIK, the only additional cost would be figuring out which pattern
 matched at index time, which is such a tiny portion of the cost of
 indexing that I doubt you could measure it.
 
 Best,
 Erick
 
 On Sun, Sep 14, 2014 at 7:58 AM, Saumitra Srivastav
 saumitra.srivast...@gmail.com wrote:
 I have a collection with 200 fields and 300M docs running in cloud mode.
 Each doc have around 20 fields. I now have a use case where I need to
 replace these explicit fields with 6 dynamic fields. Each of these 200
 fields will match one of the 6 dynamic field.
 
 I am evaluating performance implications of switching to dynamicFields. I
 have tested with a smaller dataset(5M docs) but didn't noticed any indexing
 or query performance degradation.
 
 Query on dynamic fields will either be faceting, range query or full text
 search.
 
 Are there any known performance issues with using dynamicFields instead of
 explicit ones?
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Dynamic-Field-Performance-tp4158737.html
 Sent from the Solr - User mailing list archive at Nabble.com.
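
For reference, a sketch of what the six dynamic fields might look like in
schema.xml; the names and types below are illustrative (taken from the stock
example schema), not from the original post:

<dynamicField name="*_s"   type="string"       indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_i"   type="int"          indexed="true" stored="true"/>
<dynamicField name="*_l"   type="long"         indexed="true" stored="true"/>
<dynamicField name="*_d"   type="double"       indexed="true" stored="true"/>
<dynamicField name="*_dt"  type="date"         indexed="true" stored="true"/>

As Erick notes, the only extra work is matching each incoming field name against
these patterns once at index time.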


Re: How to solve?

2014-09-06 Thread Bill Bell
Yeah, we already use it. I will try to create a custom function; if I get it
to work I will post.

The challenge for me is how to dynamically match and add them based on the
faceting.

Here is a better example.

The doctor core has payloads of the form name:val, where the names are doctor
specialties. I need to pull back by name since the user faceted on a specialty.
So far payloads work. But now the user wants to facet on another specialty. For
example, they are looking for a cardiologist and an internal medicine doctor; if
the doctor practices at the same hospital I need to take the two values and add
them, else take the max value of the two specialties.

Make sense now?

Seems like I need to create a payload and my own custom function.

Bill Bell
Sent from mobile


 On Sep 6, 2014, at 12:57 PM, Erick Erickson erickerick...@gmail.com wrote:
 
 Here's a blog with an end-to-end example. Jack's right, it takes some
 configuration and having first-class support in Solr would be a good
 thing...
 
 http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/
 
 Best,
 Erick
 
 On Sat, Sep 6, 2014 at 10:24 AM, Jack Krupansky j...@basetechnology.com 
 wrote:
 Payload really don't have first class support in Solr. It's a solid feature
 of Lucene, but never expressed well in Solr. Any thoughts or proposals are
 welcome!
 
 (Hmmm... I wonder what the good folks at Heliosearch have up their sleeves
 in this area?!)
 
 -- Jack Krupansky
 
 -Original Message- From: William Bell
 Sent: Friday, September 5, 2014 10:03 PM
 To: solr-user@lucene.apache.org
 Subject: How to solve?
 
 
 We have a core with each document as a person.
 
 We want to boost based on the sweater color, but if the person has sweaters
 in their closet which are the same manufactuer we want to boost even more
 by adding them together.
 
 Peter Smit - Sweater: Blue = 1 : Nike, Sweater: Red = 2: Nike, Sweater:
 Blue=1 : Polo
 Tony S - Sweater: Red =2: Nike
 Bill O - Sweater:Red = 2: Polo, Blue=1: Polo
 
 Scores:
 
 Peter Smit - 1+2 = 3.
 Tony S - 2
 Bill O - 2 + 1
 
 I thought about using payloads.
 
 sweaters_payload
 Blue: Nike: 1
 Red: Nike: 2
 Blue: Polo: 1
 
 How do I query this?
 
 http://localhost:8983/solr/persons?q=*:*sort=??
 
 Ideas?
 
 
 
 
 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076
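
For reference, the schema side of the payload approach discussed here might look
roughly like the sketch below, using the stock delimited-payload filter with its
default '|' delimiter; the field and type names are illustrative. The add-or-max
logic across two faceted specialties would still have to live in the custom
function mentioned above.

<fieldType name="payloads" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
  </analyzer>
</fieldType>
<field name="specialty_payload" type="payloads" indexed="true" stored="true" multiValued="true"/>

A value such as cardiology|2.0 is then indexed as the token cardiology carrying
2.0 as its payload.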


Re: embedded documents

2014-08-24 Thread Bill Bell
See my Jira. It supports it via json.fsuffix=_json&wt=json

http://mail-archives.apache.org/mod_mbox/lucene-dev/201304.mbox/%3CJIRA.12641293.1365394604231.125944.1365397875874@arcas%3E

Bill Bell
Sent from mobile


 On Aug 24, 2014, at 6:43 AM, Jack Krupansky j...@basetechnology.com wrote:
 
 Indexing and query of raw JSON would be a valuable addition to Solr, so maybe 
 you could simply explain more precisely your data model and transformation 
 rules. For example, when multi-level nesting occurs, what does your loader do?
 
 Maybe if the fielld names were derived by concatenating the full path of JSON 
 key names, like titles_json.FR, field_naming nesting could be handled in a 
 fully automated manner.
 
 I had been thinking of filing a Jira proposing exactly that, so that even the 
 most deeply nested JSON maps could be supported, although combinations of 
 arrays and maps would be problematic.
 
 -- Jack Krupansky
 
 -Original Message- From: Michael Pitsounis
 Sent: Wednesday, August 20, 2014 7:14 PM
 To: solr-user@lucene.apache.org
 Subject: embedded documents
 
 Hello everybody,
 
 I had a requirement to store complicated json documents in solr.
 
 i have modified the JsonLoader to accept complicated json documents with
 arrays/objects as values.
 
 It stores the object/array and then flatten it and  indexes the fields.
 
 e.g  basic example document
 
 {
   "titles_json": {"FR": "This is the FR title", "EN": "This is the EN title"},
   "id": "103",
   "guid": "3b2f2998-85ac-4a4e-8867-beb551c0b3c6"
 }
 
 It will store titles_json:{FR:This is the FR title , EN:This is the
 EN title}
 and then index fields
 
 titles.FR:This is the FR title
 titles.EN:This is the EN title
 
 
 Do you see any problems with this approach?
 
 
 
 Regards,
 Michael Pitsounis 
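
For reference, a minimal sketch (not the poster's actual loader) of the
path-concatenation idea Jack describes, flattening nested JSON maps into field
names like titles_json.FR. Arrays are deliberately left out, since the thread
notes they are the problematic case.

import java.util.Map;
import org.apache.solr.common.SolrInputDocument;

public class JsonFlattener {
    // Walk a parsed JSON object (represented as nested Maps) and add one Solr
    // field per leaf value, naming each field by its concatenated key path.
    @SuppressWarnings("unchecked")
    public static void flatten(String prefix, Map<String, Object> node, SolrInputDocument doc) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String name = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                flatten(name, (Map<String, Object>) value, doc);  // nested object: recurse
            } else {
                doc.addField(name, value);                        // leaf value: index as-is
            }
        }
    }
}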


Re: SolrCloud Scale Struggle

2014-08-02 Thread Bill Bell
Seems way overkill. Are you using /get at all ? If you need the docs avail 
right away - why ? How about after 30 seconds ? How many docs do you get added 
per second during peak ? Even Google has a delay when you do Adwords. 

One idea is yo have an empty core that you insert into and then shard into the 
queries. So one fire would be called newdocs and then you would add this core 
into your query. There are a couple issues with this with scoring but it works 
nicely. I would not even use Solrcloud for that core.

Try to reduce number of Java running. Reduce memory and use one java per 
machine. 

Then if you need faster avail if docs you really need to ask why. Why not 
later? If it got search or just showing the user the info ? If for showing 
maybe query a not indexes table for the few not yet indexed ?? Or just store in 
a db to show the user the info and index later?

Bill Bell
Sent from mobile


 On Aug 1, 2014, at 4:19 AM, anand.mahajan an...@zerebral.co.in wrote:
 
 Hello all,
 
 Struggling to get this going with SolrCloud - 
 
 Requirement in brief :
 - Ingest about 4M Used Cars listings a day and track all unique cars for
 changes
 - 4M automated searches a day (during the ingestion phase to check if a doc
 exists in the index (based on values of 4-5 key fields) or it is a new one
 or an updated version)
 - Of the 4 M - About 3M Updates to existing docs (for every non-key value
 change)
 - About 1M inserts a day (I'm assuming these many new listings come in
 every day)
 - Daily Bulk CSV exports of inserts / updates in last 24 hours of various
 snapshots of the data to various clients
 
 My current deployment : 
 i) I'm using Solr 4.8 and have set up a SolrCloud with 6 dedicated machines
 - 24 Core + 96 GB RAM each.
 ii)There are over 190M docs in the SolrCloud at the moment (for all
 replicas its consuming overall disk 2340GB which implies - each doc is at
 about 5-8kb in size.)
 iii) The docs are split into 36 Shards - and 3 replica per shard (in all
 108 Solr Jetty processes split over 6 Servers leaving about 18 Jetty JVMs
 running on each host)
 iv) There are 60 fields per doc and all fields are stored at the moment  :( 
 (The backend is only Solr at the moment)
 v) The current shard/routing key is a combination of Car Year, Make and
 some other car level attributes that help classify the cars
 vi) We are mostly using the default Solr config as of now - no heavy caching
 as the search is pretty random in nature 
 vii) Autocommit is on - with maxDocs = 1
 
 Current throughput  Issues :
 With the above mentioned deployment the daily throughout is only at about
 1.5M on average (Inserts + Updates) - falling way short of what is required.
 Search is slow - Some queries take about 15 seconds to return - and since
 insert is dependent on at least one Search that degrades the write
 throughput too. (This is not a Solr issue - but the app demands it so)
 
 Questions :
 
 1. Autocommit with maxDocs = 1 - is that a goof up and could that be slowing
 down indexing? Its a requirement that all docs are available as soon as
 indexed.
 
 2. Should I have been better served had I deployed a Single Jetty Solr
 instance per server with multiple cores running inside? The servers do start
 to swap out after a couple of days of Solr uptime - right now we reboot the
 entire cluster every 4 days.
 
 3. The routing key is not able to effectively balance the docs on available
 shards - There are a few shards with just about 2M docs - and others over
 11M docs. Shall I split the larger shards? But I do not have more nodes /
 hardware to allocate to this deployment. In such case would splitting up the
 large shards give better read-write throughput? 
 
 4. To remain with the current hardware - would it help if I remove 1 replica
 each from a shard? But that would mean even when just 1 node goes down for a
 shard there would be only 1 live node left that would not serve the write
 requests.
 
 5. Also, is there a way to control where the Split Shard replicas would go?
 Is there a pattern / rule that Solr follows when it creates replicas for
 split shards?
 
 6. I read somewhere that creating a Core would cost the OS one thread and a
 file handle. Since a core repsents an index in its entirty would it not be
 allocated the configured number of write threads? (The dafault that is 8)
 
 7. The Zookeeper cluster is deployed on the same boxes as the Solr instance
 - Would separating the ZK cluster out help?
 
 Sorry for the long thread _ I thought of asking these all at once rather
 than posting separate ones.
 
 Thanks,
 Anand
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrCloud-Scale-Struggle-tp4150592.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Scale Struggle

2014-08-02 Thread Bill Bell
Auto correct not good

Corrected below 

Bill Bell
Sent from mobile


 On Aug 2, 2014, at 11:11 AM, Bill Bell billnb...@gmail.com wrote:
 
 Seems way overkill. Are you using /get at all ? If you need the docs avail 
 right away - why ? How about after 30 seconds ? How many docs do you get 
 added per second during peak ? Even Google has a delay when you do Adwords. 
 
 One idea is to have an empty core that you insert into and then shard into 
 the queries. So one core would be called newdocs and then you would add this 
 core into your query. There are a couple issues with this with scoring but it 
 works nicely. I would not even use Solrcloud for that core.
 
 Try to reduce number of Java instances running. Reduce memory and use one 
 java per machine. 
 
 Then if you need faster avail of docs you really need to ask why. Why not 
 later? Do you need search or just showing the user the info ? If for showing 
 maybe query a indexed table for the few not yet indexed ?? Or just store in a 
 db to show the user the info and index later?
 
 Bill Bell
 Sent from mobile
 
 
 On Aug 1, 2014, at 4:19 AM, anand.mahajan an...@zerebral.co.in wrote:
 
 Hello all,
 
 Struggling to get this going with SolrCloud - 
 
 Requirement in brief :
 - Ingest about 4M Used Cars listings a day and track all unique cars for
 changes
 - 4M automated searches a day (during the ingestion phase to check if a doc
 exists in the index (based on values of 4-5 key fields) or it is a new one
 or an updated version)
 - Of the 4 M - About 3M Updates to existing docs (for every non-key value
 change)
 - About 1M inserts a day (I'm assuming these many new listings come in
 every day)
 - Daily Bulk CSV exports of inserts / updates in last 24 hours of various
 snapshots of the data to various clients
 
 My current deployment : 
 i) I'm using Solr 4.8 and have set up a SolrCloud with 6 dedicated machines
 - 24 Core + 96 GB RAM each.
 ii)There are over 190M docs in the SolrCloud at the moment (for all
 replicas its consuming overall disk 2340GB which implies - each doc is at
 about 5-8kb in size.)
 iii) The docs are split into 36 Shards - and 3 replica per shard (in all
 108 Solr Jetty processes split over 6 Servers leaving about 18 Jetty JVMs
 running on each host)
 iv) There are 60 fields per doc and all fields are stored at the moment  :( 
 (The backend is only Solr at the moment)
 v) The current shard/routing key is a combination of Car Year, Make and
 some other car level attributes that help classify the cars
 vi) We are mostly using the default Solr config as of now - no heavy caching
 as the search is pretty random in nature 
 vii) Autocommit is on - with maxDocs = 1
 
 Current throughput  Issues :
 With the above mentioned deployment the daily throughout is only at about
 1.5M on average (Inserts + Updates) - falling way short of what is required.
 Search is slow - Some queries take about 15 seconds to return - and since
 insert is dependent on at least one Search that degrades the write
 throughput too. (This is not a Solr issue - but the app demands it so)
 
 Questions :
 
 1. Autocommit with maxDocs = 1 - is that a goof up and could that be slowing
 down indexing? Its a requirement that all docs are available as soon as
 indexed.
 
 2. Should I have been better served had I deployed a Single Jetty Solr
 instance per server with multiple cores running inside? The servers do start
 to swap out after a couple of days of Solr uptime - right now we reboot the
 entire cluster every 4 days.
 
 3. The routing key is not able to effectively balance the docs on available
 shards - There are a few shards with just about 2M docs - and others over
 11M docs. Shall I split the larger shards? But I do not have more nodes /
 hardware to allocate to this deployment. In such case would splitting up the
 large shards give better read-write throughput? 
 
 4. To remain with the current hardware - would it help if I remove 1 replica
 each from a shard? But that would mean even when just 1 node goes down for a
 shard there would be only 1 live node left that would not serve the write
 requests.
 
 5. Also, is there a way to control where the Split Shard replicas would go?
 Is there a pattern / rule that Solr follows when it creates replicas for
 split shards?
 
 6. I read somewhere that creating a Core would cost the OS one thread and a
 file handle. Since a core repsents an index in its entirty would it not be
 allocated the configured number of write threads? (The dafault that is 8)
 
 7. The Zookeeper cluster is deployed on the same boxes as the Solr instance
 - Would separating the ZK cluster out help?
 
 Sorry for the long thread _ I thought of asking these all at once rather
 than posting separate ones.
 
 Thanks,
 Anand
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrCloud-Scale-Struggle-tp4150592.html
 Sent from the Solr - User mailing list archive at Nabble.com.
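
For reference, if the "available as soon as indexed" requirement can be relaxed
to a few seconds as suggested above, the commit settings in solrconfig.xml might
look something like this sketch (the intervals are illustrative, not the poster's
config):

<autoCommit>
  <maxTime>30000</maxTime>          <!-- hard commit every 30s: flush segments, truncate the tlog -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>           <!-- soft commit every 5s: make new docs visible to searches -->
</autoSoftCommit>

This replaces the maxDocs=1 autocommit, which forces a commit for every single
document added.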


Latest jetty

2014-07-26 Thread Bill Bell
Since we are now on latest Java JDK can we move to Jetty 9?

Thoughts ?

Bill Bell
Sent from mobile



Solr atomic updates question

2014-07-08 Thread Bill Au
Solr atomic update allows for changing only one or more fields of a
document without having to re-index the entire document.  But what about
the case where I am sending in the entire document?  In that case the whole
document will be re-indexed anyway, right?  So I assume that there will be
no saving.  I am actually thinking that there will be a performance penalty
since atomic update requires Solr to first retrieve all the fields first
before updating.

Bill


Re: Solr atomic updates question

2014-07-08 Thread Bill Au
Thanks for that under-the-cover explanation.

I am not sure what you mean by mix atomic updates with regular field
values.  Can you give an example?

Thanks.

Bill


On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay st...@b.abbies.us wrote:

 Atomic updates fetch the doc with RealTimeGet, apply the updates to the
 fetched doc, then reindex. Whether you use atomic updates or send the
 entire doc to Solr, it has to deleteById then add. The perf difference
 between the atomic updates and normal updates is likely minimal.

 Atomic updates are for when you have changes and want to apply them to a
 document without affecting the other fields. A regular add will replace an
 existing document completely. AFAIK Solr will let you mix atomic updates
 with regular field values, but I don't think it's a good idea.

 Steve

 On Jul 8, 2014, at 5:30 PM, Bill Au bill.w...@gmail.com wrote:

  Solr atomic update allows for changing only one or more fields of a
  document without having to re-index the entire document.  But what about
  the case where I am sending in the entire document?  In that case the
 whole
  document will be re-indexed anyway, right?  So I assume that there will
 be
  no saving.  I am actually thinking that there will be a performance
 penalty
  since atomic update requires Solr to first retrieve all the fields first
  before updating.
 
  Bill




Re: Solr atomic updates question

2014-07-08 Thread Bill Au
I see what you mean now.  Thanks for the example.  It makes things very
clear.

I have been thinking about the explanation in the original response more.
According to that, both a regular update with the entire doc and an atomic
update involve a delete-by-id followed by an add.  But the Solr reference doc
(
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents)
says that:

The first is *atomic updates*. This approach allows changing only one or
more fields of a document without having to re-index the entire document.

But since Solr is doing a delete-by-id followed by an add, does "without
having to re-index the entire document" apply to the client side only?  On
the server side the add means that the entire document is re-indexed, right?

Bill


On Tue, Jul 8, 2014 at 7:32 PM, Steve McKay st...@b.abbies.us wrote:

 Take a look at this update XML:

 <add>
   <doc>
     <field name="employeeId">05991</field>
     <field name="employeeName">Steve McKay</field>
     <field name="office" update="set">Walla Walla</field>
     <field name="skills" update="add">Python</field>
   </doc>
 </add>

 Let's say employeeId is the key. If there's a fourth field, salary, on the
 existing doc, should it be deleted or retained? With this update it will
 obviously be deleted:

 <add>
   <doc>
     <field name="employeeId">05991</field>
     <field name="employeeName">Steve McKay</field>
   </doc>
 </add>

 With this XML it will be retained:

 <add>
   <doc>
     <field name="employeeId">05991</field>
     <field name="office" update="set">Walla Walla</field>
     <field name="skills" update="add">Python</field>
   </doc>
 </add>

 I'm not willing to guess what will happen in the case where non-atomic and
 atomic updates are present on the same add because I haven't looked at that
 code since 4.0, but I think I could make a case for retaining salary or for
 discarding it. That by itself reeks--and it's also not well documented.
 Relying on iffy, poorly-documented behavior is asking for pain at upgrade
 time.

 Steve

 On Jul 8, 2014, at 7:02 PM, Bill Au bill.w...@gmail.com wrote:

  Thanks for that under-the-cover explanation.
 
  I am not sure what you mean by mix atomic updates with regular field
  values.  Can you give an example?
 
  Thanks.
 
  Bill
 
 
  On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay st...@b.abbies.us wrote:
 
  Atomic updates fetch the doc with RealTimeGet, apply the updates to the
  fetched doc, then reindex. Whether you use atomic updates or send the
  entire doc to Solr, it has to deleteById then add. The perf difference
  between the atomic updates and normal updates is likely minimal.
 
  Atomic updates are for when you have changes and want to apply them to a
  document without affecting the other fields. A regular add will replace
 an
  existing document completely. AFAIK Solr will let you mix atomic updates
  with regular field values, but I don't think it's a good idea.
 
  Steve
 
  On Jul 8, 2014, at 5:30 PM, Bill Au bill.w...@gmail.com wrote:
 
  Solr atomic update allows for changing only one or more fields of a
  document without having to re-index the entire document.  But what
 about
  the case where I am sending in the entire document?  In that case the
  whole
  document will be re-indexed anyway, right?  So I assume that there will
  be
  no saving.  I am actually thinking that there will be a performance
  penalty
  since atomic update requires Solr to first retrieve all the fields
 first
  before updating.
 
  Bill
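
For reference, a minimal SolrJ sketch of an atomic update, using the field names
from Steve's example; the URL and values are illustrative. Note that atomic
updates also require the other fields to be stored so Solr can rebuild the full
document on the server side.

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("employeeId", "05991");              // unique key of the existing doc

        Map<String, Object> setOffice = new HashMap<String, Object>();
        setOffice.put("set", "Walla Walla");              // replace the value of "office"
        doc.addField("office", setOffice);

        Map<String, Object> addSkill = new HashMap<String, Object>();
        addSkill.put("add", "Python");                    // append to multi-valued "skills"
        doc.addField("skills", addSkill);

        server.add(doc);    // Solr fetches the stored doc, merges the changes, then deletes and re-adds it
        server.commit();
        server.shutdown();
    }
}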
 
 




Re: stucked with log4j configuration

2014-04-12 Thread Bill Bell
Well, I hope log4j2 is something Solr supports when it goes GA.

Bill Bell
Sent from mobile


 On Apr 12, 2014, at 7:26 AM, Aman Tandon amantandon...@gmail.com wrote:
 
 I have upgraded my solr4.2 to solr 4.7.1 but in my logs there is an error
 for log4j
 
 log4j: Could not find resource
 
 Please find the attachment of the screenshot of the error console
 https://drive.google.com/file/d/0B5GzwVkR3aDzdjE1b2tXazdxcGs/edit?usp=sharing
 -- 
 With Regards
 Aman Tandon


Re: boost results within 250km

2014-04-09 Thread Bill Bell
Just take geodist() and use the map() function, and send that to bf or boost.

Bill Bell
Sent from mobile


 On Apr 9, 2014, at 8:26 AM, Erick Erickson erickerick...@gmail.com wrote:
 
 Why do you want to do this? This sounds like an XY problem, you're
 asking how to do something specific without explaining why you care,
 perhaps there are other ways to do this.
 
 Best,
 Erick
 
 On Tue, Apr 8, 2014 at 11:30 PM, Aman Tandon amantandon...@gmail.com wrote:
 How can i gave the more boost to the results within 250km than others
 without using result filtering.
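
For reference, a sketch of the geodist/map approach described above, with
illustrative parameter values, used with the edismax parser so that matches
within 250 km get an additive boost rather than being filtered:

q=pizza&defType=edismax&sfield=store&pt=45.15,-93.85&bf=map(geodist(),0,250,10,0)

Here map(geodist(),0,250,10,0) returns 10 when the computed distance falls
between 0 and 250 km and 0 otherwise, so nearby documents float up without
excluding the rest.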


Re: Luke 4.6.1 released

2014-02-16 Thread Bill Bell
Yes it works with Solr 

Bill Bell
Sent from mobile


 On Feb 16, 2014, at 3:38 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
 
 Does it work with Solr? I couldn't tell what the description was from
 this repo and it's Solr relevance.
 
 I am sure all the long timers know, but for more recent Solr people,
 the additional information would be useful.
 
 Regards,
   Alex.
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)
 
 
 On Mon, Feb 17, 2014 at 3:02 AM, Dmitry Kan solrexp...@gmail.com wrote:
 Hello!
 
 Luke 4.6.1 has been just released. Grab it here:
 
 https://github.com/DmitryKey/luke/releases/tag/4.6.1
 
 fixes:
 loading the jar from command line is now working fine.
 
 --
 Dmitry Kan
 Blog: http://dmitrykan.blogspot.com
 Twitter: twitter.com/dmitrykan


Status of 4.6.1?

2014-01-18 Thread Bill Bell
We just need the bug fix for Solr.xml 

https://issues.apache.org/jira/browse/SOLR-5543

Bill Bell
Sent from mobile



Re: question about DIH solr-data-config.xml and XML include

2014-01-14 Thread Bill Au
The problem is with the admin UI not following the XML include to find the
entities, so it found none.  DIH itself does support XML include, as I can
issue the DIH commands via HTTP on the included entities successfully.

Bill


On Mon, Jan 13, 2014 at 8:03 PM, Shawn Heisey s...@elyograg.org wrote:

 On 1/13/2014 3:31 PM, Bill Au wrote:

 But when I use XML include, the Entity pull-down in the Dataimport section
 of the Solr admin UI is empty.  I know that happens when there is a syntax
 error in solr-data-config.xml.  Does DIH supports XML include?  Also I am
 not seeing any error message in the log even if I set log level to ALL.
  Is
 there any way to get DIH to log what it thinks is wrong
 solr-data-cofig.xml?


 Paying it forward.  Someone on this mailing list helped me with this.  I
 have tested this DIH config and found that it works:

 <?xml version="1.0" encoding="UTF-8" ?>
 <dataConfig xmlns:xi="http://www.w3.org/2001/XInclude">
   <dataSource type="JdbcDataSource"
     driver="com.mysql.jdbc.Driver"
     encoding="UTF-8"
     url="jdbc:mysql://${dih.request.dbHost}:3306/${dih.request.dbSchema}?zeroDateTimeBehavior=convertToNull"
     batchSize="-1"
     user="REDACTED"
     password="REDACTED"/>
   <document>
     <xi:include href="test-dih-include.xml" />
   </document>
 </dataConfig>

 The xmlns:xi attribute in the outer tag makes it possible to use the
 xi:include syntax later.

 I make extensive use of this in my solrconfig.xml file. There's almost no
 actual config in that file, everything is included from other files.

 When you look at the config in the admin UI, you will not see the included
 text, you'll only see the xi:include tag.

 Thanks,
 Shawn




question about DIH solr-data-config.xml and XML include

2014-01-13 Thread Bill Au
I am trying to simplify my Solr DIH configuration by using XML includes
(external entities).  Here is an example:

<?xml version="1.0" standalone="no" ?>
<!DOCTYPE doc [
<!ENTITY dataSource SYSTEM "include_datasource.xml">
<!ENTITY entity1 SYSTEM "include_entity1.xml">
<!ENTITY entity2 SYSTEM "include_entity2.xml">
]>
<dataConfig>
&dataSource;
<document>
&entity1;
&entity2;
</document>
</dataConfig>


I know my included XML files are good because if I put them all into a
single XML file, DIH works as expected.

But when I use XML include, the Entity pull-down in the Dataimport section
of the Solr admin UI is empty.  I know that happens when there is a syntax
error in solr-data-config.xml.  Does DIH supports XML include?  Also I am
not seeing any error message in the log even if I set log level to ALL.  Is
there any way to get DIH to log what it thinks is wrong solr-data-cofig.xml?

BTW, the admin UI show the DIH config as shown above.  So I suspecting that
DIH isn't actually doing the XML include.

Bill


Re: Call to Solr via TCP

2013-12-10 Thread Bill Bell
Yeah, open a socket to the port and send a correct HTTP GET request, and Solr
will respond with results...



Bill Bell
Sent from mobile


 On Dec 10, 2013, at 2:50 PM, Doug Turnbull 
 dturnb...@opensourceconnections.com wrote:
 
 Zwer, is there a reason you need to do this? Its probably very hard to
 get solr to speak TCP. But if you're having a performance or
 infrastructure problem, the group might be able to help you with a far
 simpler solution.
 
 Sent from my Windows Phone From: Zwer
 Sent: 12/10/2013 12:15 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Call to Solr via TCP
 Maybe I asked incorrectly.
 
 
 Solr is Web Application, hosted by some servlet container and is reachable
 via HTTP.
 
 HTTP is an extension of TCP and I would like to know whether exists some
 lower way to communicate with application (i.e. Solr) hosted by Jetty?
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Call-to-Solr-via-TCP-tp4105932p4105935.html
 Sent from the Solr - User mailing list archive at Nabble.com.
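
For reference, "the correct GET syntax" on a raw socket is simply a plain HTTP
request; the host, port and core name below are illustrative:

GET /solr/collection1/select?q=*:*&wt=json HTTP/1.1
Host: localhost:8983
Connection: close

Solr (really the servlet container, e.g. Jetty) answers with an ordinary HTTP
response containing the results; there is no lower-level, Solr-specific TCP
protocol to speak.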


Re: How to work with remote solr savely?

2013-11-22 Thread Bill Bell
Do you have a sample jetty XML to setup basic auth for updates in Solr?

Sent from my iPad

 On Nov 22, 2013, at 7:34 AM, michael.boom my_sky...@yahoo.com wrote:
 
 Use HTTP basic authentication, setup in your servlet container
 (jetty/tomcat).
 
 That should work fine if you are *not* using SolrCloud.
 
 
 
 -
 Thanks,
 Michael
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-work-with-remote-solr-savely-tp4102612p4102613.html
 Sent from the Solr - User mailing list archive at Nabble.com.
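
For reference, a rough sketch of the servlet-container side of Michael's
suggestion for Jetty: a security constraint in the Solr webapp's web.xml (or
Jetty's webdefault.xml) plus a realm. The role and realm names are illustrative,
and this example protects the whole webapp; limiting it to update URLs means
writing url-pattern entries that match how the cores appear in the request path.

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-user</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr Realm</realm-name>
</login-config>

The users, passwords and roles for that realm then live in Jetty's
realm.properties, referenced from jetty.xml.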


Re: useColdSearcher in SolrCloud config

2013-11-22 Thread Bill Bell
Wouldn't true mean "use the cold searcher"? It seems backwards to me...

Sent from my iPad

 On Nov 22, 2013, at 2:44 AM, ade-b adrian.bro...@gmail.com wrote:
 
 Hi
 
 The definition of useColdSearcher config element in solrconfig.xml is
 
 If a search request comes in and there is no current registered searcher,
 then immediately register the still warming searcher and use it.  If false
 then all requests will block until the first searcher is done warming.
 
 By the term 'block', I assume SOLR returns a non 200 response to requests.
 Does anybody know the exact response code returned when the server is
 blocking requests?
 
 If a new SOLR server is introduced into an existing array of SOLR servers
 (in SOLR Cloud setup), it will sync it's index from the leader. To save you
 having to specify warm-up queries in the solrconfig.xml file for first
 searchers, would/could the new server not auto warm it's caches from the
 caches of an existing server?
 
 Thanks
 Ade 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/useColdSearcher-in-SolrCloud-config-tp4102569.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: NullPointerException

2013-11-22 Thread Bill Bell
It seems to involve a modified row and a reference in EvaluatorBag.

I am not familiar with either.

Sent from my iPad

 On Nov 22, 2013, at 3:05 AM, Adrien RUFFIE a.ruf...@e-deal.com wrote:
 
 Hello all,
 
 I have perform a full indexation with solr, but when I try to perform an 
 incrementation indexation I get the following exception (cf attachment).
 
 Any one have a idea of the problem ?
 
 Greate thank
 log.txt


Re: Reverse mm(min-should-match)

2013-11-22 Thread Bill Bell
This is an awesome idea!

Sent from my iPad

 On Nov 22, 2013, at 12:54 PM, Doug Turnbull 
 dturnb...@opensourceconnections.com wrote:
 
 Instead of specifying a percentage or number of query terms must match
 tokens in a field, I'd like to do the opposite -- specify how much of a
 field must match a query.
 
 The problem I'm trying to solve is to boost document titles that closely
 match the query string. If a title looks something like
 
 *Title: *[solr] [the] [worlds] [greatest] [search] [engine]
 
 I want to be able to specify how much of the field must match the query
 string. This differs from normal mm. Normal mm specifies a how much of the
 query must match a field.
 
 As an example, with this title, if I use normal mm=100% and perform the
 following query:
 
 mm=100%
 q=solr
 
 This will match the title above, as 100% of [solr] matches the field
 
 What I really want to get at is a reverse mm:
 
 Rmm=100%
 q=solr
 
 The title above will not match in this case. Only 1/6 of the tokens in the
 field match the query.
 
 However an exact search would match:
 
 Rmm=100%
 q=solr the worlds greatest search engine
 
 Here 100% of the query matches the title, so I'm good.
 
 Is there any way to achieve this in Solr?
 
 -- 
 Doug Turnbull
 Search  Big Data Architect
 OpenSource Connections http://o19s.com


Re: Jetty 9?

2013-11-07 Thread Bill Bell
So no Jetty 9 until Solr 5? Java 7 is at release 40. Is that our commitment, to
not require Java 7 until Solr 5?

Most people are probably already on Java 7...

Bill Bell
Sent from mobile


 On Nov 7, 2013, at 1:29 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 
 Here is an issue points to that:
 https://issues.apache.org/jira/browse/SOLR-4839
 
 
 2013/11/7 William Bell billnb...@gmail.com
 
 When are we moving Solr to Jetty 9?
 
 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076
 


Re: Performance of rows and start parameters

2013-11-04 Thread Bill Bell
Do you want to look through them all? Have you considered the Lucene API? Not
sure if that is better, but it might be.

Bill Bell
Sent from mobile


 On Nov 4, 2013, at 6:43 AM, michael.boom my_sky...@yahoo.com wrote:
 
 I saw that some time ago there was a JIRA ticket dicussing this, but still i
 found no relevant information on how to deal with it.
 
 When working with big nr of docs (e.g. 70M) in my case, I'm using
 start=0rows=30 in my requests.
 For the first req the query time is ok, the next one is visibily slower, the
 third even more slow and so on until i get some huge query times of up
 140secs, after a few hundreds requests. My test were done with SolrMeter at
 a rate of 1000qpm. Same thing happens at 100qpm, tough.
 
 Is there a best practice on how to do in this situation, or maybe an
 explanation why is the query time increasing, from request to request ?
 
 Thanks!
 
 
 
 -
 Thanks,
 Michael
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Performance-of-rows-and-start-parameters-tp4099194.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Core admin: create new core

2013-11-04 Thread Bill Bell
You could pre-create a bunch of directories and base configs, and create cores
as needed. Then use the schemaless API to set them up... Or make changes in a
script and reload the core.

Bill Bell
Sent from mobile


 On Nov 4, 2013, at 6:06 AM, Erick Erickson erickerick...@gmail.com wrote:
 
 Right, this has been an issue for a while, there's no current
 way to do this.
 
 Someday, I'll be able to work on SOLR-4779 which should
 go some toward making this work more easily. It's still not
 exactly what you're looking for, but it might work.
 
 Of course with SolrCloud you can specify a configuration
 set that is used for multiple collections.
 
 People are using Puppet or similar to automate this over
 large numbers of nodes, but that's not entirely satisfactory
 either in our case I suspect.
 
 FWIW,
 Erick
 
 
 On Mon, Nov 4, 2013 at 4:00 AM, Bram Van Dam bram.van...@intix.eu wrote:
 
 The core admin CREATE function requires that the new instance dir and
 schema/config exist already. Is there a particular reason for this? It
 would be incredible convenient if I could create a core with a new schema
 and new config simply by calling CREATE (maybe providing the contents of
 config.xml and schema.xml as base64 encoded strings in HTTP POST or
 something?).
 
 I'm guessing this isn't currently possible?
 
 Ta,
 
 - bram
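
For reference, once the instance directory and config files exist on disk, the
CREATE call itself is a single HTTP request (names and paths illustrative):

http://localhost:8983/solr/admin/cores?action=CREATE&name=newcore&instanceDir=newcore&config=solrconfig.xml&schema=schema.xml&dataDir=data

What is being asked for here, and what does not exist yet, is a variant of this
call that also accepts the config and schema contents in the request body
instead of expecting them on disk.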
 


Re: Proposal for new feature, cold replicas, brainstorming

2013-10-27 Thread Bill Bell
Yeah, replicating to a DR site would be good too.

Bill Bell
Sent from mobile


 On Oct 24, 2013, at 6:27 AM, yriveiro yago.rive...@gmail.com wrote:
 
 I'm wondering some time ago if it's possible have replicas of a shard
 synchronized but in an state that they can't accept queries only updates. 
 
 This replica in replication mode only awake to accept queries if it's the
 last alive replica and goes to replication mode when other replica becomes
 alive and synchronized.
 
 The motivation of this is simple, I want have replication but I don't want
 have n replicas actives with full resources allocated (cache and so on).
 This is usefull in enviroments where replication is needed but a high query
 throughput is not fundamental and the resources are limited.
 
 I know that right now is not possible, but I think that it's a feature that
 can be implemented in a easy way creating a new status for shards.
 
 The bottom line question is, I'm the only one with this kind of
 requeriments? Does it make sense one functionality like this?
 
 
 
 -
 Best regards
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Proposal-for-new-feature-cold-replicas-brainstorming-tp4097501.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr - what's the next big thing?

2013-10-26 Thread Bill Bell
Full JSON support: deep, complex object indexing and search. Game changer.

Bill Bell
Sent from mobile


 On Oct 26, 2013, at 1:04 PM, Otis Gospodnetic otis.gospodne...@gmail.com 
 wrote:
 
 Hi,
 
 On Sat, Oct 26, 2013 at 5:58 AM, Saar Carmi saarca...@gmail.com wrote:
 LOL,  Jack.  I can imagine Otis saying that.
 
 Funny indeed, but not really.
 
 Otis,  with these marriage,  are we going to see map reduce based queries?
 
 Can you please describe what you mean by that?  Maybe with an example.
 
 Thanks,
 Otis
 --
 Performance Monitoring * Log Analytics * Search Analytics
 Solr  Elasticsearch Support * http://sematext.com/
 
 
 
 On Oct 25, 2013 10:03 PM, Jack Krupansky j...@basetechnology.com wrote:
 
 But a lot of that big yellow elephant stuff is in 4.x anyway.
 
 (Otis: I was afraid that you were going to say that the next big thing in
 Solr is... Elasticsearch!)
 
 -- Jack Krupansky
 
 -Original Message- From: Otis Gospodnetic
 Sent: Friday, October 25, 2013 2:43 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr - what's the next big thing?
 
 Saar,
 
 The marriage with the big yellow elephant is a big deal. It changes the
 scale.
 
 Otis
 Solr  ElasticSearch Support
 http://sematext.com/
 On Oct 25, 2013 5:32 AM, Saar Carmi saarca...@gmail.com wrote:
 
 If I am not mistaken the most impressive improvement of Solr 4.0 compared
 to previous versions was the Solr Cloud architecture.
 
 What would be the next big thing in Solr 5.0 ?
 
 Saar
 


Re: Spatial Distance Range

2013-10-22 Thread Bill Bell
Yes frange works 

Bill Bell
Sent from mobile


 On Oct 22, 2013, at 8:17 AM, Eric Grobler impalah...@googlemail.com wrote:
 
 Hi Everyone,
 
 Normally one would search for documents where the location is within a
 specified distance, for example within 5 km:
 fq={!geofilt pt=45.15,-93.85 sfield=store d=5}

 Is there a way to specify a range between 10 and 20 km?
 Something like:
 fq={!geofilt pt=45.15,-93.85 sfield=store distancefrom=10 distanceupto=20}
 
 Thanks
 Ericz
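
For reference, a sketch of the frange form being pointed at, keeping only
documents whose distance from the given point falls between 10 and 20 km (point
and field copied from the question):

fq={!frange l=10 u=20}geodist()&sfield=store&pt=45.15,-93.85

geodist() computes the distance in kilometers from pt to the value of sfield,
and frange keeps documents where that value lies between the lower bound l and
the upper bound u.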


Re: Skipping caches on a /select

2013-10-17 Thread Bill Bell
But a global one on a qt (request handler) would be awesome!!!

Bill Bell
Sent from mobile


 On Oct 17, 2013, at 2:43 PM, Yonik Seeley ysee...@gmail.com wrote:
 
 There isn't a global  cache=false... it's a local param that can be
 applied to any fq or q parameter independently.
 
 -Yonik
 
 
 On Thu, Oct 17, 2013 at 4:39 PM, Tim Vaillancourt t...@elementspace.com 
 wrote:
 Thanks Yonik,
 
 Does cache=false apply to all caches? The docs make it sound like it is
 for filterCache only, but I could be misunderstanding.
 
 When I force a commit and perform a /select a query many times with
 cache=false, I notice my query gets cached still, my guess is in the
 queryResultCache. At first the query takes 500ms+, then all subsequent
 requests take 0-1ms. I'll confirm this queryResultCache assumption today.
 
 Cheers,
 
 Tim
 
 
 On 16/10/13 06:33 PM, Yonik Seeley wrote:
 
 On Wed, Oct 16, 2013 at 6:18 PM, Tim Vaillancourtt...@elementspace.com
 wrote:
 
 I am debugging some /select queries on my Solr tier and would like to see
 if there is a way to tell Solr to skip the caches on a given /select
 query
 if it happens to ALREADY be in the cache. Live queries are being inserted
 and read from the caches, but I want my debug queries to bypass the cache
 entirely.
 
 I do know about the cache=false param (that causes the results of a
 select to not be INSERTED in to the cache), but what I am looking for
 instead is a way to tell Solr to not read the cache at all, even if there
 actually is a cached result for my query.
 
 Yeah, cache=false for q or fq should already not use the cache at
 all (read or write).
 
 -Yonik
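
For reference, a sketch of the local-param form described above, applied to a
filter (field names are illustrative); per Yonik's note, the same {!cache=false}
prefix can be put on the q parameter as well:

q=text:solr&fq={!cache=false}category:books

A clause marked this way is neither looked up in nor inserted into the
corresponding cache, while unmarked clauses of the same request still use the
caches normally.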


DIH

2013-10-15 Thread Bill Bell
We have a custom Field processor in DIH and we are now CPU bound on one core...
How do we thread it?? We need to use more cores.

The box has 32 cores and 1 is 100% CPU bound.

Ideas ?

Bill Bell
Sent from mobile



Re: DIH

2013-10-15 Thread Bill Bell
We are NOW CPU bound. Thoughts???

Bill Bell
Sent from mobile


 On Oct 15, 2013, at 8:49 PM, Bill Bell billnb...@gmail.com wrote:
 
 We have a custom Field processor in DIH and we are not CPU bound on one 
 core... How do we thread it ?? We need to use more cores
 
 The box has 32 cores and 1 is 100% CPU bound.
 
 Ideas ?
 
 Bill Bell
 Sent from mobile
 


Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Bill Bell
Does this work ?
I can suggest -XX:-UseLoopPredicate to switch off predicates.

???

Which version of 7 is recommended ?

Bill Bell
Sent from mobile


 On Oct 10, 2013, at 11:29 AM, Smiley, David W. dsmi...@mitre.org wrote:
 
 *Don't* use JDK 7u40, it's been known to cause index corruption and
  SIGSEGV faults with Lucene: LUCENE-5212.  This has not gone unnoticed by
  Oracle.
 
 ~ David
 
 On 10/10/13 12:34 PM, Guido Medina guido.med...@temetra.com wrote:
 
 2. Java version: There are huges performance winning between Java 5, 6
   and 7; we use Oracle JDK 7u40.
 


Re: Field with default value and stored=false, will be reset back to the default value in case of updating other fields

2013-10-09 Thread Bill Bell
You have to update the whole record including all fields...

Bill Bell
Sent from mobile


 On Oct 9, 2013, at 7:50 PM, deniz denizdurmu...@gmail.com wrote:
 
 hi all,
 
 I have encountered some problems and post it on stackoverflow here:
 http://stackoverflow.com/questions/19285251/solr-field-with-default-value-resets-itself-if-it-is-stored-false
  
 
  as you can see from the response, does it make sense to open a bug ticket
  for this? Although I can work around it by setting everything back to
  stored=true, it does not make sense to keep every field stored when I don't
  need to return them in the search results. Or can anyone explain in more
  detail why this is expected and normal? 
 
 
 
 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Field-with-default-value-and-stored-false-will-be-reset-back-to-the-default-value-in-case-of-updatins-tp4094508.html
 Sent from the Solr - User mailing list archive at Nabble.com.
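
The behaviour behind this, as I understand it: an atomic update rebuilds the
document from its stored field values and re-indexes it, so a field with
stored="false" cannot be carried over and simply gets re-populated from its
default. For example, an update like (field name and value made up)

{"id":"doc1", "price":{"set":199}}

rewrites the whole document doc1 under the covers, which is why every field
that should survive the update has to be stored (copyField targets being the
usual exception, since they are regenerated at index time).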


Re: problem with data import handler delta import due to use of multiple datasource

2013-10-08 Thread Bill Au
I am using 4.3.  It is not related to bugs related to last_index_time.  The
problem is caused by the fact that the parent entity and child entity use
different data source (different databases on different hosts).

From the log output, I do see the delta query of the child entity being
executed correctly and found all the rows that have been modified for the
child entity.  But it fails when it executed the parentDeltaQuery because
it is still using the database connection from the child entity (ie
datasource ds2 in my example above).

Is there a way to tell DIH to use a different datasource in the
parentDeltaQuery?

Bill


On Sat, Oct 5, 2013 at 10:28 PM, Alexandre Rafalovitch
arafa...@gmail.comwrote:

 Which version of Solr and what kind of SQL errors? There were some bugs in
 4.x related to last_index_time, but it does not sound related.

 Regards,
Alex.

 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Sun, Oct 6, 2013 at 8:51 AM, Bill Au bill.w...@gmail.com wrote:

  Here is my DIH config:
 
  dataConfig
  dataSource name=ds1 type=JdbcDataSource
 driver=com.mysql.jdbc.Driver
  url=jdbc:mysql://localhost1/dbname1 user=db_username1
  password=db_password1/
  dataSource name=ds2 type=JdbcDataSource
 driver=com.mysql.jdbc.Driver
  url=jdbc:mysql://localhost2/dbname2 user=db_username2
  password=db_password2/
  document name=products
  entity name=item dataSource=ds1 query=select * from item
  field column=ID name=id /
  field column=NAME name=name /
 
  entity name=feature dataSource=ds2 query=select
  description from feature where item_id='${item.ID}'
  field name=features column=description /
  /entity
  /entity
  /document
  /dataConfig
 
  I am having trouble with delta import.  I think it is because the main
  entity and the sub-entity use different data source.  I have tried using
  both a delta query:
 
  deltaQuery=select id from item where id in (select item_id as id from
  feature where last_modified  '${dih.last_index_time}') or last_modified
  gt; '${dih.last_index_time}'
 
  and a parentDeltaQuery:
 
  entity name=feature pk=ITEM_ID query=select DESCRIPTION as features
  from FEATURE where ITEM_ID='${item.ID}' deltaQuery=select ITEM_ID from
  FEATURE where last_modified  '${dih.last_index_time}'
  parentDeltaQuery=select ID from item where ID=${feature.ITEM_ID}/
 
  I ended up with an SQL error for both.  Is there any way to make delta
  import work in my case?
 
  Bill
 



Re: problem with data import handler delta import due to use of multiple datasource

2013-10-08 Thread Bill Au
Thanks for the suggestion but that won't work, as I have a last_modified field
in both the parent entity and the child entity and I want delta import to kick
in when either changes.  That other approach has the same problem since the
parent and child entities use different datasources.

Bill


On Tue, Oct 8, 2013 at 10:18 AM, Dyer, James
james.d...@ingramcontent.comwrote:

 Bill,

 I do not believe there is any way to tell it to use a different datasource
 for the parent delta query.

 If you used this approach, would it solve your problem:
 http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport ?

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Bill Au [mailto:bill.w...@gmail.com]
 Sent: Tuesday, October 08, 2013 8:50 AM
 To: solr-user@lucene.apache.org
 Subject: Re: problem with data import handler delta import due to use of
 multiple datasource

 I am using 4.3.  It is not related to bugs related to last_index_time.  The
 problem is caused by the fact that the parent entity and child entity use
 different data source (different databases on different hosts).

 From the log output, I do see the delta query of the child entity being
 executed correctly and found all the rows that have been modified for the
 child entity.  But it fails when it executed the parentDeltaQuery because
 it is still using the database connection from the child entity (ie
 datasource ds2 in my example above).

 Is there a way to tell DIH to use a different datasource in the
 parentDeltaQuery?

 Bill


 On Sat, Oct 5, 2013 at 10:28 PM, Alexandre Rafalovitch
 arafa...@gmail.comwrote:

  Which version of Solr and what kind of SQL errors? There were some bugs
 in
  4.x related to last_index_time, but it does not sound related.
 
  Regards,
 Alex.
 
  Personal website: http://www.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all at
  once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
 
 
  On Sun, Oct 6, 2013 at 8:51 AM, Bill Au bill.w...@gmail.com wrote:
 
   Here is my DIH config:
  
   dataConfig
   dataSource name=ds1 type=JdbcDataSource
  driver=com.mysql.jdbc.Driver
   url=jdbc:mysql://localhost1/dbname1 user=db_username1
   password=db_password1/
   dataSource name=ds2 type=JdbcDataSource
  driver=com.mysql.jdbc.Driver
   url=jdbc:mysql://localhost2/dbname2 user=db_username2
   password=db_password2/
   document name=products
   entity name=item dataSource=ds1 query=select * from
 item
   field column=ID name=id /
   field column=NAME name=name /
  
   entity name=feature dataSource=ds2 query=select
   description from feature where item_id='${item.ID}'
   field name=features column=description /
   /entity
   /entity
   /document
   /dataConfig
  
   I am having trouble with delta import.  I think it is because the main
   entity and the sub-entity use different data source.  I have tried
 using
   both a delta query:
  
   deltaQuery=select id from item where id in (select item_id as id from
   feature where last_modified  '${dih.last_index_time}') or
 last_modified
   gt; '${dih.last_index_time}'
  
   and a parentDeltaQuery:
  
   entity name=feature pk=ITEM_ID query=select DESCRIPTION as
 features
   from FEATURE where ITEM_ID='${item.ID}' deltaQuery=select ITEM_ID
 from
   FEATURE where last_modified  '${dih.last_index_time}'
   parentDeltaQuery=select ID from item where ID=${feature.ITEM_ID}/
  
   I ended up with an SQL error for both.  Is there any way to make delta
   import work in my case?
  
   Bill
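
For readers who have not seen it, the approach James links to above folds the
delta logic into the full-import query and is run with clean=false; a rough
single-datasource sketch (which, as Bill notes, does not by itself address the
cross-datasource case) looks like:

<entity name="item" dataSource="ds1"
        query="select * from item
               where '${dataimporter.request.clean}' != 'false'
                  or last_modified > '${dataimporter.last_index_time}'">
  ...
</entity>

invoked with /dataimport?command=full-import&clean=false.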
  
 




problem with data import handler delta import due to use of multiple datasource

2013-10-05 Thread Bill Au
Here is my DIH config:

<dataConfig>
  <dataSource name="ds1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost1/dbname1" user="db_username1"
      password="db_password1"/>
  <dataSource name="ds2" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost2/dbname2" user="db_username2"
      password="db_password2"/>
  <document name="products">
    <entity name="item" dataSource="ds1" query="select * from item">
      <field column="ID" name="id" />
      <field column="NAME" name="name" />

      <entity name="feature" dataSource="ds2" query="select
          description from feature where item_id='${item.ID}'">
        <field name="features" column="description" />
      </entity>
    </entity>
  </document>
</dataConfig>

I am having trouble with delta import.  I think it is because the main
entity and the sub-entity use different data source.  I have tried using
both a delta query:

deltaQuery="select id from item where id in (select item_id as id from
feature where last_modified > '${dih.last_index_time}') or last_modified
> '${dih.last_index_time}'"

and a parentDeltaQuery:

<entity name="feature" pk="ITEM_ID" query="select DESCRIPTION as features
from FEATURE where ITEM_ID='${item.ID}'" deltaQuery="select ITEM_ID from
FEATURE where last_modified > '${dih.last_index_time}'"
parentDeltaQuery="select ID from item where ID=${feature.ITEM_ID}"/>

I ended up with an SQL error for both.  Is there any way to make delta
import work in my case?

Bill


Re: Solr 4.5 spatial search - distance and score

2013-09-13 Thread Bill Bell
You can apply his 4.5 patches to 4.4 or take trunk and it is there

Bill Bell
Sent from mobile


On Sep 12, 2013, at 6:23 PM, Weber solrmaill...@fluidolabs.com wrote:

 I'm trying to get score by using a custom boost and also get the distance. I
 found David's code* to get it using Intersects, which I want to replace by
 {!geofilt} or geodist()
 
 *David's code: https://issues.apache.org/jira/browse/SOLR-4255
 
 He told me geodist() will be available again for this kind of field, which
 is a geohash type.
 
 Then, I'd like to know how it can be done today on 4.4 with {!geofilt} and
 how it will be done on 4.5 using geodist()
 
 Thanks in advance.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-4-5-spatial-search-distance-and-score-tp4089706.html
 Sent from the Solr - User mailing list archive at Nabble.com.
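
For anyone following up on this thread: with the 4.5 work referenced above, my
understanding is that the distance for an RPT field is surfaced through the
score of the spatial query rather than through geodist(), along these lines
(field name and point are illustrative, and this is a sketch of the 4.5
behaviour, not something tested here):

q={!geofilt score=distance sfield=geopoint pt=45.15,-93.85 d=5}&fl=*,score

where the returned score is the distance from the query point in the field's
units.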


Re: Some highlighted snippets aren't being returned

2013-09-08 Thread Bill Bell
Zip up all your configs 

Bill Bell
Sent from mobile


On Sep 8, 2013, at 3:00 PM, Eric O'Hanlon elo2...@columbia.edu wrote:

 Hi again Everyone,
 
 I didn't get any replies to this, so I thought I'd re-send in case anyone 
 missed it and has any thoughts.
 
 Thanks,
 Eric
 
 On Aug 7, 2013, at 1:51 PM, Eric O'Hanlon elo2...@columbia.edu wrote:
 
 Hi Everyone,
 
 I'm facing an issue in which my solr query is returning highlighted snippets 
 for some, but not all results.  For reference, I'm searching through an 
 index that contains web crawls of human-rights-related websites.  I'm 
 running solr as a webapp under Tomcat and I've included the query's solr 
 params from the Tomcat log:
 
 ...
 webapp=/solr-4.2
 path=/select
 params={facet=truesort=score+descgroup.limit=10spellcheck.q=Unanganf.mimetype_code.facet.limit=7hl.simple.pre=codeq.alt=*:*f.organization_type__facet.facet.limit=6f.language__facet.facet.limit=6hl=truef.date_of_capture_.facet.limit=6group.field=original_urlhl.simple.post=/codefacet.field=domainfacet.field=date_of_capture_facet.field=mimetype_codefacet.field=geographic_focus__facetfacet.field=organization_based_in__facetfacet.field=organization_type__facetfacet.field=language__facetfacet.field=creator_name__facethl.fragsize=600f.creator_name__facet.facet.limit=6facet.mincount=1qf=text^1hl.fl=contentshl.fl=titlehl.fl=original_urlwt=rubyf.geographic_focus__facet.facet.limit=6defType=edismaxrows=10f.domain.facet.limit=6q=Unanganf.organization_based_in__facet.facet.limit=6q.op=ANDgroup=truehl.usePhraseHighlighter=true}
  hits=8 status=0 QTime=108
 ...
 
 For the query above (which can be simplified to say: find all documents that 
 contain the word unangan and return facets, highlights, etc.), I get five 
 search results.  Only three of these are returning highlighted snippets.  
 Here's the highlighting portion of the solr response (note: printed in 
 ruby notation because I'm receiving this response in a Rails app):
 
 
 highlighting=
 {20100602195444/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf=
   {},
  
 20100902203939/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf=
   {},
  
 20111202233029/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf=
   {},
  20100618201646/http://www.komnasham.go.id/portal/files/39-99.pdf=
   {contents=
 [...actual snippet is returned here...]},
  20100902235358/http://www.komnasham.go.id/portal/files/39-99.pdf=
   {contents=
 [...actual snippet is returned here...]},
  
 20110302213056/http://www.komnasham.go.id/publikasi/doc_download/2-uu-no-39-tahun-1999=
   {contents=
 [...actual snippet is returned here...]},
  
 20110302213102/http://www.komnasham.go.id/publikasi/doc_view/2-uu-no-39-tahun-1999?tmpl=componentformat=raw=
   {contents=
 [...actual snippet is returned here...]},
  
 20120303113654/http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf=
   {}}
 
 
 I have eight (as opposed to five) results above because I'm also doing a 
 grouped query, grouping by a field called original_url, and this leads to 
 five grouped results.
 
 I've confirmed that my highlight-lacking results DO contain the word 
 unangan, as expected, and this term is appearing in a text field that's 
 indexed and stored, and being searched for all text searches.  For example, 
 one of the search results is for a crawl of this document: 
 http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf
 
 And if you view that document on the web, you'll see that it does contain 
 unangan.
 
 Has anyone seen this before?  And does anyone have any good suggestions for 
 troubleshooting/fixing the problem?
 
 Thanks!
 
 - Eric
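
One thing that may be worth checking here (a guess based on the symptoms, not
something established in this thread): highlighting only analyzes the first
51200 characters of a field by default (hl.maxAnalyzedChars), so a match that
sits deep inside a large extracted PDF can produce an empty snippet list even
though the document itself matches. Raising the limit, e.g.

hl=true&hl.fl=contents&hl.maxAnalyzedChars=1000000

is a cheap way to test whether that is what is happening.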
 


Re: Concat 2 fields in another field

2013-08-27 Thread Bill Bell
If for search just copyField into a multivalued field

Or do it on indexing using DIH or code. A rhino script works too.

Bill Bell
Sent from mobile
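
If the concatenated value is only needed at query time, the copyField route is
just schema.xml plumbing; a sketch using the source field names from this
thread (destination field name and type are illustrative):

<field name="fullname" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="firstname" dest="fullname"/>
<copyField source="lastname" dest="fullname"/>

This indexes both values into fullname for searching but, unlike the update
processor chain shown below, does not give you a single stored,
delimiter-joined string.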


On Aug 27, 2013, at 7:15 AM, Jack Krupansky j...@basetechnology.com wrote:

 I have additional examples in the two most recent early access releases of my 
 book - variations on using the existing update processors.
 
 -- Jack Krupansky
 
 -Original Message- From: Federico Chiacchiaretta
 Sent: Tuesday, August 27, 2013 8:39 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Concat 2 fields in another field
 
 Hi,
 we do the same thing using an update request processor chain, this is the
 snippet from solrconfig.xml
 
  <updateRequestProcessorChain name="concatenation">
    <processor class="solr.CloneFieldUpdateProcessorFactory">
      <str name="source">firstname</str>
      <str name="dest">concatfield</str>
    </processor>
    <processor class="solr.CloneFieldUpdateProcessorFactory">
      <str name="source">lastname</str>
      <str name="dest">concatfield</str>
    </processor>
    <processor class="solr.ConcatFieldUpdateProcessorFactory">
      <str name="fieldName">concatfield</str>
      <str name="delimiter">_</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
 
 
 Regards,
 Federico Chiacchiaretta
 
 
 
 2013/8/27 Markus Jelsma markus.jel...@openindex.io
 
 You may be more interested in the ConcatFieldUpdateProcessorFactory:
 
 http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html
 
 
 
 -Original message-
  From:Alok Bhandari alokomprakashbhand...@gmail.com
  Sent: Tuesday 27th August 2013 14:05
  To: solr-user@lucene.apache.org
  Subject: Re: Concat 2 fields in another field
 
  Thanks for reply.
 
  But I don't want to introduce any scripting in my code so want to know  is
  there any Java component available for the same.
 
 
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-tp4086786p4086791.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 


Re: Solr 4.2.1 update to 4.3/4.4 problem

2013-08-27 Thread Bill Bell
Index and query

analyzer type=index

Bill Bell
Sent from mobile


On Aug 26, 2013, at 5:42 AM, skorrapa korrapati.sus...@gmail.com wrote:

  I have also re-indexed the data and tried. And also tried with the below:
  <fieldType name="string_lower_case" class="solr.TextField"
             sortMissingLast="true" omitNorms="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="select">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  This didn't work as well...
 
 
 
 On Mon, Aug 26, 2013 at 4:03 PM, skorrapa [via Lucene] 
 ml-node+s472066n4086601...@n3.nabble.com wrote:
 
 Hello All,
 
  I am still facing the same issue. Case insensitive search is not working on
  Solr 4.3.
  I am using the below configuration in schema.xml:
  <fieldType name="string_lower_case" class="solr.TextField"
             sortMissingLast="true" omitNorms="true">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="select">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
 Basically I want my string which could have spaces or characters like '-'
 or \ to be searched upon case insensitively.
 Please help.
 
 
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086606.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Bill Bell
This seems like a fairly large issue. Can you create a Jira issue ?

Bill Bell
Sent from mobile


On Jul 30, 2013, at 12:34 PM, Dotan Cohen dotanco...@gmail.com wrote:

 On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal alghos...@gmail.com wrote:
 Does adding facet.mincount=2 help?
 
 
 
 In fact, when adding facet.mincount=20 (I know that some dupes are in
 the hundreds) I got the OutOfMemoryError in seconds instead of
 minutes.
 
 -- 
 Dotan Cohen
 
 http://gibberish.co.il
 http://what-is-what.com


Re: Performance question on Spatial Search

2013-07-29 Thread Bill Bell
Can you compare with the old geo handler as a baseline. ?

Bill Bell
Sent from mobile


On Jul 29, 2013, at 4:25 PM, Erick Erickson erickerick...@gmail.com wrote:

 This is very strange. I'd expect slow queries on
 the first few queries while these caches were
 warmed, but after that I'd expect things to
 be quite fast.
 
 For a 12G index and 256G RAM, you have on the
 surface a LOT of hardware to throw at this problem.
 You can _try_ giving the JVM, say, 18G but that
 really shouldn't be a big issue, your index files
 should be MMaped.
 
 Let's try the crude thing first and give the JVM
 more memory.
 
 FWIW
 Erick
 
 On Mon, Jul 29, 2013 at 4:45 PM, Steven Bower smb-apa...@alcyon.net wrote:
 I've been doing some performance analysis of a spacial search use case I'm
 implementing in Solr 4.3.0. Basically I'm seeing search times alot higher
 than I'd like them to be and I'm hoping people may have some suggestions
 for how to optimize further.
 
 Here are the specs of what I'm doing now:
 
 Machine:
 - 16 cores @ 2.8ghz
 - 256gb RAM
 - 1TB (RAID 1+0 on 10 SSD)
 
 Content:
 - 45M docs (not very big only a few fields with no large textual content)
 - 1 geo field (using config below)
 - index is 12gb
 - 1 shard
 - Using MMapDirectory
 
 Field config:
 
  <fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType"
      distErrPct="0.025" maxDistErr="0.00045"
      spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
      units="degrees"/>

  <field name="geopoint" indexed="true" multiValued="false"
      required="false" stored="true" type="geo"/>
 
 
 What I've figured out so far:
 
 - Most of my time (98%) is being spent in
 java.nio.Bits.copyToByteArray(long,Object,long,long) which is being
 driven by BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
 which from what I gather is basically reading terms from the .tim file
 in blocks
 
 - I moved from Java 1.6 to 1.7 based upon what I read here:
 http://blog.vlad1.com/2011/10/05/looking-at-java-nio-buffer-performance/
 and it definitely had some positive impact (i haven't been able to
 measure this independantly yet)
 
  - I changed maxDistErr from 0.000009 (which is 1m precision per the docs)
 to 0.00045 (50m precision) ..
 
 - It looks to me that the .tim file are being memory mapped fully (ie
 they show up in pmap output) the virtual size of the jvm is ~18gb
 (heap is 6gb)
 
 - I've optimized the index but this doesn't have a dramatic impact on
 performance
 
 Changing the precision and the JVM upgrade yielded a drop from ~18s
 avg query time to ~9s avg query time.. This is fantastic but I want to
 get this down into the 1-2 second range.
 
 At this point it seems that basically i am bottle-necked on basically
 copying memory out of the mapped .tim file which leads me to think
 that the only solution to my problem would be to read less data or
 somehow read it more efficiently..
 
 If anyone has any suggestions of where to go with this I'd love to know
 
 
 thanks,
 
 steve


Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-29 Thread Bill Au
I just double check my config.  We are using convertType=true.  Someone
else came up with the config so I am not sure why we are using it.  I will
try with it set to false to see if something else will break.  Thanks for
pointing that out.

This is my first time using DIH.  I really like what I have seen so far.

Bill


On Sat, Jun 29, 2013 at 1:45 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 The default in JdbcDataSource is to use ResultSet.getObject which
 returns the underlying database's type. The type specific methods in
 ResultSet are not invoked unless you are using convertType=true.

 Is MySQL actually returning java.sql.Timestamp objects?

 On Sat, Jun 29, 2013 at 5:22 AM, Bill Au bill.w...@gmail.com wrote:
  I am running Solr 4.3.0, using DIH to import data from MySQL.  I am
 running
  into a very strange problem where data from a datetime column being
  imported with the right date but the time is 00:00:00.  I tried using SQL
  DATE_FORMAT() and also DIH DateFormatTransformer but nothing works.  The
  raw debug response of DIH, it looks like the time porting of the datetime
  data is already 00:00:00 in Solr jdbc query result.
 
  So I looked at the source code of DIH JdbcDataSource class.  It is using
  java.sql.ResultSet and its getDate() method to handle date column.  The
  getDate() method returns java.sql.Date.  The java api doc for
 java.sql.Date
 
  http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
 
  states that:
 
  To conform with the definition of SQL DATE, the millisecond values
 wrapped
  by a java.sql.Date instance must be 'normalized' by setting the hours,
  minutes, seconds, and milliseconds to zero in the particular time zone
 with
  which the instance is associated.
 
  This seems to be describing exactly my problem.  Has anyone else notice
  this problem?  Has anyone use DIH to index SQL datetime successfully?  If
  so can you send me the relevant portion of the DIH config?
 
  Bill



 --
 Regards,
 Shalin Shekhar Mangar.



Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-29 Thread Bill Au
Setting convertType=false does solve the datetime issue.  But there are now
other columns that were working before but not working now.  Since I have
already done some research into the datetime to date issue and not been
able to find a solution, I think I will have to keep convertType set to
false and deal with the other column type that are not working now.

Thanks for your help.

Bill
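
For anyone else hitting this, the workaround above is just an attribute on the
DIH data source; a sketch with placeholder connection details:

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/dbname" user="db_user"
            password="db_password" convertType="false"/>

With convertType="false" (the default), DIH hands Solr whatever object the
JDBC driver returns, so a MySQL DATETIME keeps its time portion instead of
being squeezed through java.sql.Date.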


On Sat, Jun 29, 2013 at 10:24 AM, Bill Au bill.w...@gmail.com wrote:

 I just double check my config.  We are using convertType=true.  Someone
 else came up with the config so I am not sure why we are using it.  I will
 try with it set to false to see if something else will break.  Thanks for
 pointing that out.

 This is my first time using DIH.  I really like what I have seen so far.

 Bill


 On Sat, Jun 29, 2013 at 1:45 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 The default in JdbcDataSource is to use ResultSet.getObject which
 returns the underlying database's type. The type specific methods in
 ResultSet are not invoked unless you are using convertType=true.

 Is MySQL actually returning java.sql.Timestamp objects?

 On Sat, Jun 29, 2013 at 5:22 AM, Bill Au bill.w...@gmail.com wrote:
  I am running Solr 4.3.0, using DIH to import data from MySQL.  I am
 running
  into a very strange problem where data from a datetime column being
  imported with the right date but the time is 00:00:00.  I tried using
 SQL
  DATE_FORMAT() and also DIH DateFormatTransformer but nothing works.  The
  raw debug response of DIH, it looks like the time porting of the
 datetime
  data is already 00:00:00 in Solr jdbc query result.
 
  So I looked at the source code of DIH JdbcDataSource class.  It is using
  java.sql.ResultSet and its getDate() method to handle date column.  The
  getDate() method returns java.sql.Date.  The java api doc for
 java.sql.Date
 
  http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
 
  states that:
 
  To conform with the definition of SQL DATE, the millisecond values
 wrapped
  by a java.sql.Date instance must be 'normalized' by setting the hours,
  minutes, seconds, and milliseconds to zero in the particular time zone
 with
  which the instance is associated.
 
  This seems to be describing exactly my problem.  Has anyone else notice
  this problem?  Has anyone use DIH to index SQL datetime successfully?
  If
  so can you send me the relevant portion of the DIH config?
 
  Bill



 --
 Regards,
 Shalin Shekhar Mangar.





Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-29 Thread Bill Au
So disabling convertType does provide a workaround for my problem with
datetime column.  But the problem still exists when convertType is enabled
because DIH is not doing the conversion correctly for a solr date field.
 Solr date field does have a time portion but java.sql.Date does not.  So
DIH should not be calling ResultSet.getDate() for a solr date field.  It
should really be calling ResultSet.getTimestamp() instead.  Is the fix this
simple?  Am I missing anything?

If the fix is this simple I can submit and commit a patch to DIH.

Bill
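
Roughly, the change being proposed (a sketch of the idea, not the patch that
was eventually committed) is to read date-typed columns with getTimestamp()
instead of getDate() in JdbcDataSource's type conversion:

// before: the time of day is zeroed out, per the java.sql.Date contract
java.util.Date value = resultSet.getDate(columnName);

// after: the full date and time survive the conversion
java.sql.Timestamp ts = resultSet.getTimestamp(columnName);
java.util.Date value = (ts == null) ? null : new java.util.Date(ts.getTime());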


On Sat, Jun 29, 2013 at 12:13 PM, Bill Au bill.w...@gmail.com wrote:

 Setting convertType=false does solve the datetime issue.  But there are
 now other columns that were working before but not working now.  Since I
 have already done some research into the datetime to date issue and not
 been able to find a solution, I think I will have to keep convertType set
 to false and deal with the other column type that are not working now.

 Thanks for your help.

 Bill


 On Sat, Jun 29, 2013 at 10:24 AM, Bill Au bill.w...@gmail.com wrote:

 I just double check my config.  We are using convertType=true.  Someone
 else came up with the config so I am not sure why we are using it.  I will
 try with it set to false to see if something else will break.  Thanks for
 pointing that out.

 This is my first time using DIH.  I really like what I have seen so far.

 Bill


 On Sat, Jun 29, 2013 at 1:45 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 The default in JdbcDataSource is to use ResultSet.getObject which
 returns the underlying database's type. The type specific methods in
 ResultSet are not invoked unless you are using convertType=true.

 Is MySQL actually returning java.sql.Timestamp objects?

 On Sat, Jun 29, 2013 at 5:22 AM, Bill Au bill.w...@gmail.com wrote:
  I am running Solr 4.3.0, using DIH to import data from MySQL.  I am
 running
  into a very strange problem where data from a datetime column being
  imported with the right date but the time is 00:00:00.  I tried using
 SQL
  DATE_FORMAT() and also DIH DateFormatTransformer but nothing works.
  The
  raw debug response of DIH, it looks like the time porting of the
 datetime
  data is already 00:00:00 in Solr jdbc query result.
 
  So I looked at the source code of DIH JdbcDataSource class.  It is
 using
  java.sql.ResultSet and its getDate() method to handle date column.  The
  getDate() method returns java.sql.Date.  The java api doc for
 java.sql.Date
 
  http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
 
  states that:
 
  To conform with the definition of SQL DATE, the millisecond values
 wrapped
  by a java.sql.Date instance must be 'normalized' by setting the hours,
  minutes, seconds, and milliseconds to zero in the particular time zone
 with
  which the instance is associated.
 
  This seems to be describing exactly my problem.  Has anyone else notice
  this problem?  Has anyone use DIH to index SQL datetime successfully?
  If
  so can you send me the relevant portion of the DIH config?
 
  Bill



 --
 Regards,
 Shalin Shekhar Mangar.






Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-29 Thread Bill Au
https://issues.apache.org/jira/browse/SOLR-4978


On Sat, Jun 29, 2013 at 2:33 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 Yes we need to use getTimestamp instead of getDate. Please create an issue.

 On Sat, Jun 29, 2013 at 11:48 PM, Bill Au bill.w...@gmail.com wrote:
  So disabling convertType does provide a workaround for my problem with
  datetime column.  But the problem still exists when convertType is
 enabled
  because DIH is not doing the conversion correctly for a solr date field.
   Solr date field does have a time portion but java.sql.Date does not.  So
  DIH should not be calling ResultSet.getDate() for a solr date field.  It
  should really be calling ResultSet.getTimestamp() instead.  Is the fix
 this
  simple?  Am I missing anything?
 
  If the fix is this simple I can submit and commit a patch to DIH.
 
  Bill
 
 
  On Sat, Jun 29, 2013 at 12:13 PM, Bill Au bill.w...@gmail.com wrote:
 
  Setting convertType=false does solve the datetime issue.  But there are
  now other columns that were working before but not working now.  Since I
  have already done some research into the datetime to date issue and not
  been able to find a solution, I think I will have to keep convertType
 set
  to false and deal with the other column type that are not working now.
 
  Thanks for your help.
 
  Bill
 
 
  On Sat, Jun 29, 2013 at 10:24 AM, Bill Au bill.w...@gmail.com wrote:
 
  I just double check my config.  We are using convertType=true.  Someone
  else came up with the config so I am not sure why we are using it.  I
 will
  try with it set to false to see if something else will break.  Thanks
 for
  pointing that out.
 
  This is my first time using DIH.  I really like what I have seen so
 far.
 
  Bill
 
 
  On Sat, Jun 29, 2013 at 1:45 AM, Shalin Shekhar Mangar 
  shalinman...@gmail.com wrote:
 
  The default in JdbcDataSource is to use ResultSet.getObject which
  returns the underlying database's type. The type specific methods in
  ResultSet are not invoked unless you are using convertType=true.
 
  Is MySQL actually returning java.sql.Timestamp objects?
 
  On Sat, Jun 29, 2013 at 5:22 AM, Bill Au bill.w...@gmail.com wrote:
   I am running Solr 4.3.0, using DIH to import data from MySQL.  I am
  running
   into a very strange problem where data from a datetime column being
   imported with the right date but the time is 00:00:00.  I tried
 using
  SQL
   DATE_FORMAT() and also DIH DateFormatTransformer but nothing works.
   The
   raw debug response of DIH, it looks like the time porting of the
  datetime
   data is already 00:00:00 in Solr jdbc query result.
  
   So I looked at the source code of DIH JdbcDataSource class.  It is
  using
   java.sql.ResultSet and its getDate() method to handle date column.
  The
   getDate() method returns java.sql.Date.  The java api doc for
  java.sql.Date
  
   http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
  
   states that:
  
   To conform with the definition of SQL DATE, the millisecond values
  wrapped
   by a java.sql.Date instance must be 'normalized' by setting the
 hours,
   minutes, seconds, and milliseconds to zero in the particular time
 zone
  with
   which the instance is associated.
  
   This seems to be describing exactly my problem.  Has anyone else
 notice
   this problem?  Has anyone use DIH to index SQL datetime
 successfully?
   If
   so can you send me the relevant portion of the DIH config?
  
   Bill
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.
 
 
 
 



 --
 Regards,
 Shalin Shekhar Mangar.



Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-28 Thread Bill Au
I am running Solr 4.3.0, using DIH to import data from MySQL.  I am running
into a very strange problem where data from a datetime column is being
imported with the right date but with the time as 00:00:00.  I tried using SQL
DATE_FORMAT() and also the DIH DateFormatTransformer but nothing works.  In the
raw debug response of DIH, it looks like the time portion of the datetime
data is already 00:00:00 in the Solr JDBC query result.

So I looked at the source code of DIH JdbcDataSource class.  It is using
java.sql.ResultSet and its getDate() method to handle date column.  The
getDate() method returns java.sql.Date.  The java api doc for java.sql.Date

http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html

states that:

To conform with the definition of SQL DATE, the millisecond values wrapped
by a java.sql.Date instance must be 'normalized' by setting the hours,
minutes, seconds, and milliseconds to zero in the particular time zone with
which the instance is associated.

This seems to be describing exactly my problem.  Has anyone else notice
this problem?  Has anyone use DIH to index SQL datetime successfully?  If
so can you send me the relevant portion of the DIH config?

Bill


SolrCloud excluding certain files in conf from zookeeper

2013-06-14 Thread Bill Au
When using SolrCloud, is it possible to exclude certain files in the conf
directory from being loaded into Zookeeper?

We are keeping our own solr related config files in the conf directory that
is actually different for each node.  Right now the copy in Zookeeper is
overriding the local copy.

Bill


question about the file data/index.properties

2013-05-15 Thread Bill Au
I am running 2 separate 4.3 SolrCloud clusters.  On one of them I noticed
the file data/index.properties on the replica nodes, where the index
directory is named index.<value of the index property in index.properties>.
 On the other cluster, the index directory is just named index.

Under what condition is index.properties created?  I am trying to
understand why there is a difference between my 2 SolrCloud clusters.

Bill


Re: question about the file data/index.properties

2013-05-15 Thread Bill Au
Thanks for that info.  So besides the two that I have already seen, are
there any more ways that the index directory can be named?  I am working on
some home-grown administration scripts which need to know the name of the
index directory.

Bill


On Wed, May 15, 2013 at 7:13 PM, Mark Miller markrmil...@gmail.com wrote:

 It's fairly meaningless from a user perspective, but it happens when an
 index is replicated that cannot be simply merged with the existing index
 files and needs a new directory.

 - Mark

 On May 15, 2013, at 5:38 PM, Bill Au bill.w...@gmail.com wrote:

  I am running 2 separate 4.3 SolrCloud clusters.  On one of them I noticed
  the file data/index.properties on the replica nodes where the index
  directory is named index.value of index property in index.properties.
  On the other cluster, the index directory is just named index.
 
  Under what condition is index.properties created?  I am trying to
  understand why there is a difference between my 2 SolrCloud clusters.
 
  Bill




Best practice for rebuild index in SolrCloud

2013-04-08 Thread Bill Au
We are using SolrCloud for replication and dynamic scaling but not
distribution so we are only using a single shard.  From time to time we
make changes to the index schema that requires rebuilding of the index.

Should I treat the rebuilding as just any other index operation?  It seems
to me it would be better if I can somehow take a node offline and rebuild
the index there, then put it back online and let the new index be
replicated from there.  But I am not sure how to do the latter.

Bill


multiple SolrCloud clusters with one ZooKeeper ensemble?

2013-03-28 Thread Bill Au
Can I use a single ZooKeeper ensemble for multiple SolrCloud clusters or
would each SolrCloud cluster requires its own ZooKeeper ensemble?

Bill


Re: multiple SolrCloud clusters with one ZooKeeper ensemble?

2013-03-28 Thread Bill Au
Thanks.

Now I have to go back and re-read the entire SolrCloud Wiki to see what
other info I missed and/or forgot.

Bill


On Thu, Mar 28, 2013 at 12:48 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : Can I use a single ZooKeeper ensemble for multiple SolrCloud clusters or
 : would each SolrCloud cluster requires its own ZooKeeper ensemble?

 https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot

 (I'm going to FAQ this)


 -Hoss
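
For the record, the chroot approach linked above amounts to giving each
SolrCloud cluster its own path inside the shared ensemble, e.g. (host names
illustrative):

-DzkHost=zk1:2181,zk2:2181,zk3:2181/cluster1    # first cluster
-DzkHost=zk1:2181,zk2:2181,zk3:2181/cluster2    # second cluster

Each cluster then keeps its configs and cluster state under its own znode
subtree; the chroot node may need to be created up front (for example with
zkcli.sh -cmd makepath /cluster1) before the first node is started.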



Solr 4.1 SolrCloud with 1 shard and 3 replicas

2013-03-27 Thread Bill Au
I am running Solr 4.1.  I have set up SolrCloud with 1 leader and 3
replicas, 4 nodes total.  Do query requests send to a node only query the
replica on that node, or are they load-balanced to the entire cluster?

Bill


Re: Solr 4.1 SolrCloud with 1 shard and 3 replicas

2013-03-27 Thread Bill Au
Thanks for the info, Erik.

I had gone through the tutorial in the SolrCloud Wiki and verified that
queries are load balanced in the two shard cluster with shard replicas
setup.  I was wondering if I need to explicitly specify distrib=false in my
single shard setup.  Glad to see that Solr is doing the right thing by
default in my case.

Bill

ps thanks for a very informative webinar.  I am going to recommend it to my
co-workers once the recording is available


On Wed, Mar 27, 2013 at 3:26 PM, Erik Hatcher erik.hatc...@gmail.comwrote:

 Requests to a node in your example would be answered by that node (no need
 to distribute; it's a single shard system) and it would not internally be
 routed otherwise either.  Ultimately it is up to the client to load-balance
 the initial requests into a SolrCloud cluster, but internally in a
 multi-shard distributed search request it will be load balanced beyond that
 initial node.

 CloudSolrServer does load balance, so if you're using that client it'll
 randomly pick a shard to send to from the client-side.  If you're using
 some other mechanism, it'll request directly to whatever node that you've
 specified directly for that initial request.

 Erik

 p.s. Thanks for attending the webinar, Bill!   I saw your name as one of
 the question askers.  Hopefully all that stuff I made up is close to the
 truth :)



 On Mar 27, 2013, at 14:51 , Bill Au wrote:

  I am running Solr 4.1.  I have set up SolrCloud with 1 leader and 3
  replicas, 4 nodes total.  Do query requests send to a node only query the
  replica on that node, or are they load-balanced to the entire cluster?
 
  Bill
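
A minimal SolrJ sketch of the CloudSolrServer route Erik describes (the
ZooKeeper address and collection name are placeholders):

CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
server.setDefaultCollection("collection1");
QueryResponse rsp = server.query(new SolrQuery("*:*"));

Because the client watches cluster state in ZooKeeper, it spreads queries
across live replicas on its own and stops sending to nodes that drop out.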




Re: [ANNOUNCE] Apache Solr 4.2 released

2013-03-17 Thread Bill Au
The Upgrading from Solr 4.1.0 section of the 4.2.0 CHANGES.txt says:

(No upgrade instructions yet)

To me that's not the same as no need to do anything.  I think the doc
should be updated with either specific instructions or states 4.2.0 is
backward compatible with 4.1.0 so there is no need to do anything.

Bill


On Sun, Mar 17, 2013 at 6:12 AM, sandeep a sundipk...@gmail.com wrote:

 Hi , please let me know how to upgrade solr from 4.1.0 to 4.2.0.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/ANNOUNCE-Apache-Solr-4-2-released-tp4046510p4048201.html
 Sent from the Solr - User mailing list archive at Nabble.com.


multiple facet.prefix for the same facet.field VS multiple facet.query

2013-02-21 Thread Bill Au
There have been requests for supporting multiple facet.prefix for the same
facet.field.  There is an open JIRA with a patch:

https://issues.apache.org/jira/browse/SOLR-1351

Wouldn't using multiple facet.query achieve the same result?  I mean
something like:

facet.query=lastName:A*&facet.query=lastName:B*&facet.query=lastName:C*


Bill


Re: multiple facet.prefix for the same facet.field VS multiple facet.query

2013-02-21 Thread Bill Au
Never mind.  I just realized the difference between the two.  Sorry for the
noise.

Bill


On Thu, Feb 21, 2013 at 8:42 AM, Bill Au bill.w...@gmail.com wrote:

 There have been requests for supporting multiple facet.prefix for the same
 facet.field.  There is an open JIRA with a patch:

 https://issues.apache.org/jira/browse/SOLR-1351

 Wouldn't using multiple facet.query achieve the same result?  I mean
 something like:

 facet.query=lastName:A*facet.query=lastName:B*facet.query=lastName:C*


 Bill
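
For anyone else who pauses on this: the difference is that facet.query returns
a single count per query, while facet.prefix still enumerates the individual
terms under the prefix, e.g. (made-up counts):

facet.field=lastName&facet.prefix=A   ->  Adams (12), Allen (7), ...
facet.query=lastName:A*               ->  lastName:A* (19)

so multiple facet.prefix values on one field cannot be emulated with
facet.query when per-term counts are needed.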




Re: Solr 4.0 SolrCloud with AWS Auto Scaling

2013-01-04 Thread Bill Au
thanks for pointing me to Solr's Zookeeper servlet.  I will look at the
source to see how I can use to fulfill my needs.

Bill


On Thu, Jan 3, 2013 at 6:43 PM, Mark Miller markrmil...@gmail.com wrote:

 Technically, you want to make sure zookeeper reports the node as live and
 active.

 You could use the same api that the UI uses for that - the
 localhost:port/solr/zookeeper (I think?) servlet.

 If you can't reach it for a node, it's obviously down - if you can reach
 it, parse the json and see if it notes the node as active?

 Not quite as clean as you'd like prob. Might be worth a JIRA issue to look
 at further options.

 - Mark

 On Jan 3, 2013, at 5:54 PM, Bill Au bill.w...@gmail.com wrote:

  Thanks, Mark.
 
  That does remove the node.  And it seems to do so permanently.  Even
 when I
  restart Solr after unloading, it does not join the SolrCloud cluster.
  And
  I can get it to re-join the cluster by creating the core.
 
   Anyone know if there is an API to determine the state of a node?  When AWS
   auto scaling adds a new node, I need to make sure it has become active
   before I enable it in the load balancer.
 
  Bill
 
 
 
 
  On Thu, Jan 3, 2013 at 9:10 AM, Mark Miller markrmil...@gmail.com
 wrote:
 
 
  http://wiki.apache.org/solr/CoreAdmin#UNLOAD
 
  - Mark
 
  On Jan 3, 2013, at 9:06 AM, Bill Au bill.w...@gmail.com wrote:
 
  Mark,
 What do you mean by unload them?
 
  I am using an AWS load balancer with my auto scaling group in stead of
   using Solr's built-in load balancer.  I am not sharding my index.  I am
  using SolrCloud for replication only.  I am doing local search on each
  instance and sending all updates to the shard leader directly because I
  want to minimize traffic between nodes during search and update
 
  Bill
 
 
  On Wed, Jan 2, 2013 at 6:47 PM, Mark Miller markrmil...@gmail.com
  wrote:
 
 
  On Jan 2, 2013, at 5:51 PM, Bill Au bill.w...@gmail.com wrote:
 
  Is anyone running Solr 4.0 SolrCloud with AWS auto scaling?
 
  My concern is that as AWS auto scaling add and remove instances to
  SolrCloud, the number of nodes in SolrCloud Zookeeper config will
 grow
  indefinitely as removed instances will never be used again.  AWS auto
  scaling will keep on adding new instances, and there is no way to
  remove
  them from Zookeeper, right?
 
  You can unload them and that removes them.
 
  What's the effect of have all these phantom
  nodes?
 
  Unless they are only replicas, they would need to be removed.
 
  Also, unless you are using elastic ips,
  https://issues.apache.org/jira/browse/SOLR-4078 may be of interest.
 
  - Mark
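
For reference, the two pieces discussed in this thread look roughly like the
following (host and core names are illustrative, and the ZooKeeper servlet
path is as Mark recalls it rather than something verified here):

removing a node's core so it disappears from the cluster state:
  http://host:8983/solr/admin/cores?action=UNLOAD&core=collection1

checking a node's state before enabling it in the load balancer:
  http://host:8983/solr/zookeeper?detail=true&path=/clusterstate.json

the second returns JSON that can be parsed for the replica's state value
(e.g. active or recovering).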
 
 




Re: Solr 4.0 SolrCloud with AWS Auto Scaling

2013-01-03 Thread Bill Au
With AWS auto scaling, one can specify a minimum number of instances for an
auto scaling group.  So there should never be an insufficient number of
replicas.  One can also specify a termination policy so that the newly
added nodes are removed first.

But with SolrCloud as long as there are enough replicas there is no wrong
node to remove, right?

AWS Beanstalk seems to be a wrapper for AWS auto scaling and other AWS
elastic services.  I am not sure if it offers the detail-grained control
that you have when using auto scaling directly.


On Wed, Jan 2, 2013 at 11:14 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 We've considered using AWS Beanstalk (hmm, what's the difference between
 AWS auto scaling and elastic beanstalk? not sure.) for search-lucene.com ,
 but the idea of something adding and removing nodes seems scary.  The
 scariest part to me is automatic removal of wrong nodes that ends up in
 data loss or insufficient number of replicas.

 But if somebody has done this and has written up a how-to, I'd love to see
 it!

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Wed, Jan 2, 2013 at 5:51 PM, Bill Au bill.w...@gmail.com wrote:

  Is anyone running Solr 4.0 SolrCloud with AWS auto scaling?
 
  My concern is that as AWS auto scaling add and remove instances to
  SolrCloud, the number of nodes in SolrCloud Zookeeper config will grow
  indefinitely as removed instances will never be used again.  AWS auto
  scaling will keep on adding new instances, and there is no way to remove
  them from Zookeeper, right?  What's the effect of have all these phantom
  nodes?
 
  Bill
 



  1   2   3   4   5   >