Re: Index optimize runs in background.

2015-06-10 Thread Upayavira
Until somewhere around Lucene 3.5, you needed to optimise, because the
merge strategy used wasn't that clever and left lots of deletes in your
largest segment. Around that point, the TieredMergePolicy became the
default. Because its algorithm is much more sophisticated, it took away
the need to optimize in the majority of scenarios. In fact, it
transformed optimizing from being a necessary thing to being a "bad"
thing in most cases.

So yes, let the algorithm take care of it, so long as you are using the
TieredMergePolicy, which has been the default for over 2 years.
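For reference, a minimal sketch of what selecting this policy looks like at the
Lucene level (in Solr it is set via the <mergePolicy> element in
solrconfig.xml's indexConfig section); the tuning values here are illustrative,
not recommendations:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;

public class MergePolicyExample {
    public static IndexWriterConfig config() {
        // TieredMergePolicy is already the default; construct it explicitly
        // only when you want to tune its knobs.
        TieredMergePolicy mp = new TieredMergePolicy();
        mp.setMaxMergedSegmentMB(5 * 1024);  // cap merged segments at ~5 GB
        mp.setSegmentsPerTier(10);           // segments allowed per tier

        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        iwc.setMergePolicy(mp);
        return iwc;
    }
}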

Upayavira

On Thu, Jun 11, 2015, at 07:01 AM, Walter Underwood wrote:
> Why would you care when the forced merge (not an “optimize”) is done?
> Start it and get back to work.
> 
> Or even better, never force merge and let the algorithm take care of it.
> Seriously, I’ve been giving this advice since before Lucene was written,
> because Ultraseek had the same approach for managing index segments.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
> On Jun 10, 2015, at 10:35 PM, Erick Erickson 
> wrote:
> 
> > If I knew, I would fix it ;). The sub-optimizes (i.e. the ones
> > sent out to each replica) should be sent in parallel and then
> > each thread should wait for completion from the replicas. There
> > is no real "check for optimize", I believe that the return from the
> > call is considered sufficient. If we can track down if there are
> > conditions under which this is not true we can fix it.
> > 
> > But until there's a way to reproduce it, it's pretty much speculation.
> > 
> > Best,
> > Erick
> > 
> > On Wed, Jun 10, 2015 at 10:14 PM, Modassar Ather  
> > wrote:
> >> Hi,
> >> 
> >> There are 5 cores and a separate server for indexing on this solrcloud. Can
> >> you please share your suggestions on:
> >>  How can indexer know that the optimize has completed even if the
> >> commit/optimize runs in background without going to the solr servers may be
> >> by using any solrj or other API?
> >> 
> >> I tried but could not find any API/handler to check if the optimizations is
> >> completed. Kindly share your inputs.
> >> 
> >> Thanks,
> >> Modassar
> >> 
> >> On Thu, Jun 4, 2015 at 9:36 PM, Erick Erickson 
> >> wrote:
> >> 
> >>> Can't get any failures to happen on my end so I really haven't a clue.
> >>> 
> >>> Best,
> >>> Erick
> >>> 
> >>> On Thu, Jun 4, 2015 at 3:17 AM, Modassar Ather 
> >>> wrote:
>  Hi,
>  
>  Please provide your inputs on optimize and commit running as background.
>  Your suggestion will be really helpful.
>  
>  Thanks,
>  Modassar
>  
>  On Tue, Jun 2, 2015 at 6:05 PM, Modassar Ather 
>  wrote:
>  
> > Erick! I could not find any underlying setting of 10 minutes.
> > It is not only optimize but commit is also behaving in the same fashion
> > and is taking lesser time than usually had taken.
> > As per my observation both are running in background.
> > 
> > On Fri, May 29, 2015 at 7:21 PM, Erick Erickson <
> >>> erickerick...@gmail.com>
> > wrote:
> > 
> >> I'm not talking about you setting a timeout, but the underlying
> >> connection timing out...
> >> 
> >> The "10 minutes then the indexer exits" comment points in that
> >>> direction.
> >> 
> >> Best,
> >> Erick
> >> 
> >> On Thu, May 28, 2015 at 11:43 PM, Modassar Ather <
> >>> modather1...@gmail.com>
> >> wrote:
> >>> I have not added any timeout in the indexer except zk client time out
> >> which
> >>> is 30 seconds. I am simply calling client.close() at the end of
> >> indexing.
> >>> The same code was not running in background for optimize with
> >> solr-4.10.3
> >>> and org.apache.solr.client.solrj.impl.CloudSolrServer.
> >>> 
> >>> On Fri, May 29, 2015 at 11:13 AM, Erick Erickson <
> >> erickerick...@gmail.com>
> >>> wrote:
> >>> 
>  Are you timing out on the client request? The theory here is that
> >>> it's
>  still a synchronous call, but you're just timing out at the client
>  level. At that point, the optimize is still running it's just the
>  connection has been dropped
>  
>  Shot in the dark.
>  Erick
>  
>  On Thu, May 28, 2015 at 10:31 PM, Modassar Ather <
> >> modather1...@gmail.com>
>  wrote:
> > I could not notice it but with my past experience of commit which
> >> used to
> > take around 2 minutes is now taking around 8 seconds. I think
> >>> this is
>  also
> > running as background.
> > 
> > On Fri, May 29, 2015 at 10:52 AM, Modassar Ather <
> >> modather1...@gmail.com
> > 
> > wrote:
> > 
> >> The indexer takes almost 2 hours to optimize. It has a
> >> multi-threaded
>  add
> >> of batches of documents to
> >> org.apache.s

RE: How to assign shard to specific node?

2015-06-10 Thread MOIS Martin (MORPHO)
Thank you for your quick answer.

The two parameters createNodeSet and createNodeSet.shuffle seem to solve the 
problem:

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&router.name=implicit&shards=shard1,shard2,shard3&router.field=shard&createNodeSet=node1,node2,node3&createNodeSet.shuffle=false

Best Regards,
Martin Mois

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, June 10, 2015 17:45
To: solr-user@lucene.apache.org
Subject: Re: How to assign shard to specific node?

Take a look at the collections API CREATE command in more detail here:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1

Admittedly this is 5.2 but you didn't mention what version of Solr you're using.
In particular the createNodeSet and createNodeSet.shuffle parameters.

Best,
Erick

On Wed, Jun 10, 2015 at 8:31 AM, MOIS Martin (MORPHO)  
wrote:
> Hello,
>
> I have a cluster with 3 nodes (node1, node2 and node3). Now I want to create 
> a new collection with 3 shards using `implicit` routing:
>
> 
> http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll
> ection&numShards=3&router.name=implicit&shards=shard1,shard2,shard3&ro
> uter.field=shard
>
> How can I control on which node each shard gets created? The goal is to 
> create shard1 on node1, shard2 on node2, etc..
>
> The background is that the actual raw data the index is created for should 
> reside on the same host. That means I have a "raw" record composed of 
> different data (documents, images, meta-data, etc.) for which I compute a 
> Lucene "document" that gets indexed. In order to reduce network traffic I 
> want to process the "raw" record on node1 and insert the resulting Lucene 
> document into shard1 that resides on node1. If shard1 would reside on node2, 
> the Lucene document would have to be sent from node1 to node2 which causes 
> for big record sets a lot of inter node communication.
>
> Thanks in advance.
>
> Best Regards,
> Martin Mois


Re: Index optimize runs in background.

2015-06-10 Thread Walter Underwood
Why would you care when the forced merge (not an “optimize”) is done? Start it 
and get back to work.

Or even better, never force merge and let the algorithm take care of it. 
Seriously, I’ve been giving this advice since before Lucene was written, 
because Ultraseek had the same approach for managing index segments.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Jun 10, 2015, at 10:35 PM, Erick Erickson  wrote:

> If I knew, I would fix it ;). The sub-optimizes (i.e. the ones
> sent out to each replica) should be sent in parallel and then
> each thread should wait for completion from the replicas. There
> is no real "check for optimize", I believe that the return from the
> call is considered sufficient. If we can track down if there are
> conditions under which this is not true we can fix it.
> 
> But until there's a way to reproduce it, it's pretty much speculation.
> 
> Best,
> Erick
> 
> On Wed, Jun 10, 2015 at 10:14 PM, Modassar Ather  
> wrote:
>> Hi,
>> 
>> There are 5 cores and a separate server for indexing on this solrcloud. Can
>> you please share your suggestions on:
>>  How can indexer know that the optimize has completed even if the
>> commit/optimize runs in background without going to the solr servers may be
>> by using any solrj or other API?
>> 
>> I tried but could not find any API/handler to check if the optimizations is
>> completed. Kindly share your inputs.
>> 
>> Thanks,
>> Modassar
>> 
>> On Thu, Jun 4, 2015 at 9:36 PM, Erick Erickson 
>> wrote:
>> 
>>> Can't get any failures to happen on my end so I really haven't a clue.
>>> 
>>> Best,
>>> Erick
>>> 
>>> On Thu, Jun 4, 2015 at 3:17 AM, Modassar Ather 
>>> wrote:
 Hi,
 
 Please provide your inputs on optimize and commit running as background.
 Your suggestion will be really helpful.
 
 Thanks,
 Modassar
 
 On Tue, Jun 2, 2015 at 6:05 PM, Modassar Ather 
 wrote:
 
> Erick! I could not find any underlying setting of 10 minutes.
> It is not only optimize but commit is also behaving in the same fashion
> and is taking lesser time than usually had taken.
> As per my observation both are running in background.
> 
> On Fri, May 29, 2015 at 7:21 PM, Erick Erickson <
>>> erickerick...@gmail.com>
> wrote:
> 
>> I'm not talking about you setting a timeout, but the underlying
>> connection timing out...
>> 
>> The "10 minutes then the indexer exits" comment points in that
>>> direction.
>> 
>> Best,
>> Erick
>> 
>> On Thu, May 28, 2015 at 11:43 PM, Modassar Ather <
>>> modather1...@gmail.com>
>> wrote:
>>> I have not added any timeout in the indexer except zk client time out
>> which
>>> is 30 seconds. I am simply calling client.close() at the end of
>> indexing.
>>> The same code was not running in background for optimize with
>> solr-4.10.3
>>> and org.apache.solr.client.solrj.impl.CloudSolrServer.
>>> 
>>> On Fri, May 29, 2015 at 11:13 AM, Erick Erickson <
>> erickerick...@gmail.com>
>>> wrote:
>>> 
 Are you timing out on the client request? The theory here is that
>>> it's
 still a synchronous call, but you're just timing out at the client
 level. At that point, the optimize is still running it's just the
 connection has been dropped
 
 Shot in the dark.
 Erick
 
 On Thu, May 28, 2015 at 10:31 PM, Modassar Ather <
>> modather1...@gmail.com>
 wrote:
> I could not notice it but with my past experience of commit which
>> used to
> take around 2 minutes is now taking around 8 seconds. I think
>>> this is
 also
> running as background.
> 
> On Fri, May 29, 2015 at 10:52 AM, Modassar Ather <
>> modather1...@gmail.com
> 
> wrote:
> 
>> The indexer takes almost 2 hours to optimize. It has a
>> multi-threaded
 add
>> of batches of documents to
>> org.apache.solr.client.solrj.impl.CloudSolrClient.
>> Once all the documents are indexed it invokes commit and
>>> optimize. I
 have
>> seen that the optimize goes into background after 10 minutes and
>> indexer
>> exits.
>> I am not sure why this 10 minutes it hangs on indexer. This
>> behavior I
>> have seen in multiple iteration of the indexing of same data.
>> 
>> There is nothing significant I found in log which I can share. I
>> can see
>> following in log.
>> org.apache.solr.update.DirectUpdateHandler2; start
>> 
 
>> 
>>> commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>> 
>> On Wed, May 27, 2015 at 10:59 PM, Erick Erickson <
 erickerick...@gmail.com>
>> wrote:
>> 
>

Re: Index optimize runs in background.

2015-06-10 Thread Erick Erickson
If I knew, I would fix it ;). The sub-optimizes (i.e. the ones
sent out to each replica) should be sent in parallel and then
each thread should wait for completion from the replicas. There
is no real "check for optimize", I believe that the return from the
call is considered sufficient. If we can track down if there are
conditions under which this is not true we can fix it.

But until there's a way to reproduce it, it's pretty much speculation.

Best,
Erick

On Wed, Jun 10, 2015 at 10:14 PM, Modassar Ather  wrote:
> Hi,
>
> There are 5 cores and a separate server for indexing on this solrcloud. Can
> you please share your suggestions on:
>   How can indexer know that the optimize has completed even if the
> commit/optimize runs in background without going to the solr servers may be
> by using any solrj or other API?
>
> I tried but could not find any API/handler to check if the optimizations is
> completed. Kindly share your inputs.
>
> Thanks,
> Modassar
>
> On Thu, Jun 4, 2015 at 9:36 PM, Erick Erickson 
> wrote:
>
>> Can't get any failures to happen on my end so I really haven't a clue.
>>
>> Best,
>> Erick
>>
>> On Thu, Jun 4, 2015 at 3:17 AM, Modassar Ather 
>> wrote:
>> > Hi,
>> >
>> > Please provide your inputs on optimize and commit running as background.
>> > Your suggestion will be really helpful.
>> >
>> > Thanks,
>> > Modassar
>> >
>> > On Tue, Jun 2, 2015 at 6:05 PM, Modassar Ather 
>> > wrote:
>> >
>> >> Erick! I could not find any underlying setting of 10 minutes.
>> >> It is not only optimize but commit is also behaving in the same fashion
>> >> and is taking lesser time than usually had taken.
>> >> As per my observation both are running in background.
>> >>
>> >> On Fri, May 29, 2015 at 7:21 PM, Erick Erickson <
>> erickerick...@gmail.com>
>> >> wrote:
>> >>
>> >>> I'm not talking about you setting a timeout, but the underlying
>> >>> connection timing out...
>> >>>
>> >>> The "10 minutes then the indexer exits" comment points in that
>> direction.
>> >>>
>> >>> Best,
>> >>> Erick
>> >>>
>> >>> On Thu, May 28, 2015 at 11:43 PM, Modassar Ather <
>> modather1...@gmail.com>
>> >>> wrote:
>> >>> > I have not added any timeout in the indexer except zk client time out
>> >>> which
>> >>> > is 30 seconds. I am simply calling client.close() at the end of
>> >>> indexing.
>> >>> > The same code was not running in background for optimize with
>> >>> solr-4.10.3
>> >>> > and org.apache.solr.client.solrj.impl.CloudSolrServer.
>> >>> >
>> >>> > On Fri, May 29, 2015 at 11:13 AM, Erick Erickson <
>> >>> erickerick...@gmail.com>
>> >>> > wrote:
>> >>> >
>> >>> >> Are you timing out on the client request? The theory here is that
>> it's
>> >>> >> still a synchronous call, but you're just timing out at the client
>> >>> >> level. At that point, the optimize is still running it's just the
>> >>> >> connection has been dropped
>> >>> >>
>> >>> >> Shot in the dark.
>> >>> >> Erick
>> >>> >>
>> >>> >> On Thu, May 28, 2015 at 10:31 PM, Modassar Ather <
>> >>> modather1...@gmail.com>
>> >>> >> wrote:
>> >>> >> > I could not notice it but with my past experience of commit which
>> >>> used to
>> >>> >> > take around 2 minutes is now taking around 8 seconds. I think
>> this is
>> >>> >> also
>> >>> >> > running as background.
>> >>> >> >
>> >>> >> > On Fri, May 29, 2015 at 10:52 AM, Modassar Ather <
>> >>> modather1...@gmail.com
>> >>> >> >
>> >>> >> > wrote:
>> >>> >> >
>> >>> >> >> The indexer takes almost 2 hours to optimize. It has a
>> >>> multi-threaded
>> >>> >> add
>> >>> >> >> of batches of documents to
>> >>> >> >> org.apache.solr.client.solrj.impl.CloudSolrClient.
>> >>> >> >> Once all the documents are indexed it invokes commit and
>> optimize. I
>> >>> >> have
>> >>> >> >> seen that the optimize goes into background after 10 minutes and
>> >>> indexer
>> >>> >> >> exits.
>> >>> >> >> I am not sure why this 10 minutes it hangs on indexer. This
>> >>> behavior I
>> >>> >> >> have seen in multiple iteration of the indexing of same data.
>> >>> >> >>
>> >>> >> >> There is nothing significant I found in log which I can share. I
>> >>> can see
>> >>> >> >> following in log.
>> >>> >> >> org.apache.solr.update.DirectUpdateHandler2; start
>> >>> >> >>
>> >>> >>
>> >>>
>> commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>> >>> >> >>
>> >>> >> >> On Wed, May 27, 2015 at 10:59 PM, Erick Erickson <
>> >>> >> erickerick...@gmail.com>
>> >>> >> >> wrote:
>> >>> >> >>
>> >>> >> >>> All strange of course. What do your Solr logs show when this
>> >>> happens?
>> >>> >> >>> And how reproducible is this?
>> >>> >> >>>
>> >>> >> >>> Best,
>> >>> >> >>> Erick
>> >>> >> >>>
>> >>> >> >>> On Wed, May 27, 2015 at 4:00 AM, Upayavira 
>> wrote:
>> >>> >> >>> > In this case, optimising makes sense, once the index is
>> >>> generated,
>> >>> >> you
>> >>> >> >>> > are not updating It.
>> >>> >> >>> >
>> >>> >> >>> > Upayavira
>> >>> >> >>> >
>> >>> >> >>> > 

Re: Index optimize runs in background.

2015-06-10 Thread Modassar Ather
Hi,

There are 5 cores and a separate server for indexing on this SolrCloud. Can
you please share your suggestions on:
  How can the indexer know that the optimize has completed when the
commit/optimize runs in the background, without going to the Solr servers,
perhaps by using SolrJ or another API?

I tried but could not find any API/handler to check whether the optimization
is completed. Kindly share your inputs.

Thanks,
Modassar
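A sketch of one way the indexer can block until the forced merge finishes,
assuming the 5.x SolrJ client and assuming the client's HTTP/socket timeouts
are raised high enough that a multi-hour optimize does not silently drop the
connection (the suspicion voiced below); the ZooKeeper host and collection
name are placeholders:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.UpdateResponse;

public class OptimizeAndWait {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient("zkhost:2181")) {
            client.setDefaultCollection("mycollection");
            // waitFlush=true, waitSearcher=true: the call should not return
            // until the merge completes and a new searcher is open.
            UpdateResponse rsp = client.optimize("mycollection", true, true);
            System.out.println("optimize returned status " + rsp.getStatus()
                    + " after " + rsp.getElapsedTime() + " ms");
        }
    }
}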

On Thu, Jun 4, 2015 at 9:36 PM, Erick Erickson 
wrote:

> Can't get any failures to happen on my end so I really haven't a clue.
>
> Best,
> Erick
>
> On Thu, Jun 4, 2015 at 3:17 AM, Modassar Ather 
> wrote:
> > Hi,
> >
> > Please provide your inputs on optimize and commit running as background.
> > Your suggestion will be really helpful.
> >
> > Thanks,
> > Modassar
> >
> > On Tue, Jun 2, 2015 at 6:05 PM, Modassar Ather 
> > wrote:
> >
> >> Erick! I could not find any underlying setting of 10 minutes.
> >> It is not only optimize but commit is also behaving in the same fashion
> >> and is taking lesser time than usually had taken.
> >> As per my observation both are running in background.
> >>
> >> On Fri, May 29, 2015 at 7:21 PM, Erick Erickson <
> erickerick...@gmail.com>
> >> wrote:
> >>
> >>> I'm not talking about you setting a timeout, but the underlying
> >>> connection timing out...
> >>>
> >>> The "10 minutes then the indexer exits" comment points in that
> direction.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Thu, May 28, 2015 at 11:43 PM, Modassar Ather <
> modather1...@gmail.com>
> >>> wrote:
> >>> > I have not added any timeout in the indexer except zk client time out
> >>> which
> >>> > is 30 seconds. I am simply calling client.close() at the end of
> >>> indexing.
> >>> > The same code was not running in background for optimize with
> >>> solr-4.10.3
> >>> > and org.apache.solr.client.solrj.impl.CloudSolrServer.
> >>> >
> >>> > On Fri, May 29, 2015 at 11:13 AM, Erick Erickson <
> >>> erickerick...@gmail.com>
> >>> > wrote:
> >>> >
> >>> >> Are you timing out on the client request? The theory here is that
> it's
> >>> >> still a synchronous call, but you're just timing out at the client
> >>> >> level. At that point, the optimize is still running it's just the
> >>> >> connection has been dropped
> >>> >>
> >>> >> Shot in the dark.
> >>> >> Erick
> >>> >>
> >>> >> On Thu, May 28, 2015 at 10:31 PM, Modassar Ather <
> >>> modather1...@gmail.com>
> >>> >> wrote:
> >>> >> > I could not notice it but with my past experience of commit which
> >>> used to
> >>> >> > take around 2 minutes is now taking around 8 seconds. I think
> this is
> >>> >> also
> >>> >> > running as background.
> >>> >> >
> >>> >> > On Fri, May 29, 2015 at 10:52 AM, Modassar Ather <
> >>> modather1...@gmail.com
> >>> >> >
> >>> >> > wrote:
> >>> >> >
> >>> >> >> The indexer takes almost 2 hours to optimize. It has a
> >>> multi-threaded
> >>> >> add
> >>> >> >> of batches of documents to
> >>> >> >> org.apache.solr.client.solrj.impl.CloudSolrClient.
> >>> >> >> Once all the documents are indexed it invokes commit and
> optimize. I
> >>> >> have
> >>> >> >> seen that the optimize goes into background after 10 minutes and
> >>> indexer
> >>> >> >> exits.
> >>> >> >> I am not sure why this 10 minutes it hangs on indexer. This
> >>> behavior I
> >>> >> >> have seen in multiple iteration of the indexing of same data.
> >>> >> >>
> >>> >> >> There is nothing significant I found in log which I can share. I
> >>> can see
> >>> >> >> following in log.
> >>> >> >> org.apache.solr.update.DirectUpdateHandler2; start
> >>> >> >>
> >>> >>
> >>>
> commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> >>> >> >>
> >>> >> >> On Wed, May 27, 2015 at 10:59 PM, Erick Erickson <
> >>> >> erickerick...@gmail.com>
> >>> >> >> wrote:
> >>> >> >>
> >>> >> >>> All strange of course. What do your Solr logs show when this
> >>> happens?
> >>> >> >>> And how reproducible is this?
> >>> >> >>>
> >>> >> >>> Best,
> >>> >> >>> Erick
> >>> >> >>>
> >>> >> >>> On Wed, May 27, 2015 at 4:00 AM, Upayavira 
> wrote:
> >>> >> >>> > In this case, optimising makes sense, once the index is
> >>> generated,
> >>> >> you
> >>> >> >>> > are not updating It.
> >>> >> >>> >
> >>> >> >>> > Upayavira
> >>> >> >>> >
> >>> >> >>> > On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
> >>> >> >>> >> Our index has almost 100M documents running on SolrCloud of 5
> >>> shards
> >>> >> >>> and
> >>> >> >>> >> each shard has an index size of about 170+GB (for the record,
> >>> we are
> >>> >> >>> not
> >>> >> >>> >> using stored fields - our documents are pretty large). We
> >>> perform a
> >>> >> >>> full
> >>> >> >>> >> indexing every weekend and during the week there are no
> updates
> >>> >> made to
> >>> >> >>> >> the
> >>> >> >>> >> index. Most of the queries that we run are pretty complex
> with
> >>> >> hundreds
> >>> >> >>> >> of
> >>> >> >>> >> terms using PhraseQuery, BooleanQue

Show all fields in Solr highlighting output

2015-06-10 Thread Zheng Lin Edwin Yeo
Hi,

Is it possible to list all the fields in the highlighting portion of the
output?
Currently, even when I set hl.fl=*, it only shows fields where
highlighting is possible, and fields for which highlighting is not possible
are not shown.

I would like the output to show all the fields together, regardless of
whether highlighting is possible or not.


Regards,
Edwin


Re: Indexing issue - index get deleted

2015-06-10 Thread Chris Hostetter

: The guys was using delta import anyway, so maybe the problem is
: different and not related to the clean.

that's not what the logs say.

Here's what i see...

Log begins with server startup @ "Jun 10, 2015 11:14:56 AM"

The DeletionPolicy for the "shopclue_prod" core is initialized at "Jun 
10, 2015 11:15:04 AM" and we see a few interesting things here we note 
for the future as we keep reading...

1) There is currently "commits:num=1" commits on disk
2) the current index dir in use is "index.20150311161021822"
3) the current segment & generation are "segFN=segments_1a,generation=46"

Immediately after this, we see some searcher warming using a searcher with 
this same segments file, and then this searcher is registered ("Jun 10, 
2015 11:15:05 AM") and the core is registered.

Next we see some replication polling, and we see what look like some 
simple monitoring requests for "q=*" which return hits=85898 being 
repeated over and over.

At "Jun 10, 2015 11:16:30 AM" we see some requests for /dataimport that 
look like they are coming from the UI. and then at "Jun 10, 2015 11:17:01 
AM" we see a request for a full import started.

We have no idea what the data import configuration file looks like, so we 
have no idea if clean=false is being used or not.  It's certainly not 
specified in the URL.

We see some more monitoring URLs returning hits=85898 and some more 
/replication status calls, and then @ "Jun 10, 2015 11:18:02 AM" we see the 
first commit executed since the server started up.  

there's no indication that this commit came from an external request (eg 
"/update") so probably was made by some internal request.  One 
possibility is that it came from DIH finishing -- but i doubt it, i'm 
fairly sure that would have involved more logging than this.  A more 
probable scenario is that it came from an autoCommit setting -- the fact 
that it is almost exactly 60 seconds after DIH started -- and almost 
exactly 60 seconds after DIH may have done a deleteAll query due to 
clean=true -- makes it seem very likely that this was a 1 minute 
autoCommit.

(but since we don't have either the data import config or the 
solrconfig.xml, we have no way of knowing -- it's all just guesswork.)

Very importantly, note that this commit is not opening a new searcher...

Jun 10, 2015 11:18:02 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start 
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

Here are some other interesting things to note from the logging 
that comes from the DeletionPolicy when this commit happens...

1) it now notes that there are commits:num=2 on disk
2) the current index dir hasn't changed (index.20150311161021822) so 
some weird replication command didn't swap the world out from under us
3) the newest segment/generation are "segFN=segments_1b,generation=47"
4) the newest commit has no other files in it besides the segments file.

this means, without a doubt, there are no documents in this commit's view 
of the index.  they have all been deleted by something.


At this point the *old* searcher (for commit generation 46) is still in 
use however -- nothing has done an openSearcher=true.

we see more /dataimport status requests, and other requests that appear to 
come from the Solr UI, and more monitoring queries that still return 
hits=85898 because the same searcher is in use.

At "Jun 10, 2015 11:27:04 AM" we see another commit happen -- again, no 
indication that this came from an outside /update request, so it might be 
from DIH, or it might be from an autoCommit setting.  the fact that it is 
nearly exactly 10 minutes after DIH started (and probably did a clean=true 
deleteAll query) makes it seem extremely likely this is an autoSoftCommit 
setting kicking in.

Very importantly, note that this softCommit *does* open a new searcher...

Jun 10, 2015 11:27:04 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}


In less than a second, this new searcher is warmed up and the next time we 
see a q=* monitoring query get logged, it returns hits=0.

Note that at no point in the logs, after the DataImporter is started, do 
we see it log anything other than that it has initiated the request to 
MySQL -- we do see some logs starting ~ "Jun 10, 2015 11:41:19 AM" 
indicating that someone was using the Web UI to look at the dataimport 
handler's status report.  it would be really nice to know what that person 
saw at that point -- because my guess is DIH was still running and was 
stalled waiting for MySQL, and hadn't even started adding docs to Solr (if 
it had, i'm certain there would have been some log of it).

So instead, the combination of a (probable) DIH clean=true option and a 
(near certainty) autoCommit=60sec and autoSoftCommit=10min meant that a new 
commit was created after the cl

Re: TZ & rounding

2015-06-10 Thread Chris Hostetter

: So my question is: can I get offset of time if I use NOW/MINUTE and not 
NOW/DAY rounding?

i'm sorry, but your question is still too terse, vague, and ambiguous for 
me to really make much sense of it; and the example queries you provided 
really don't have enough context for me to understand what it is about 
these queries that is like or unlike what you ultimately want.

please re-read my previous request for clarification of your question -- 
in particular, stop assuming anything about Solr, and just tell us about 
your data, and the type of problem you are trying to solve in normal 
sentences with some real concrete examples...


> You've given an example of a query you are currently using -- but you 
> haven't given us any of the other info we really need to try and  
> understand your question (ie: what do some of the documents you've 
> indexed look like? what results do you get from your query? what do you 
> see in those results that isn't matching your goal? what documents are 
> matched by your query (or by a facet) that you don't want to be matched? 
> what documents aren't matched that you want to be matched?
>
> The best thing to do would be if you could just describe for us in words 
> what you want, and give a specific example -- such as "If it's 4:37PM in  
> my local timezone Foo/Bar, i want to send a query to Solr and have it 
> return results based on the time range X to Y with facets like "



-Hoss
http://www.lucidworks.com/


RE: The best way to exclude "seen" results from search queries

2015-06-10 Thread Reitzel, Charles
I don't see any way around storing which recommendations have been delivered to 
each user.  Sounds like a separate collection with the unique ID created from 
the combination of the user ID and the recommendation ID (with the IDs also 
available as separate, searchable and returnable fields).

You could then use a so-called "join" query to exclude any recommendations in 
the other collection.
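A hypothetical sketch of that approach, assuming a second core named "seen" on
the same node (the {!join fromIndex=...} parser joins across local cores only)
that stores one document per delivered recommendation with user_id and rec_id
fields; all names here are made up for illustration:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;

public class UnseenOnlyQuery {
    public static QueryResponse search(SolrClient client, String userId, String userQuery)
            throws Exception {
        SolrQuery q = new SolrQuery(userQuery);
        // Exclude every doc whose id shows up as rec_id in the "seen" core
        // for this user.
        q.addFilterQuery("-{!join from=rec_id to=id fromIndex=seen}user_id:" + userId);
        return client.query(q);
    }
}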

-Original Message-
From: amid [mailto:a...@donanza.com] 
Sent: Wednesday, June 10, 2015 1:27 PM
To: solr-user@lucene.apache.org
Subject: The best way to exclude "seen" results from search queries

Hi,

We have a solr index with ~1M documents.
We want to give the ability to our users to filter results from queries - 
meaning they will not be shown again for any query of this specific user (we 
currently have 10K users).

You can think of a scenario like a "recommendation engine" which you don't want 
to give recommendation more than once for each user.

What is the best way to implement this feature (Performance & Memory)?

Thanks,
Ami



--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-best-way-to-exclude-seen-results-from-search-queries-tp4211022.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: File paths in Zookeeper managed config files

2015-06-10 Thread Shawn Heisey
On 6/10/2015 2:47 PM, Peter Scholze wrote:
> I'm using Zookeeper 3.4.6 in the context of SolrCloud 5. When
> uploading a config file containing the following, I get an "Invalid
> Path String" error.
>
>  words="/netapp/dokubase/seeval/dicts/stopwords/stopwords_de.txt"
> ignoreCase="true"/>
>
> leads obviously to
>
> Invalid path string
> \"/configs/newspapers//netapp/dokubase/seeval/dicts/stopwords/stopwords_de.txt\"
> caused by empty node name specified @20"
>
> Is there any way to prevent Zookeeper from doing so?

When your config is in zookeeper, which is a requirement for SolrCloud,
your entire config will be in zookeeper -- you cannot refer to files
that are stored elsewhere.  Upload stopwords_de.txt to zookeeper along
with the rest of your config and just reference it with:

words="stopwords_de.txt"

An idea for a feature request, if it's not already possible: Make it
possible to use URI syntax to refer to resources stored in other places,
such as:

file:/path/to/some/file

As a third-party example of an implementation that allows this, I pass 
this option on startup when I start my Solr 4.x servers:

-Dlog4j.configuration=file:etc/log4j.properties

Thanks,
Shawn



Re: Adding applicative cache to SolrSearcher

2015-06-10 Thread Chris Hostetter
: 
: The problem is SlowCompositeReaderWrapper.wrap(searcher.getIndexReader());
: you hardly ever need to do this, at least because Solr already does it.

Specifically you should just use...

searcher.getLeafReader().getSortedSetDocValues(your_field_name)

...instead of doing all this wrapping yourself.

If the field is declared with docValues="true", this will all be precomputed at 
index time and super fast.  if not, then the UninvertingReader logic will 
kick in once per searcher -- if you want to "pre-warm" it, just configure a 
request that exercises your code as part of the firstSearcher and 
newSearcher event listeners.
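Put together, a sketch of the fixed lookup (replacing the manual
SlowCompositeReaderWrapper.wrap from the original post), under the assumption
that the field is named score_field as in that post:

import java.io.IOException;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.solr.search.SolrIndexSearcher;

public class ScoreFieldLookup {
    static SortedSetDocValues open(SolrIndexSearcher searcher) throws IOException {
        // Precomputed at index time when the field has docValues="true";
        // otherwise UninvertingReader builds it once per searcher.
        return searcher.getLeafReader().getSortedSetDocValues("score_field");
    }
}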



-Hoss
http://www.lucidworks.com/


File paths in Zookeeper managed config files

2015-06-10 Thread Peter Scholze

Hi all,

I'm using Zookeeper 3.4.6 in the context of SolrCloud 5. When uploading 
a config file containing the following, I get an "Invalid Path String" 
error.


words="/netapp/dokubase/seeval/dicts/stopwords/stopwords_de.txt" 
ignoreCase="true"/>


leads obviously to

Invalid path string 
\"/configs/newspapers//netapp/dokubase/seeval/dicts/stopwords/stopwords_de.txt\" 
caused by empty node name specified @20"


Is there any way to prevent Zookeeper from doing so?

Thanks in advance,
best regards

Peter



Re: The best way to exclude "seen" results from search queries

2015-06-10 Thread Mikhail Khludnev
start with negating and bypassing caches by
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser
eg
fq=-{!terms f=p_id cache=false}1,3,5,already,seen
note:
Elastic can even store such filters via
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html#_terms_lookup_mechanism
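A sketch of building that filter from SolrJ, with the field name and ids as
placeholders:

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;

public class ExcludeSeen {
    public static SolrQuery build(String userQuery, List<String> seenIds) {
        SolrQuery q = new SolrQuery(userQuery);
        // Negated, uncached terms filter over the already-seen ids.
        q.addFilterQuery("-{!terms f=p_id cache=false}" + String.join(",", seenIds));
        return q;
    }
}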


On Wed, Jun 10, 2015 at 8:26 PM, amid  wrote:

> Hi,
>
> We have a solr index with ~1M documents.
> We want to give the ability to our users to filter results from queries -
> meaning they will not be shown again for any query of this specific user (we
> currently have 10K users).
>
> You can think of a scenario like a "recommendation engine" which you don't
> want to give recommendation more than once for each user.
>
> What is the best way to implement this feature (Performance & Memory)?
>
> Thanks,
> Ami
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/The-best-way-to-exclude-seen-results-from-search-queries-tp4211022.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Adding applicative cache to SolrSearcher

2015-06-10 Thread Mikhail Khludnev
Hello,

The problem is SlowCompositeReaderWrapper.wrap(searcher.getIndexReader());
you hardly ever need to do this, at least because Solr already does it.

DocValues need to be accessed per segment, leaf/atomic/reader/context
provided to collector.
eg look at DocTermsIndexDocValues.strVal(int)
DocTermsIndexDocValues.open(LeafReaderContext, String) or
TermOrdValComparator and TopFieldCollector.
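A sketch of what per-segment access looks like in a collector (a simplified
stand-in for the MyCollector from the original post, not a drop-in
replacement):

import java.io.IOException;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.search.SimpleCollector;

public class PerSegmentScoreCollector extends SimpleCollector {
    private SortedDocValues scoreField;

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException {
        // Cheap per-segment lookup; no SlowCompositeReaderWrapper needed.
        scoreField = DocValues.getSorted(context.reader(), "score_field");
    }

    @Override
    public void collect(int doc) throws IOException {
        // doc is segment-local here, matching the per-segment DocValues.
        String value = scoreField.get(doc).utf8ToString();
        // ... compute the applicative score from value and queue the doc ...
    }

    @Override
    public boolean needsScores() {
        return false;
    }
}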


On Wed, Jun 10, 2015 at 6:59 PM, adfel70  wrote:

>
> I am using RankQuery to implement my applicative scorer that returns a
> score
> based on the value of specific field (lets call it 'score_field') that is
> stored for every document.
> The RankQuery creates a collector, and for every collected docId I retrieve
> the value of score_field, calculate the score and add the doc id into
> priority queue:
>
> public class MyScorerrankQuet extends RankQuery {
> ...
>
> @Override
> public TopDocsCollector getTopDocsCollector(int i,
> SolrIndexSearcher.QueryCommand cmd, IndexSearcher searcher) {
> ...
> return new MyCollector(...)
> }
> }
>
> public class MyCollector  extends TopDocsCollector{
> MyScorer scorer;
> SortedDocValues scoreFieldValues;   //Initialized in constrctor
>
> public MyCollector(){
> scorer = new MyScorer();
> scorer.start(); //the scorer's API needs to call start()
> before every query and close() at the end of the query
> AtomicReader r =
> SlowCompositeReaderWrapper.wrap(searcher.getIndexReader());
> scoreFieldValues = DocValues.getSorted(r, "score_field");
>  /*
> THIS CALL IS TIME CONSUMING! */
> }
>
> @Override
> public void collect(int id){
> int docID = docBase + id;
> //1. get specific field from the doc using DocValues and
> calculate score using my scorer
> String value = scoreFieldValues.get(docID).utf8ToString();
> scorer.calcScore(value);
> //2. add docId and score (ScoreDoc object) into
> PriorityQueue.
> }
> }
>
>
> I used DocValues to get the value of score_field. Currently its being
> instantiate in collector's constructor - which is performance killer,
> because it is being called for EVERY query, even if the index is static (no
> commits). I want to make the DocValue.getStored() call only when it is
> really necessary, but I dont know where to put that code. Is there a place
> to plug that code so when a new searcher is being opened I can add my this
> applicative cache?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Adding-applicative-cache-to-SolrSearcher-tp4211012.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics






SolrCloud No Active Slice

2015-06-10 Thread James Webster
I'm having a config issue, I'm posting the error from Solrj which also
includes the cluster state JSON:

 org.apache.solr.common.SolrException: No active slice servicing hash code
2ee4d125 in DocCollection(rfp365)={
  "shards":{"shard1":{
  "range":"-",
  "state":"active",
  "replicas":{
"core_node1":{
  "state":"active",
  "core":"rfp365_shard1_replica1",
  "node_name":"172.31.58.150:8983_solr",
  "base_url":"http://172.31.58.150:8983/solr"},
"core_node2":{
  "state":"active",
  "core":"rfp365_shard1_replica2",
  "node_name":"172.31.60.137:8983_solr",
  "base_url":"http://172.31.60.137:8983/solr"},
"core_node3":{
  "state":"active",
  "core":"rfp365_shard1_replica3",
  "node_name":"172.31.58.65:8983_solr",
  "base_url":"http://172.31.58.65:8983/solr";,
  "leader":"true",
  "replicationFactor":"3",
  "router":{"name":"compositeId"},
  "maxShardsPerNode":"1",
  "autoAddReplicas":"true"}
at
org.apache.solr.common.cloud.HashBasedRouter.hashToSlice(HashBasedRouter.java:65)
at
org.apache.solr.common.cloud.HashBasedRouter.getTargetSlice(HashBasedRouter.java:39)
at
org.apache.solr.client.solrj.request.UpdateRequest.getRoutes(UpdateRequest.java:206)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:581)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:948)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:839)
... 6 more

All nodes are active in the solr admin, not sure where to go from here.

Thanks in advance!
James


The best way to exclude "seen" results from search queries

2015-06-10 Thread amid
Hi,

We have a solr index with ~1M documents.
We want to give the ability to our users to filter results from queries -
meaning they will not be shown again for any query of this specific user (we
currently have 10K users).

You can think of a scenario like a "recommendation engine" which you don't
want to give recommendation more than once for each user.

What is the best way to implement this feature (Performance & Memory)?

Thanks,
Ami



--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-best-way-to-exclude-seen-results-from-search-queries-tp4211022.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Date Format Conversion Function Query

2015-06-10 Thread Ali Nazemian
Thank you very much.
It seems that document transformer is the perfect extension point for this
conversion. I will try to implement that.
Best regards.
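For the record, a sketch of what such a transformer could look like, using the
real 5.x extension points (TransformerFactory/DocTransformer) but with a
hypothetical [persian] name and a stubbed-out calendar conversion:

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;

public class PersianDateTransformerFactory extends TransformerFactory {
    @Override
    public DocTransformer create(final String field, SolrParams params, SolrQueryRequest req) {
        final String source = params.get("f");  // e.g. fl=id,[persian f=gregorian_Date]
        return new DocTransformer() {
            @Override
            public String getName() { return field; }

            @Override
            public void transform(SolrDocument doc, int docid) {
                Object gregorian = doc.getFieldValue(source);
                if (gregorian != null) {
                    doc.setField(field, toPersian(gregorian));
                }
            }
        };
    }

    private static Object toPersian(Object gregorianDate) {
        // Placeholder: convert the java.util.Date to a Persian-calendar
        // string here.
        return gregorianDate.toString();
    }
}

It would be registered in solrconfig.xml as a transformer named "persian" and
then requested with fl=id,[persian f=gregorian_Date].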

On Wed, Jun 10, 2015 at 3:54 PM, Upayavira  wrote:

> Another technology that might make more sense is a Doc Transformer.
>
> You also specify them in the fl parameter. I would imagine you could
> specify
>
> fl=id,[persian f=gregorian_Date]
>
> See here for more cases:
>
>
> https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents
>
> This does not exist right now, but would make a good contribution to
> Solr itself, I'd say.
>
> Upayavira
>
> On Wed, Jun 10, 2015, at 09:57 AM, Alessandro Benedetti wrote:
> > Erick will correct me if I am wrong but this function query I don't think
> > it exists.
> > But maybe can be a nice contribution.
> > It should take in input a date format and a field and give in response
> > the
> > new formatted Date.
> >
> > The would be simple to use it :
> >
> > fl=id,persian_date:dateFormat("/mm/dd",gregorian_Date)
> >
> > The date format is an example in input is an example.
> >
> > Cheers
> >
> > 2015-06-10 7:24 GMT+01:00 Ali Nazemian :
> >
> > > Dear Erick,
> > > Hi,
> > > Actually I want to convert date format from Gregorian calendar (solr
> > > default) to Perisan calendar. You may ask why i do not do that at
> client
> > > side? Here is why:
> > >
> > > I want to provide a way to extract data from solr in the csv format. I
> know
> > > that solr has csv ResponseWriter that could be used in this case. But
> my
> > > problem is that the date format in solr index is provided by Gregorian
> > > calendar and I want to put that in Persian calendar. Therefore I was
> > > thinking of a function query to do that at query time for me.
> > >
> > > Regards.
> > >
> > > On Tue, Jun 9, 2015 at 10:55 PM, Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > > > I'm not sure what you're asking for, give us an example input/output
> > > pair?
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > On Tue, Jun 9, 2015 at 8:47 AM, Ali Nazemian 
> > > > wrote:
> > > > > Dear all,
> > > > > Hi,
> > > > > I was wondering is there any function query for converting date
> format
> > > in
> > > > > Solr? If no, how can I implement such function query myself?
> > > > >
> > > > > --
> > > > > A.Nazemian
> > > >
> > >
> > >
> > >
> > > --
> > > A.Nazemian
> > >
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>



-- 
A.Nazemian


Re: How to tell when Collector finishes collect loop?

2015-06-10 Thread adfel70
I need to execute close() because the scorer is being opened in a context of
a query and caches some data in that scope - of the specific query. The way
to clear this cache, which is only relevant for that query, is to call
close(). I think this API is not so good, but I assume that the scorer's
code will not change soon...



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-tell-when-Collector-finishes-collect-loop-tp4209447p4211016.html
Sent from the Solr - User mailing list archive at Nabble.com.


Adding applicative cache to SolrSearcher

2015-06-10 Thread adfel70

I am using RankQuery to implement my applicative scorer that returns a score
based on the value of a specific field (let's call it 'score_field') that is
stored for every document.
The RankQuery creates a collector, and for every collected docId I retrieve
the value of score_field, calculate the score and add the doc id into
priority queue:

public class MyScorerrankQuet extends RankQuery { 
... 

@Override 
public TopDocsCollector getTopDocsCollector(int i,
SolrIndexSearcher.QueryCommand cmd, IndexSearcher searcher) { 
... 
return new MyCollector(...) 
} 
} 

public class MyCollector  extends TopDocsCollector{ 
MyScorer scorer; 
SortedDocValues scoreFieldValues;   //Initialized in constrctor

public MyCollector(){ 
scorer = new MyScorer(); 
scorer.start(); //the scorer's API needs to call start()
before every query and close() at the end of the query
AtomicReader r =
SlowCompositeReaderWrapper.wrap(searcher.getIndexReader());
scoreFieldValues = DocValues.getSorted(r, "score_field");   
/*
THIS CALL IS TIME CONSUMING! */
} 

@Override 
public void collect(int id){ 
int docID = docBase + id;
//1. get specific field from the doc using DocValues and
calculate score using my scorer 
String value = scoreFieldValues.get(docID).utf8ToString();
scorer.calcScore(value);
//2. add docId and score (ScoreDoc object) into
PriorityQueue. 
} 
} 


I used DocValues to get the value of score_field. Currently it is being
instantiated in the collector's constructor - which is a performance killer,
because it is called for EVERY query, even if the index is static (no
commits). I want to make the DocValues.getSorted() call only when it is
really necessary, but I don't know where to put that code. Is there a place
to plug that code in, so that when a new searcher is opened I can add this
applicative cache?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-applicative-cache-to-SolrSearcher-tp4211012.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Just taking a look at the code:
"

if (requestParams.containsKey("clean")) {
  clean = StrUtils.parseBool( (String) requestParams.get("clean"), true);
} else if (DataImporter.DELTA_IMPORT_CMD.equals(command) ||
DataImporter.IMPORT_CMD.equals(command)) {
  clean = false;
} else  {
  clean = debug ? false : true;
}"


Which makes sense, as I would be surprised to see a delta import with a
default cleaning.

The guy was using delta import anyway, so maybe the problem is
different and not related to the clean.

But he definitely needs to give us more information.

Cheers


2015-06-10 12:11 GMT+01:00 Upayavira :

> I was only speaking about full import regarding the default of
> clean=true. However, looking at the source code, it doesn't seem to
> differentiate especially between a full and a delta in relation to the
> default of clean=true, which would be pretty crappy. However, I'd need
> to try it.
>
> Upayavira
>
> On Wed, Jun 10, 2015, at 11:57 AM, Alessandro Benedetti wrote:
> > Wow, Upaya, I didn't know that clean was default=true in the delta import
> > as well!
> > I did know it was default in the full import, but I agree with you that
> > having a default to true for delta import is very dangerous !
> >
> > But assuming the user was using the delta import so far, if cleaning
> > every
> > time, how was possible to have a coherent index ?
> >
> > Using a delta import with clean=true should produce a non consistent
> > index
> > with only a subset ( the latest modified) of the entire data set !
> >
> > Cheers
> >
> > 2015-06-10 11:46 GMT+01:00 Upayavira :
> >
> > > Note the clean= parameter to the DIH. It defaults to true. It will wipe
> > > your index before it runs. Perhaps it succeeded at wiping, but failed
> to
> > > connect to your database. Hence an empty DB?
> > >
> > > clean=true is, IMO, a very dangerous default option.
> > >
> > > Upayavira
> > >
> > > On Wed, Jun 10, 2015, at 10:59 AM, Midas A wrote:
> > > > Hi Alessandro,
> > > >
> > > > Please find the answers inline and help me out to figure out this
> > > > problem.
> > > >
> > > > 1) Solr version : *4.2.1*
> > > > 2) Solr architecture :* Master -slave/ Replication with
> requestHandler*
> > > >
> > > > 3) Kind of data source indexed : *Mysql *
> > > > 4) What happened to the datasource ? any change in there ? : *No
> change *
> > > > 5) Was the index actually deleted ? All docs deleted ? Index file
> > > > segments
> > > > deleted ? Index corrupted ? : *all docs deleted , segment files  are
> > > > there.
> > > > index file is also there .*
> > > > 6) What about system resources ?
> > > > * JVM: 30 GB*
> > > > * RAM: 48 GB*
> > > >
> > > > *CPU : 8 core*
> > > >
> > > >
> > > > On Wed, Jun 10, 2015 at 2:13 PM, Alessandro Benedetti <
> > > > benedetti.ale...@gmail.com> wrote:
> > > >
> > > > > Let me try to help you, first of all I would like to encourage
> people
> > > to
> > > > > post more information about their scenario than "This is my log,
> index
> > > > > deleted, help me" :)
> > > > >
> > > > > This kind of Info can be really useful :
> > > > >
> > > > > 1) Solr version
> > > > > 2) Solr architecture ( Solr Cloud ? Solr Cloud configuration ?
> Manual
> > > > > Sharding ? Manual Replication ? where the problem happened ? )
> > > > > 3) Kind of data source indexed
> > > > > 4) What happened to the datasource ? any change in there ?
> > > > > 5) Was the index actually deleted ? All docs deleted ? Index file
> > > segments
> > > > > deleted ? Index corrupted ?
> > > > > 6) What about system resources ?
> > > > >
> > > > > These questions are only few example one that everyone should
> always
> > > post
> > > > > along their mysterious problem !
> > > > >
> > > > > Hope this helps,
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > > 2015-06-10 9:15 GMT+01:00 Midas A :
> > > > >
> > > > > >
> > > > > > We are running full import and delta import crons .
> > > > > >
> > > > > > Fulll index once a day
> > > > > >
> > > > > > delta index : every 10 mins
> > > > > >
> > > > > >
> > > > > > last night my index automatically deleted(numdocs=0).
> > > > > >
> > > > > > attaching logs for review .
> > > > > >
> > > > > > please suggest to resolve the issue.
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > --
> > > > >
> > > > > Benedetti Alessandro
> > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > >
> > > > > "Tyger, tyger burning bright
> > > > > In the forests of the night,
> > > > > What immortal hand or eye
> > > > > Could frame thy fearful symmetry?"
> > > > >
> > > > > William Blake - Songs of Experience -1794 England
> > > > >
> > >
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>


Re: How to assign shard to specific node?

2015-06-10 Thread Erick Erickson
Take a look at the collections API CREATE command in more detail here:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1

Admittedly this is 5.2 but you didn't mention what version of Solr
you're using.
In particular the createNodeSet and createNodeSet.shuffle parameters.

Best,
Erick

On Wed, Jun 10, 2015 at 8:31 AM, MOIS Martin (MORPHO)
 wrote:
> Hello,
>
> I have a cluster with 3 nodes (node1, node2 and node3). Now I want to create 
> a new collection with 3 shards using `implicit` routing:
>
> 
> http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&router.name=implicit&shards=shard1,shard2,shard3&router.field=shard
>
> How can I control on which node each shard gets created? The goal is to 
> create shard1 on node1, shard2 on node2, etc..
>
> The background is that the actual raw data the index is created for should 
> reside on the same host. That means I have a "raw" record composed of 
> different data (documents, images, meta-data, etc.) for which I compute a 
> Lucene "document" that gets indexed. In order to reduce network traffic I 
> want to process the "raw" record on node1 and insert the resulting Lucene 
> document into shard1 that resides on node1. If shard1 would reside on node2, 
> the Lucene document would have to be sent from node1 to node2 which causes 
> for big record sets a lot of inter node communication.
>
> Thanks in advance.
>
> Best Regards,
> Martin Mois


How to assign shard to specific node?

2015-06-10 Thread MOIS Martin (MORPHO)
Hello,

I have a cluster with 3 nodes (node1, node2 and node3). Now I want to create a 
new collection with 3 shards using `implicit` routing:


http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&router.name=implicit&shards=shard1,shard2,shard3&router.field=shard

How can I control on which node each shard gets created? The goal is to create 
shard1 on node1, shard2 on node2, etc..

The background is that the actual raw data the index is created for should 
reside on the same host. That means I have a "raw" record composed of different 
data (documents, images, meta-data, etc.) for which I compute a Lucene 
"document" that gets indexed. In order to reduce network traffic I want to 
process the "raw" record on node1 and insert the resulting Lucene document into 
shard1 that resides on node1. If shard1 would reside on node2, the Lucene 
document would have to be sent from node1 to node2 which causes for big record 
sets a lot of inter node communication.

Thanks in advance.

Best Regards,
Martin Mois


Re: Assign rich-text document's title name from clustering results

2015-06-10 Thread Alessandro Benedetti
I agree with Upayavira,
Title extraction is an activity independent from Solr.
Furthermore I would say it's easy to extract the title before the Solr
indexing stage.

By the time the content arrives at the Solr update processors it is already a
String.
If you want to do some clever title extraction, the formatting of your original
document definitely helps, and it is lost at that point.
A nice fit for title extraction is your:
Indexing App, or
Apache Tika if you would like to add a particular customisation.

Remember Apache Tika is integrated in Solr to provide Content Extraction
from rich text documents.

Cheers
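As a sketch of the update-processor route (and of Upayavira's "first n words"
idea, quoted below), assuming "content" and "title" field names and an
arbitrary 8-word cutoff; the matching UpdateRequestProcessorFactory and update
chain wiring are omitted:

import java.io.IOException;
import java.util.Arrays;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class AutoTitleProcessor extends UpdateRequestProcessor {
    public AutoTitleProcessor(UpdateRequestProcessor next) {
        super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // Only synthesize a title when none was supplied.
        if (doc.getFieldValue("title") == null && doc.getFieldValue("content") != null) {
            String[] words = doc.getFieldValue("content").toString().trim().split("\\s+");
            int n = Math.min(8, words.length);
            doc.setField("title", String.join(" ", Arrays.copyOf(words, n)) + "...");
        }
        super.processAdd(cmd);
    }
}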

2015-06-10 11:57 GMT+01:00 Upayavira :

> It depends a lot on what the documents are. Some document formats have
> metadata that stores a title. Perhaps you can just extract that.
>
> If not, once you've extracted the content, perhaps you could just have a
> special field that is the first n words (followed by an ellipsis).
>
> If you use a clustering algorithm that makes a guess at a name for a
> cluster, you will get a list of names or categories, not something that
> most people would think of as a title.
>
> This really doesn't strike me (yet) as a Solr problem. The problem is
> what info there is in these documents and how you can derive a title (or
> some form of summary?) from them.
>
> If they are all Word documents, do they start with a "Heading" style? In
> which case you could extract that. As I say, most likely this will have
> to be done outside of Solr.
>
> Upayavira
>
> On Wed, Jun 10, 2015, at 10:31 AM, Zheng Lin Edwin Yeo wrote:
> > The main objective here is actually to assign a title to the documents as
> > they are being indexed.
> >
> > We actually found that the cluster labels provide good information on
> > the key points of the documents, but I'm not sure if we can get good
> > cluster labels with a single document.
> >
> > Besides getting it from cluster labels, are there other methods which we
> > can use to assign a title?
> >
> >
> > Regards,
> > Edwin
> >
> >
> > On 10 June 2015 at 17:16, Alessandro Benedetti
> > 
> > wrote:
> >
> > > Hi Edwin,
> > > let's do this step by step.
> > >
> > > Clustering is a problem solved by unsupervised machine learning
> > > algorithms.
> > > The purpose of clustering is to group a corpus of documents by
> > > similarity, trying to produce groups that are meaningful to a human being.
> > > Solr currently provides different approaches for *Query Time
> > > Clustering* (also known as online clustering).
> > > There's an out-of-the-box integration that allows you to use
> > > clustering at query time on the query results.
> > > Different algorithms can be selected, mainly provided by Carrot2.
> > >
> > > These algorithms also provide a guess at the cluster name.
> > >
> > > Given this introduction, let me look at your problem.
> > >
> > > 1) The first part can be solved with a custom UpdateProcessor that will
> > > process the document and add the automatic new title.
> > > Now the problem is, how do we want to extract this new title?
> > > Honestly I cannot understand how clustering can fit here …
> > >
> > > 2) Index-time clustering is not yet provided in Solr (I remember
> > > there was only an interface ready, but no implementation).
> > > You should cluster the content before indexing it in Solr using a
> > > machine learning library.
> > > Index-time clustering is delicate. What will happen on the next
> > > re-index? Should we cluster everything again?
> > > This topic must be investigated more.
> > >
> > > Anyway, let me know, as the original problem may not require the
> > > clustering.
> > >
> > > Cheers
> > >
> > >
> > > 2015-06-10 4:13 GMT+01:00 Zheng Lin Edwin Yeo :
> > >
> > > > Hi,
> > > >
> > > > I'm currently using Solr 5.1, and I'm thinking of ways to allow the
> > > > system to automatically give the rich-text documents that are being
> > > > indexed a title, instead of the user entering it manually, as we might
> > > > have to index a whole folder of documents together, so it is not wise
> > > > for the user to enter the titles one by one.
> > > >
> > > > I would like to check if it's possible to run the clustering, get the
> > > > results, and use the top-scoring label as the title of the document?
> > > > Apparently, we need to run the clustering prior to the indexing, so
> > > > I'm not sure if that is possible.
> > > >
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > >
> > >
> > >
> > > --
> > > --
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> > >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: Solr date variable resolver is not working with MySql

2015-06-10 Thread Shawn Heisey
On 6/10/2015 6:43 AM, abhijit bashetti wrote:



>  >= to_date('[?, '28/05/2015 11:13:50']', 'DD/MM/YYYY HH24:MI:SS')



> Does anyone know where the problem is? Why is the variable resolver not
> working as expected?
> Note: to_date is a function written by us in MySQL.
> I have checked out the Solr code from svn and am trying it by adding logs to
> it... but the logs are not showing up and I am not able to move ahead.
> I am not very sure where it is wrong... but just a wild guess that something
> is wrong at VariableResolverImpl.replaceTokens or at
> TemplateString.fillTokens...
> I will keep on it, but if you know/get a chance to look at it, it would be of
> great help from your end.

You should not need to use "to_date" with MySQL.  Do you know whether
MySQL will accept DD/MM/YYYY for dates on your system?  If it will, you
should be able to use this directly in your query:

doc.index_state_modification_date >= '${dih.request.lastIndexDate}'

If I were you, I would use YYYY/MM/DD for dates, just to be sure that
locale settings won't cause it to be misinterpreted.
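
If you do need explicit parsing, MySQL has a built-in STR_TO_DATE function, so
a custom to_date should not be necessary. A sketch, reusing the poster's column
and parameter names:

doc.index_state_modification_date >=
    STR_TO_DATE('${dih.request.lastIndexDate}', '%d/%m/%Y %H:%i:%s')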

Thanks,
Shawn



Solr date variable resolver is not working with MySql

2015-06-10 Thread abhijit bashetti

I have used Solr 3.3 as a Data Import Handler (DIH) with Oracle. It's
working fine for me.
Now I am trying the same with MySQL. With the change in database, I have changed
the query used in data-config.xml for MySQL.
The query has variables which are passed in the URL over HTTP. The same thing
works fine in Oracle with the variable resolver, but not in MySQL.
The query is:

SELECT DISTINCT doc.document_id, doc.first_version_id, doc.acl_id, fol.folder_id
FROM ds_document_c doc, ds_folder fol
WHERE doc.cabinet_id = ${dataimporter.request.cabinetId}
  AND fol.folder_id = doc.document_folder_id
  AND doc.index_state_modification_date >=
      to_date('${dataimporter.request.lastIndexDate}', 'DD/MM/YYYY HH24:MI:SS')

and the URL is:

localhost:8983/solr/dataimport?command=full-import&clean=true&commit=true&cabinetId=17083360&lastIndexDate='24/05/2015
 00:00:00'

Solr is building the query as below:

SELECT DISTINCT doc.document_id, doc.first_version_id, doc.acl_id, fol.folder_id
FROM ds_document_c doc, ds_folder fol
WHERE doc.cabinet_id = 24
  AND fol.folder_id = doc.document_folder_id
  AND doc.index_state_modification_date >=
      to_date('[?, '28/05/2015 11:13:50']', 'DD/MM/YYYY HH24:MI:SS')

I am not able to figure out why the date variable is not resolved properly
in this case.
Because `to_date('[?, '28/05/2015 11:13:50']'` is not proper MySQL syntax,
I am getting a MySQL syntax error.
I get the following error:

You have an error in your SQL syntax; check the manual that corresponds to
your MySQL server version for the right syntax to use near
'[?, '28/05/2015 11:13:50'], 'DD/MM/YYYY HH24:MI:SS')))' at line 1

Does anyone know where the problem is? Why is the variable resolver not working
as expected?
Note: to_date is a function written by us in MySQL.
I have checked out the Solr code from svn and am trying it by adding logs to
it... but the logs are not showing up and I am not able to move ahead.
I am not very sure where it is wrong... but just a wild guess that something is
wrong at VariableResolverImpl.replaceTokens or at TemplateString.fillTokens...
I will keep on it, but if you know/get a chance to look at it, it would be of
great help from your end.


Regards,
Abhijit

Re: TZ & rounding

2015-06-10 Thread jon kerling
Thank you for your reply.

So my question is: can I get the timezone offset applied if I use NOW/MINUTE
rounding and not NOW/DAY rounding?

You said "TZ affects what timezone is used when defining the concept of a 'day'
for the purposes of rounding by day." I understand from this answer that a
query like the one I mentioned cannot be treated as I want: different TZ values
should give me different results.

Here you see different time zones, but the result is always the same, even
though it is not accurate to get the same response for the last 5 hours in the
GMT time zone and in the GMT+07 time zone.
http://localhost:8983/solr/CORE_1/select?q=*:*&fq=realDate:[NOW/MINUTE-5HOURS%20TO%20NOW/MINUTE-1MINUTES]&rows=20&NOW=143393400&facet.range=realDate&facet=true&facet.range.start=NOW/MINUTE-5HOURS&facet.range.end=NOW/MINUTE-1MINUTES&facet.range.gap=%2B1HOUR&fl=realDate&fl=f1&TZ=GMT

http://localhost:8983/solr/CORE_1/select?q=*:*&fq=realDate:[NOW/MINUTE-5HOURS%20TO%20NOW/MINUTE-1MINUTES]&rows=20&NOW=143393400&facet.range=realDate&facet=true&facet.range.start=NOW/MINUTE-5HOURS&facet.range.end=NOW/MINUTE-1MINUTES&facet.range.gap=%2B1HOUR&fl=realDate&fl=f1&TZ=America/Los_Angeles



   
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">8</int>
    <lst name="params">
      <str name="facet">true</str>
      <arr name="fl">
        <str>realDate</str>
        <str>f1</str>
      </arr>
      <str name="NOW">143393400</str>
      <str name="q">*:*</str>
      <str name="facet.range.start">NOW/MINUTE-5HOURS</str>
      <str name="facet.range">realDate</str>
      <str name="TZ">GMT</str>
      <str name="facet.range.gap">+1HOUR</str>
      <str name="facet.range.end">NOW/MINUTE-1MINUTES</str>
      <str name="fq">realDate:[NOW/MINUTE-5HOURS TO NOW/MINUTE-1MINUTES]</str>
      <str name="rows">20</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="f1">BO1</str>
      <date name="realDate">2015-06-10T07:00:00Z</date>
    </doc>
  </result>
  <lst name="facet_counts">
    <lst name="facet_ranges">
      <lst name="realDate">
        <lst name="counts">
          <int name="2015-06-10T06:00:00Z">0</int>
          <int name="2015-06-10T07:00:00Z">1</int>
          <int name="2015-06-10T08:00:00Z">0</int>
          <int name="2015-06-10T09:00:00Z">0</int>
          <int name="2015-06-10T10:00:00Z">0</int>
        </lst>
        <str name="gap">+1HOUR</str>
        <date name="start">2015-06-10T06:00:00Z</date>
        <date name="end">2015-06-10T11:00:00Z</date>
      </lst>
    </lst>
  </lst>
</response>


if I would use the same Query with DAY rounding I'll get different results 
since the TimeZone feature will work: 
http://localhost:8983/solr/CORE_1/select?q=*:*&fq=realDate:[NOW/DAY-5HOURS%20TO%20NOW/DAY-1MINUTES]&rows=20&NOW=143393400&facet.range=realDate&facet=true&facet.range.start=NOW/DAY-5HOURS&facet.range.end=NOW/DAY-1MINUTES&facet.range.gap=%2B1HOUR&fl=realDate&fl=f1&TZ=GMT



   
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">7</int>
    <lst name="params">
      <str name="facet">true</str>
      <arr name="fl">
        <str>realDate</str>
        <str>f1</str>
      </arr>
      <str name="NOW">143393400</str>
      <str name="q">*:*</str>
      <str name="facet.range.start">NOW/DAY-5HOURS</str>
      <str name="facet.range">realDate</str>
      <str name="TZ">GMT</str>
      <str name="facet.range.gap">+1HOUR</str>
      <str name="facet.range.end">NOW/DAY-1MINUTES</str>
      <str name="fq">realDate:[NOW/DAY-5HOURS TO NOW/DAY-1MINUTES]</str>
      <str name="rows">20</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_ranges">
      <lst name="realDate">
        <lst name="counts">
          <int name="2015-06-09T19:00:00Z">0</int>
          <int name="2015-06-09T20:00:00Z">0</int>
          <int name="2015-06-09T21:00:00Z">0</int>
          <int name="2015-06-09T22:00:00Z">0</int>
          <int name="2015-06-09T23:00:00Z">0</int>
        </lst>
        <str name="gap">+1HOUR</str>
        <date name="start">2015-06-09T19:00:00Z</date>
        <date name="end">2015-06-10T00:00:00Z</date>
      </lst>
    </lst>
  </lst>
</response>


here you can see the offset:

http://localhost:8983/solr/CORE_1/select?q=*:*&fq=realDate:[NOW/DAY-5HOURS%20TO%20NOW/DAY-1MINUTES]&rows=20&NOW=143393400&facet.range=realDate&facet=true&facet.range.start=NOW/DAY-5HOURS&facet.range.end=NOW/DAY-1MINUTES&facet.range.gap=%2B1HOUR&fl=realDate&fl=f1&TZ=America/Los_Angeles



   
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">8</int>
    <lst name="params">
      <str name="facet">true</str>
      <arr name="fl">
        <str>realDate</str>
        <str>f1</str>
      </arr>
      <str name="NOW">143393400</str>
      <str name="q">*:*</str>
      <str name="facet.range.start">NOW/DAY-5HOURS</str>
      <str name="facet.range">realDate</str>
      <str name="TZ">America/Los_Angeles</str>
      <str name="facet.range.gap">+1HOUR</str>
      <str name="facet.range.end">NOW/DAY-1MINUTES</str>
      <str name="fq">realDate:[NOW/DAY-5HOURS TO NOW/DAY-1MINUTES]</str>
      <str name="rows">20</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="f1">BO4</str>
      <date name="realDate">2015-06-10T05:00:00Z</date>
    </doc>
  </result>
  <lst name="facet_counts">
    <lst name="facet_ranges">
      <lst name="realDate">
        <lst name="counts">
          <int name="2015-06-10T02:00:00Z">0</int>
          <int name="2015-06-10T03:00:00Z">0</int>
          <int name="2015-06-10T04:00:00Z">0</int>
          <int name="2015-06-10T05:00:00Z">1</int>
          <int name="2015-06-10T06:00:00Z">0</int>
        </lst>
        <str name="gap">+1HOUR</str>
        <date name="start">2015-06-10T02:00:00Z</date>
        <date name="end">2015-06-10T07:00:00Z</date>
      </lst>
    </lst>
  </lst>
</response>



Thank you,
Jon. 



 On Tuesday, June 9, 2015 7:42 PM, Chris Hostetter 
 wrote:
   

 
: So, are you saying that you are expected to store UTC dates in your
: index, but if you happen to know that a user is in a different timezone,
: you can round those dates for them according to their timezone instead
: of UTC?
: 
: That's how I'd interpret it, but useful to confirm.

Date formatting and Date Math are two completely different things.

0) All dates, in the index, are stored in UTC.

1) All dates, if expressed as Strings, must be formatted in UTC when 
provided to TrieDateField either to index or for the purposes of query 
parsing.  (if you use SolrJ to send a Date object there is 
no String representation and UTC is irrelevant; likewise things like 
ParseDateFieldUpdateProcessorFactory can parse other formats and be 
configured with other timezones)

2) By default, all date math expressions are evaluated relative to the UTC 
TimeZone, but the TZ parameter can be specified to override this 
behaviour, by forcing all date based addition and rounding to be relative 
to the specified time zone.
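
To make point 2 concrete (an illustrative example, assuming America/Los_Angeles
is at UTC-7 on that date): with NOW = 2015-06-10T11:00:00Z, NOW/DAY evaluates
to 2015-06-10T00:00:00Z under the default UTC, but to 2015-06-10T07:00:00Z
(local midnight) with TZ=America/Los_Angeles. NOW/MINUTE, on the other hand,
yields the same instant either way, because whole-minute boundaries coincide in
any timezone whose offset is a whole number of minutes; that is why the
MINUTE-rounded queries earlier in this message return identical results for
different TZ values.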

https://cwiki.apache.org/confluence/display/solr/Working+with+Dates



-Hoss
http://www.lucidworks.com/


  

Re: Velocity UI and hyperlink

2015-06-10 Thread Erik Hatcher
In cloud mode, configurations live in ZooKeeper.

By doing the 
-Dvelocity.template.base.dir=/example/files/conf/velocity/ trick 
(or baking that into your solrconfig setup for the VelocityResponseWriter) you 
can have the templates on the file system instead though.
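
For the hyperlink question itself, the template change is small. A minimal
Velocity sketch (assuming a stored "url" field, as in the original question,
and a hit template where $doc is in scope):

<a href="$doc.getFirstValue('url')">$doc.getFirstValue('url')</a>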

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com 




> On Jun 10, 2015, at 9:00 AM, Sznajder ForMailingList 
>  wrote:
> 
> Hi Erik
> 
> When running Solr in simple mode on my laptop, I found the *.vm files
> under server/solr/COLLECTION_NAME/conf
> 
> however, when running on my server in cloud mode (with only one node), I do
> not find this conf/ directory under server.
> 
> Does it sit in another place?
> 
> thanks!
> 
> On Tue, Jun 9, 2015 at 3:34 AM, Erik Hatcher  wrote:
> 
>> Do note that changing the file copied under solr/server is risky, as you
>> may delete and recreate the collection and lose your changes.  If you use
>> the system property trick mentioned below, you can develop without having
>> to recreate the collection but once you do it’ll incorporate the changes.
>> 
>> —
>> Erik Hatcher, Senior Solutions Architect
>> http://www.lucidworks.com 
>> 
>> 
>> 
>> 
>>> On Jun 8, 2015, at 5:37 PM, Sznajder ForMailingList <
>> bs4mailingl...@gmail.com> wrote:
>>> 
>>> Hi
>>> 
>>> I am using 5.1
>>> 
>>> Currently, I defined a directory solr-conf/ .
>>> Under this directory, I have a velocity directory containing my different
>>> *.vm files.
>>> 
>>> When I create a collection, I am creating via
>>> bin\solr create -c COLL_NAME -d PATH_TO_SOLR_CONF
>>> 
>>> Your indication was helpful : changing the file copied under solr\server
>>> was the right way!
>>> 
>>> thanks again!
>>> 
>>> Ben
>>> 
>>> On Tue, Jun 9, 2015 at 12:25 AM, Erik Hatcher 
>>> wrote:
>>> 
 What version of Solr?   And where is the file you’re changing?
 
 With Solr 5.2, one example of what you’re trying to do is under
 example/files.  In the README we have this:
 
   bin/solr start
 
>> -Dvelocity.template.base.dir=/example/files/conf/velocity/
 
 When you create a collection it clones the configuration (in 5x; under
 server/solr/…) so if you wanted to in-place edit you’d edit those files
 rather than the original configuration which would require a collection
 re-create.
 
 With the above command-line, you can have templates anywhere you like
>> and
 edit them in place, and they override any in the configuration of the
>> Solr
 collection.
 
 See
 
>> https://cwiki.apache.org/confluence/display/solr/Response+Writers#ResponseWriters-VelocityResponseWriter
 for perhaps some more details.  If there’s any way I can make this
>> easier,
 let me know.
 
 If the above info doesn’t work or apply because you’re on a different
 version of Solr, provide more details and I’ll help from there.
 
 —
 Erik Hatcher, Senior Solutions Architect
 http://www.lucidworks.com
 
 
 
 
> On Jun 8, 2015, at 5:07 PM, Sznajder ForMailingList <
 bs4mailingl...@gmail.com> wrote:
> 
> Thanks!!
> 
> However, each time I change a *.vm file, I do not succeed to see the
 change
> on my browser until, I delete + recreate the collectoin and re-index.
> 
> Isn't there a way to immediately see the display change?
> 
> Best regards
> 
> On Mon, Jun 8, 2015 at 11:46 PM, Erik Hatcher 
> wrote:
> 
>> Benjamin -
>> 
>> The templates for VelocityResponseWriter (/browse, etc) are under
>> conf/velocity.  Find the template that generates the piece you want to
>> affect (which may be hit.vm or hit_.vm? - depends on which
>> version of Solr you’re using and which configuration you’ve started
 with to
>> be more precise) and modify it to render a hyperlink around
>> $doc.getFirstValue(“url”), maybe something like:
>> 
>> http://www.lucidworks.com
>> 
>> 
>> 
>> 
>>> On Jun 8, 2015, at 4:29 PM, Sznajder ForMailingList <
>> bs4mailingl...@gmail.com> wrote:
>>> 
>>> Hi
>>> 
>>> I would like one of the fields, I display in the results of Velocity
 UI,
>> to
>>> be a hyperlink.
>>> 
>>> In my example, I am storing a field "url" containing the link to the
>> online
>>> page of the indexed document and I would like to have this displayed
>> field
>>> a hyperlink to this page.
>>> 
>>> Could you please indicate me waht should I change to get that?
>>> 
>>> thanks!
>>> 
>>> Benjamin
>> 
>> 
 
 
>> 
>> 



Re: Velocity UI and hyperlink

2015-06-10 Thread Sznajder ForMailingList
Hi Erik

When running Solr in simple mode on my laptop, I found the *.vm files
under server/solr/COLLECTION_NAME/conf

however, when running on my server in cloud mode (with only one node), I do
not find this conf/ directory under server.

Does it sit in another place?

thanks!

On Tue, Jun 9, 2015 at 3:34 AM, Erik Hatcher  wrote:

> Do note that changing the file copied under solr/server is risky, as you
> may delete and recreate the collection and lose your changes.  If you use
> the system property trick mentioned below, you can develop without having
> to recreate the collection but once you do it’ll incorporate the changes.
>
> —
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com 
>
>
>
>
> > On Jun 8, 2015, at 5:37 PM, Sznajder ForMailingList <
> bs4mailingl...@gmail.com> wrote:
> >
> > Hi
> >
> > I am using 5.1
> >
> > Currently, I defined a directory solr-conf/ .
> > Under this directory, I have a velocity directory containing my different
> > *.vm files.
> >
> > When I create a collection, I am creating via
> > bin\solr create -c COLL_NAME -d PATH_TO_SOLR_CONF
> >
> > Your indication was helpful : changing the file copied under solr\server
> > was the right way!
> >
> > thanks again!
> >
> > Ben
> >
> > On Tue, Jun 9, 2015 at 12:25 AM, Erik Hatcher 
> > wrote:
> >
> >> What version of Solr?   And where is the file you’re changing?
> >>
> >> With Solr 5.2, one example of what you’re trying to do is under
> >> example/files.  In the README we have this:
> >>
> >>bin/solr start
> >>
> -Dvelocity.template.base.dir=/example/files/conf/velocity/
> >>
> >> When you create a collection it clones the configuration (in 5x; under
> >> server/solr/…) so if you wanted to in-place edit you’d edit those files
> >> rather than the original configuration which would require a collection
> >> re-create.
> >>
> >> With the above command-line, you can have templates anywhere you like
> and
> >> edit them in place, and they override any in the configuration of the
> Solr
> >> collection.
> >>
> >> See
> >>
> https://cwiki.apache.org/confluence/display/solr/Response+Writers#ResponseWriters-VelocityResponseWriter
> >> for perhaps some more details.  If there’s any way I can make this
> easier,
> >> let me know.
> >>
> >> If the above info doesn’t work or apply because you’re on a different
> >> version of Solr, provide more details and I’ll help from there.
> >>
> >> —
> >> Erik Hatcher, Senior Solutions Architect
> >> http://www.lucidworks.com
> >>
> >>
> >>
> >>
> >>> On Jun 8, 2015, at 5:07 PM, Sznajder ForMailingList <
> >> bs4mailingl...@gmail.com> wrote:
> >>>
> >>> Thanks!!
> >>>
> >>> However, each time I change a *.vm file, I do not succeed to see the
> >> change
> >>> on my browser until, I delete + recreate the collectoin and re-index.
> >>>
> >>> Isn't there a way to immediately see the display change?
> >>>
> >>> Best regards
> >>>
> >>> On Mon, Jun 8, 2015 at 11:46 PM, Erik Hatcher 
> >>> wrote:
> >>>
>  Benjamin -
> 
>  The templates for VelocityResponseWriter (/browse, etc) are under
>  conf/velocity.  Find the template that generates the piece you want to
>  affect (which may be hit.vm or hit_.vm? - depends on which
>  version of Solr you’re using and which configuration you’ve started
> >> with to
>  be more precise) and modify it to render a hyperlink around
>  $doc.getFirstValue(“url”), maybe something like:
> 
>  http://www.lucidworks.com
> 
> 
> 
> 
> > On Jun 8, 2015, at 4:29 PM, Sznajder ForMailingList <
>  bs4mailingl...@gmail.com> wrote:
> >
> > Hi
> >
> > I would like one of the fields, I display in the results of Velocity
> >> UI,
>  to
> > be a hyperlink.
> >
> > In my example, I am storing a field "url" containing the link to the
>  online
> > page of the indexed document and I would like to have this displayed
>  field
> > a hyperlink to this page.
> >
> > Could you please indicate me waht should I change to get that?
> >
> > thanks!
> >
> > Benjamin
> 
> 
> >>
> >>
>
>


Re: Solr date variable resolver is not working with MySql

2015-06-10 Thread Alexandre Rafalovitch
For some reason, your email is completely unreadable, with a lot of nbsp
entities instead of spaces. Maybe your mail client is sending it as broken HTML?

You may want to try to reformat the message and resend.

Regards,
Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 10 June 2015 at 22:43, abhijit bashetti
 wrote:
>
> I have used Solr 3.3 as a Data Import Handler (DIH) with Oracle. It's
> working fine for me.
> Now I am trying the same with MySQL. With the change in database, I have
> changed the query used in data-config.xml for MySQL.
> The query has variables which are passed in the URL over HTTP. The same thing
> works fine in Oracle with the variable resolver, but not in MySQL.
> The query is:
>
> SELECT DISTINCT doc.document_id, doc.first_version_id, doc.acl_id,
>        fol.folder_id
> FROM ds_document_c doc, ds_folder fol
> WHERE doc.cabinet_id = ${dataimporter.request.cabinetId}
>   AND fol.folder_id = doc.document_folder_id
>   AND doc.index_state_modification_date >=
>       to_date('${dataimporter.request.lastIndexDate}', 'DD/MM/YYYY HH24:MI:SS')
>
> and the URL is:
>
> localhost:8983/solr/dataimport?command=full-import&clean=true&commit=true&cabinetId=17083360&lastIndexDate='24/05/2015
>  00:00:00'
>
> Solr is building the query as below:
>
> SELECT DISTINCT doc.document_id, doc.first_version_id, doc.acl_id,
>        fol.folder_id
> FROM ds_document_c doc, ds_folder fol
> WHERE doc.cabinet_id = 24
>   AND fol.folder_id = doc.document_folder_id
>   AND doc.index_state_modification_date >=
>       to_date('[?, '28/05/2015 11:13:50']', 'DD/MM/YYYY HH24:MI:SS')
>
> I am not able to figure out why the date variable is not resolved properly
> in this case.
> Because `to_date('[?, '28/05/2015 11:13:50']'` is not proper MySQL syntax,
> I am getting a MySQL syntax error.
> I get the following error:
>
> You have an error in your SQL syntax; check the manual that corresponds
> to your MySQL server version for the right syntax to use near
> '[?, '28/05/2015 11:13:50'], 'DD/MM/YYYY HH24:MI:SS')))' at line 1
>
> Does anyone know where the problem is? Why is the variable resolver not
> working as expected?
> Note: to_date is a function written by us in MySQL.
> I have checked out the Solr code from svn and am trying it by adding logs to
> it... but the logs are not showing up and I am not able to move ahead.
> I am not very sure where it is wrong... but just a wild guess that something
> is wrong at VariableResolverImpl.replaceTokens or at
> TemplateString.fillTokens...
> I will keep on it, but if you know/get a chance to look at it, it would be of
> great help from your end.
> Regards,
> Abhijit


Solr date variable resolver is not working with MySql

2015-06-10 Thread abhijit bashetti

I have used Solr 3.3 as a Data Import Handler (DIH) with Oracle. It's
working fine for me.
Now I am trying the same with MySQL. With the change in database, I have changed
the query used in data-config.xml for MySQL.
The query has variables which are passed in the URL over HTTP. The same thing
works fine in Oracle with the variable resolver, but not in MySQL.
The query is:

SELECT DISTINCT doc.document_id, doc.first_version_id, doc.acl_id, fol.folder_id
FROM ds_document_c doc, ds_folder fol
WHERE doc.cabinet_id = ${dataimporter.request.cabinetId}
  AND fol.folder_id = doc.document_folder_id
  AND doc.index_state_modification_date >=
      to_date('${dataimporter.request.lastIndexDate}', 'DD/MM/YYYY HH24:MI:SS')

and the URL is:

localhost:8983/solr/dataimport?command=full-import&clean=true&commit=true&cabinetId=17083360&lastIndexDate='24/05/2015
 00:00:00'

Solr is building the query as below:

SELECT DISTINCT doc.document_id, doc.first_version_id, doc.acl_id, fol.folder_id
FROM ds_document_c doc, ds_folder fol
WHERE doc.cabinet_id = 24
  AND fol.folder_id = doc.document_folder_id
  AND doc.index_state_modification_date >=
      to_date('[?, '28/05/2015 11:13:50']', 'DD/MM/YYYY HH24:MI:SS')

I am not able to figure out why the date variable is not resolved properly
in this case.
Because `to_date('[?, '28/05/2015 11:13:50']'` is not proper MySQL syntax,
I am getting a MySQL syntax error.
I get the following error:

You have an error in your SQL syntax; check the manual that corresponds to
your MySQL server version for the right syntax to use near
'[?, '28/05/2015 11:13:50'], 'DD/MM/YYYY HH24:MI:SS')))' at line 1

Does anyone know where the problem is? Why is the variable resolver not working
as expected?
Note: to_date is a function written by us in MySQL.
I have checked out the Solr code from svn and am trying it by adding logs to
it... but the logs are not showing up and I am not able to move ahead.
I am not very sure where it is wrong... but just a wild guess that something is
wrong at VariableResolverImpl.replaceTokens or at TemplateString.fillTokens...
I will keep on it, but if you know/get a chance to look at it, it would be of
great help from your end.

Regards,
Abhijit

Re: Date Format Conversion Function Query

2015-06-10 Thread Upayavira
Another technology that might make more sense is a Doc Transformer.

You also specify them in the fl parameter. I would imagine you could
specify

fl=id,[persian f=gregorian_Date]

See here for more cases:

https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents

This does not exist right now, but would make a good contribution to
Solr itself, I'd say.
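
A rough sketch of such a transformer (assuming the Solr 5.x
TransformerFactory/DocTransformer plugin API, and ICU4J for the actual calendar
conversion; the class and field names are hypothetical):

import java.util.Date;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;

public class PersianDateTransformerFactory extends TransformerFactory {

  @Override
  public DocTransformer create(final String field, final SolrParams params,
                               SolrQueryRequest req) {
    // in [persian f=gregorian_Date], "f" names the source field to convert
    final String source = params.get("f");
    return new DocTransformer() {
      @Override
      public String getName() {
        return field;
      }

      @Override
      public void transform(SolrDocument doc, int docid) {
        Object value = doc.getFirstValue(source);
        if (value instanceof Date) {
          // ICU4J honours the @calendar=persian locale keyword
          com.ibm.icu.text.SimpleDateFormat fmt =
              new com.ibm.icu.text.SimpleDateFormat("yyyy/MM/dd",
                  new com.ibm.icu.util.ULocale("fa_IR@calendar=persian"));
          doc.setField(field, fmt.format((Date) value));
        }
      }
    };
  }
}

Registered in solrconfig.xml under the name "persian", it would then be invoked
exactly as above: fl=id,[persian f=gregorian_Date].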

Upayavira

On Wed, Jun 10, 2015, at 09:57 AM, Alessandro Benedetti wrote:
> Erick will correct me if I am wrong, but I don't think this function query
> exists.
> But maybe it can be a nice contribution.
> It should take as input a date format and a field, and give in response the
> newly formatted date.
> 
> Then it would be simple to use:
> 
> fl=id,persian_date:dateFormat("yyyy/mm/dd",gregorian_Date)
> 
> The date format given as input here is just an example.
> 
> Cheers
> 
> 2015-06-10 7:24 GMT+01:00 Ali Nazemian :
> 
> > Dear Erick,
> > Hi,
> > Actually I want to convert the date format from the Gregorian calendar (the
> > Solr default) to the Persian calendar. You may ask why I do not do that at
> > the client side? Here is why:
> >
> > I want to provide a way to extract data from Solr in the CSV format. I know
> > that Solr has a CSV ResponseWriter that could be used in this case. But my
> > problem is that the date format in the Solr index is in the Gregorian
> > calendar and I want to put that in the Persian calendar. Therefore I was
> > thinking of a function query to do that at query time for me.
> >
> > Regards.
> >
> > On Tue, Jun 9, 2015 at 10:55 PM, Erick Erickson 
> > wrote:
> >
> > > I'm not sure what you're asking for, give us an example input/output
> > pair?
> > >
> > > Best,
> > > Erick
> > >
> > > On Tue, Jun 9, 2015 at 8:47 AM, Ali Nazemian 
> > > wrote:
> > > > Dear all,
> > > > Hi,
> > > > I was wondering is there any function query for converting date format
> > in
> > > > Solr? If no, how can I implement such function query myself?
> > > >
> > > > --
> > > > A.Nazemian
> > >
> >
> >
> >
> > --
> > A.Nazemian
> >
> 
> 
> 
> -- 
> --
> 
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England


Re: Indexing issue - index get deleted

2015-06-10 Thread Upayavira
I was only speaking about full import regarding the default of
clean=true. However, looking at the source code, it doesn't seem to
differentiate between a full and a delta import in relation to the
default of clean=true, which would be pretty crappy. I'd need to try it,
though.

Upayavira

On Wed, Jun 10, 2015, at 11:57 AM, Alessandro Benedetti wrote:
> Wow, Upaya, I didn't know that clean was default=true in the delta import
> as well!
> I did know it was default in the full import, but I agree with you that
> having a default to true for delta import is very dangerous !
> 
> But assuming the user was using the delta import so far, if it was cleaning
> every time, how was it possible to have a coherent index?
> 
> Using a delta import with clean=true should produce an inconsistent index
> with only a subset (the latest modified documents) of the entire data set!
> 
> Cheers
> 
> 2015-06-10 11:46 GMT+01:00 Upayavira :
> 
> > Note the clean= parameter to the DIH. It defaults to true. It will wipe
> > your index before it runs. Perhaps it succeeded at wiping, but failed to
> > connect to your database. Hence an empty index?
> >
> > clean=true is, IMO, a very dangerous default option.
> >
> > Upayavira
> >
> > On Wed, Jun 10, 2015, at 10:59 AM, Midas A wrote:
> > > Hi Alessandro,
> > >
> > > Please find the answers inline and help me out to figure out this
> > > problem.
> > >
> > > 1) Solr version : *4.2.1*
> > > 2) Solr architecture :* Master -slave/ Replication with requestHandler*
> > >
> > > 3) Kind of data source indexed : *Mysql *
> > > 4) What happened to the datasource ? any change in there ? : *No change *
> > > 5) Was the index actually deleted ? All docs deleted ? Index file
> > > segments
> > > deleted ? Index corrupted ? : *all docs deleted , segment files  are
> > > there.
> > > index file is also there .*
> > > 6) What about system resources ?
> > > * JVM: 30 GB*
> > > * RAM: 48 GB*
> > >
> > > *CPU : 8 core*
> > >
> > >
> > > On Wed, Jun 10, 2015 at 2:13 PM, Alessandro Benedetti <
> > > benedetti.ale...@gmail.com> wrote:
> > >
> > > > Let me try to help you, first of all I would like to encourage people
> > to
> > > > post more information about their scenario than "This is my log, index
> > > > deleted, help me" :)
> > > >
> > > > This kind of Info can be really useful :
> > > >
> > > > 1) Solr version
> > > > 2) Solr architecture ( Solr Cloud ? Solr Cloud configuration ? Manual
> > > > Sharding ? Manual Replication ? where the problem happened ? )
> > > > 3) Kind of data source indexed
> > > > 4) What happened to the datasource ? any change in there ?
> > > > 5) Was the index actually deleted ? All docs deleted ? Index file
> > segments
> > > > deleted ? Index corrupted ?
> > > > 6) What about system resources ?
> > > >
> > > > These questions are only a few examples of what everyone should always
> > > > post along with their mysterious problem!
> > > >
> > > > Hope this helps,
> > > >
> > > > Cheers
> > > >
> > > >
> > > > 2015-06-10 9:15 GMT+01:00 Midas A :
> > > >
> > > > >
> > > > > We are running full-import and delta-import crons.
> > > > >
> > > > > Full index: once a day
> > > > >
> > > > > Delta index: every 10 mins
> > > > >
> > > > >
> > > > > Last night my index was automatically deleted (numDocs=0).
> > > > >
> > > > > I am attaching logs for review.
> > > > >
> > > > > Please suggest how to resolve the issue.
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > --
> > > >
> > > > Benedetti Alessandro
> > > > Visiting card : http://about.me/alessandro_benedetti
> > > >
> > > > "Tyger, tyger burning bright
> > > > In the forests of the night,
> > > > What immortal hand or eye
> > > > Could frame thy fearful symmetry?"
> > > >
> > > > William Blake - Songs of Experience -1794 England
> > > >
> >
> 
> 
> 
> -- 
> --
> 
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England


Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Wow, Upaya, I didn't know that clean was default=true in the delta import
as well!
I did know it was default in the full import, but I agree with you that
having a default to true for delta import is very dangerous !

But assuming the user was using the delta import so far, if it was cleaning
every time, how was it possible to have a coherent index?

Using a delta import with clean=true should produce an inconsistent index
with only a subset (the latest modified documents) of the entire data set!

Cheers

2015-06-10 11:46 GMT+01:00 Upayavira :

> Note the clean= parameter to the DIH. It defaults to true. It will wipe
> your index before it runs. Perhaps it succeeded at wiping, but failed to
> connect to your database. Hence an empty index?
>
> clean=true is, IMO, a very dangerous default option.
>
> Upayavira
>
> On Wed, Jun 10, 2015, at 10:59 AM, Midas A wrote:
> > Hi Alessandro,
> >
> > Please find the answers inline and help me out to figure out this
> > problem.
> >
> > 1) Solr version : *4.2.1*
> > 2) Solr architecture :* Master -slave/ Replication with requestHandler*
> >
> > 3) Kind of data source indexed : *Mysql *
> > 4) What happened to the datasource ? any change in there ? : *No change *
> > 5) Was the index actually deleted ? All docs deleted ? Index file
> > segments
> > deleted ? Index corrupted ? : *all docs deleted , segment files  are
> > there.
> > index file is also there .*
> > 6) What about system resources ?
> > * JVM: 30 GB*
> > * RAM: 48 GB*
> >
> > *CPU : 8 core*
> >
> >
> > On Wed, Jun 10, 2015 at 2:13 PM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> >
> > > Let me try to help you, first of all I would like to encourage people
> to
> > > post more information about their scenario than "This is my log, index
> > > deleted, help me" :)
> > >
> > > This kind of Info can be really useful :
> > >
> > > 1) Solr version
> > > 2) Solr architecture ( Solr Cloud ? Solr Cloud configuration ? Manual
> > > Sharding ? Manual Replication ? where the problem happened ? )
> > > 3) Kind of data source indexed
> > > 4) What happened to the datasource ? any change in there ?
> > > 5) Was the index actually deleted ? All docs deleted ? Index file
> segments
> > > deleted ? Index corrupted ?
> > > 6) What about system resources ?
> > >
> > > These questions are only a few examples of what everyone should always
> > > post along with their mysterious problem!
> > >
> > > Hope this helps,
> > >
> > > Cheers
> > >
> > >
> > > 2015-06-10 9:15 GMT+01:00 Midas A :
> > >
> > > >
> > > > We are running full-import and delta-import crons.
> > > >
> > > > Full index: once a day
> > > >
> > > > Delta index: every 10 mins
> > > >
> > > >
> > > > Last night my index was automatically deleted (numDocs=0).
> > > >
> > > > I am attaching logs for review.
> > > >
> > > > Please suggest how to resolve the issue.
> > > >
> > > >
> > >
> > >
> > > --
> > > --
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> > >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Assign rich-text document's title name from clustering results

2015-06-10 Thread Upayavira
It depends a lot on what the documents are. Some document formats have
metadata that stores a title. Perhaps you can just extract that.

If not, once you've extracted the content, perhaps you could just have a
special field that is the first n words (followed by an ellipsis).

If you use a clustering algorithm that makes a guess at a name for a
cluster, you will get a list of names or categories, not something that
most people would think of as a title.

This really doesn't strike me (yet) as a Solr problem. The problem is
what info there is in these documents and how you can derive a title (or
some form of summary?) from them. 

If they are all Word documents, do they start with a "Heading" style? In
which case you could extract that. As I say, most likely this will have
to be done outside of Solr.
 
Upayavira

On Wed, Jun 10, 2015, at 10:31 AM, Zheng Lin Edwin Yeo wrote:
> The main objective here is actually to assign a title to the documents as
> they are being indexed.
> 
> We actually found that the cluster labels provide good information on
> the key points of the documents, but I'm not sure if we can get good
> cluster labels with a single document.
> 
> Besides getting it from cluster labels, are there other methods which we can
> use to assign a title?
> 
> 
> Regards,
> Edwin
> 
> 
> On 10 June 2015 at 17:16, Alessandro Benedetti
> 
> wrote:
> 
> > Hi Edwin,
> > let's do this step by step.
> >
> > Clustering is a problem solved by unsupervised machine learning algorithms.
> > The purpose of clustering is to group a corpus of documents by similarity,
> > trying to produce groups that are meaningful to a human being.
> > Solr currently provides different approaches for *Query Time Clustering*
> > (also known as online clustering).
> > There's an out-of-the-box integration that allows you to use clustering at
> > query time on the query results.
> > Different algorithms can be selected, mainly provided by Carrot2.
> >
> > These algorithms also provide a guess at the cluster name.
> >
> > Given this introduction, let me look at your problem.
> >
> > 1) The first part can be solved with a custom UpdateProcessor that will
> > process the document and add the automatic new title.
> > Now the problem is, how do we want to extract this new title?
> > Honestly I cannot understand how clustering can fit here …
> >
> > 2) Index-time clustering is not yet provided in Solr (I remember there was
> > only an interface ready, but no implementation).
> > You should cluster the content before indexing it in Solr using a machine
> > learning library.
> > Index-time clustering is delicate. What will happen on the next re-index?
> > Should we cluster everything again?
> > This topic must be investigated more.
> >
> > Anyway, let me know, as the original problem may not require the
> > clustering.
> >
> > Cheers
> >
> >
> > 2015-06-10 4:13 GMT+01:00 Zheng Lin Edwin Yeo :
> >
> > > Hi,
> > >
> > > I'm currently using Solr 5.1, and I'm thinking of ways to allow the
> > > system to automatically give the rich-text documents that are being
> > > indexed a title, instead of the user entering it manually, as we might
> > > have to index a whole folder of documents together, so it is not wise
> > > for the user to enter the titles one by one.
> > >
> > > I would like to check if it's possible to run the clustering, get the
> > > results, and use the top-scoring label as the title of the document?
> > > Apparently, we need to run the clustering prior to the indexing, so I'm
> > > not sure if that is possible.
> > >
> > >
> > > Regards,
> > > Edwin
> > >
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >


Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Let me answer inline, to get more info:

2015-06-10 10:59 GMT+01:00 Midas A :

> Hi Alessandro,
>
> Please find the answers inline and help me out to figure out this problem.
>
> 1) Solr version : *4.2.1*
> 2) Solr architecture :* Master -slave/ Replication with requestHandler*
>
>

Where did the issue happen?
Have you read this:
The SQL Entity Processor

The SqlEntityProcessor is the default processor. The associated data source
should be a JDBC URL.

The entity attributes specific to this processor are shown below.

query : Required. The SQL query used to select rows.

deltaQuery : SQL query used if the operation is delta-import. This query
selects the primary keys of the rows which will be parts of the delta-update.
The pks will be available to the deltaImportQuery through the variable
${dataimporter.delta.<column-name>}.

parentDeltaQuery : SQL query used if the operation is delta-import.

deletedPkQuery : SQL query used if the operation is delta-import.

deltaImportQuery : SQL query used if the operation is delta-import. If this is
not present, DIH tries to construct the import query by (after identifying the
delta) modifying the 'query' (this is error prone). There is a namespace
${dataimporter.delta.<column-name>} which can be used in this query. For
example: select * from tbl where id=${dataimporter.delta.id}.

That is from the official Solr wiki.
You should be sure you adhere to the proper configurations.
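
A minimal data-config.xml sketch of a delta-capable entity (illustrative table
and column names only):

<entity name="doc"
        query="SELECT id, title FROM docs"
        deltaQuery="SELECT id FROM docs
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, title FROM docs
                          WHERE id='${dataimporter.delta.id}'"/>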

> 3) Kind of data source indexed : *Mysql *
>
What about your delta query? That one is responsible for the delta
indexing.

> 4) What happened to the datasource ? any change in there ? : *No change *
>
Did nothing relevant happen there? Any deletion or weird update to the
database?

> 5) Was the index actually deleted ? All docs deleted ? Index file segments
> deleted ? Index corrupted ? : *all docs deleted , segment files  are there.
> index file is also there .*
>
So a deletion + commit happened, but still no merge purging the deleted
content from the index?


> 6) What about system resources ?
> * JVM: 30 GB*
> * RAM: 48 GB*
>
> *CPU : 8 core*
>

Heh, I am not interested in your raw resource numbers as such; I have no
indication of the size of your data. My question was more about checking
whether the system was healthy from the resource point of view.

Cheers

>
> On Wed, Jun 10, 2015 at 2:13 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > Let me try to help you, first of all I would like to encourage people to
> > post more information about their scenario than "This is my log, index
> > deleted, help me" :)
> >
> > This kind of Info can be really useful :
> >
> > 1) Solr version
> > 2) Solr architecture ( Solr Cloud ? Solr Cloud configuration ? Manual
> > Sharding ? Manual Replication ? where the problem happened ? )
> > 3) Kind of data source indexed
> > 4) What happened to the datasource ? any change in there ?
> > 5) Was the index actually deleted ? All docs deleted ? Index file
> segments
> > deleted ? Index corrupted ?
> > 6) What about system resources ?
> >
> > These questions are only a few examples of what everyone should always
> > post along with their mysterious problem!
> >
> > Hope this helps,
> >
> > Cheers
> >
> >
> > 2015-06-10 9:15 GMT+01:00 Midas A :
> >
> > >
> > > We are running full-import and delta-import crons.
> > >
> > > Full index: once a day
> > >
> > > Delta index: every 10 mins
> > >
> > >
> > > Last night my index was automatically deleted (numDocs=0).
> > >
> > > I am attaching logs for review.
> > >
> > > Please suggest how to resolve the issue.
> > >
> > >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Indexing issue - index get deleted

2015-06-10 Thread Upayavira
Note the clean= parameter to the DIH. It defaults to true. It will wipe
your index before it runs. Perhaps it succeeded at wiping, but failed to
connect to your database. Hence an empty index?

clean=true is, IMO, a very dangerous default option.
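
It can be overridden explicitly on every call, for example (an illustrative
URL; the handler path and core name will vary):

http://localhost:8983/solr/yourcore/dataimport?command=delta-import&clean=false&commit=true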

Upayavira

On Wed, Jun 10, 2015, at 10:59 AM, Midas A wrote:
> Hi Alessandro,
> 
> Please find the answers inline and help me out to figure out this
> problem.
> 
> 1) Solr version : *4.2.1*
> 2) Solr architecture :* Master -slave/ Replication with requestHandler*
> 
> 3) Kind of data source indexed : *Mysql *
> 4) What happened to the datasource ? any change in there ? : *No change *
> 5) Was the index actually deleted ? All docs deleted ? Index file
> segments
> deleted ? Index corrupted ? : *all docs deleted , segment files  are
> there.
> index file is also there .*
> 6) What about system resources ?
> * JVM: 30 GB*
> * RAM: 48 GB*
> 
> *CPU : 8 core*
> 
> 
> On Wed, Jun 10, 2015 at 2:13 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
> 
> > Let me try to help you, first of all I would like to encourage people to
> > post more information about their scenario than "This is my log, index
> > deleted, help me" :)
> >
> > This kind of Info can be really useful :
> >
> > 1) Solr version
> > 2) Solr architecture ( Solr Cloud ? Solr Cloud configuration ? Manual
> > Sharding ? Manual Replication ? where the problem happened ? )
> > 3) Kind of data source indexed
> > 4) What happened to the datasource ? any change in there ?
> > 5) Was the index actually deleted ? All docs deleted ? Index file segments
> > deleted ? Index corrupted ?
> > 6) What about system resources ?
> >
> > These questions are only a few examples of what everyone should always
> > post along with their mysterious problem!
> >
> > Hope this helps,
> >
> > Cheers
> >
> >
> > 2015-06-10 9:15 GMT+01:00 Midas A :
> >
> > >
> > > We are running full-import and delta-import crons.
> > >
> > > Full index: once a day
> > >
> > > Delta index: every 10 mins
> > >
> > >
> > > Last night my index was automatically deleted (numDocs=0).
> > >
> > > I am attaching logs for review.
> > >
> > > Please suggest how to resolve the issue.
> > >
> > >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >


Re: Indexing issue - index get deleted

2015-06-10 Thread Midas A
Hi Alessandro,

Please find the answers inline and help me out to figure out this problem.

1) Solr version : *4.2.1*
2) Solr architecture : *Master-slave / replication with a requestHandler*

3) Kind of data source indexed : *MySQL*
4) What happened to the datasource ? any change in there ? : *No change*
5) Was the index actually deleted ? All docs deleted ? Index file segments
deleted ? Index corrupted ? : *All docs deleted; segment files are there;
the index file is also there.*
6) What about system resources ?
*JVM: 30 GB*
*RAM: 48 GB*

*CPU: 8 cores*


On Wed, Jun 10, 2015 at 2:13 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Let me try to help you, first of all I would like to encourage people to
> post more information about their scenario than "This is my log, index
> deleted, help me" :)
>
> This kind of Info can be really useful :
>
> 1) Solr version
> 2) Solr architecture ( Solr Cloud ? Solr Cloud configuration ? Manual
> Sharding ? Manual Replication ? where the problem happened ? )
> 3) Kind of data source indexed
> 4) What happened to the datasource ? any change in there ?
> 5) Was the index actually deleted ? All docs deleted ? Index file segments
> deleted ? Index corrupted ?
> 6) What about system resources ?
>
> These questions are only a few examples of what everyone should always
> post along with their mysterious problem!
>
> Hope this helps,
>
> Cheers
>
>
> 2015-06-10 9:15 GMT+01:00 Midas A :
>
> >
> > We are running full-import and delta-import crons.
> >
> > Full index: once a day
> >
> > Delta index: every 10 mins
> >
> >
> > Last night my index was automatically deleted (numDocs=0).
> >
> > I am attaching logs for review.
> >
> > Please suggest how to resolve the issue.
> >
> >
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: Assign rich-text document's title name from clustering results

2015-06-10 Thread Zheng Lin Edwin Yeo
The main objective here is actually to assign a title to the documents as
they are being indexed.

We actually found that the cluster labels provide good information on
the key points of the documents, but I'm not sure if we can get good
cluster labels with a single document.

Besides getting it from cluster labels, are there other methods which we can
use to assign a title?


Regards,
Edwin


On 10 June 2015 at 17:16, Alessandro Benedetti 
wrote:

> Hi Edwin,
> let's do this step by step.
>
> Clustering is a problem solved by unsupervised machine learning algorithms.
> The purpose of clustering is to group a corpus of documents by similarity,
> trying to produce groups that are meaningful to a human being.
> Solr currently provides different approaches for *Query Time Clustering*
> (also known as online clustering).
> There's an out-of-the-box integration that allows you to use clustering at
> query time on the query results.
> Different algorithms can be selected, mainly provided by Carrot2.
>
> These algorithms also provide a guess at the cluster name.
>
> Given this introduction, let me look at your problem.
>
> 1) The first part can be solved with a custom UpdateProcessor that will
> process the document and add the automatic new title.
> Now the problem is, how do we want to extract this new title?
> Honestly I cannot understand how clustering can fit here …
>
> 2) Index-time clustering is not yet provided in Solr (I remember there was
> only an interface ready, but no implementation).
> You should cluster the content before indexing it in Solr using a machine
> learning library.
> Index-time clustering is delicate. What will happen on the next re-index?
> Should we cluster everything again?
> This topic must be investigated more.
>
> Anyway, let me know, as the original problem may not require the
> clustering.
>
> Cheers
>
>
> 2015-06-10 4:13 GMT+01:00 Zheng Lin Edwin Yeo :
>
> > Hi,
> >
> > I'm currently using Solr 5.1, and I'm thinking of ways to allow the
> > system to automatically give the rich-text documents that are being
> > indexed a title, instead of the user entering it manually, as we might
> > have to index a whole folder of documents together, so it is not wise
> > for the user to enter the titles one by one.
> >
> > I would like to check if it's possible to run the clustering, get the
> > results, and use the top-scoring label as the title of the document?
> > Apparently, we need to run the clustering prior to the indexing, so I'm
> > not sure if that is possible.
> >
> >
> > Regards,
> > Edwin
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: AngularJS

2015-06-10 Thread Upayavira


On Wed, Jun 10, 2015, at 05:52 AM, William Bell wrote:
> Finding DIH issue with the new AngularJS DIH section, while indexing...
> 
> 1,22613/s ?
> 
> Last Update: 22:50:50
> *Indexing since 0:1:38.204*
> Requests: 1, Fetched: 1,22613/s, Skipped: 0, Processed: 1,22613/s
> Started: 3 minutes ago

Ahh, great - real feedback! :-)

What does the old UI say at that point? Could you use "inspect element"
in your browser, and paste a few nodes around this for both the old and
the new UI?

We can, and probably should, do this in a JIRA ticket. You willing to
file one?

Many thanks!

Upayavira



Re: Indexing documents in Chinese

2015-06-10 Thread Zheng Lin Edwin Yeo
I've tried to use solr.HMMChineseTokenizerFactory with the following
configuration:

<fieldType name="text_chinese" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
  </analyzer>
</fieldType>

It is able to be indexed, but when I tried to search for the words, it
matches many other words and not just the words that I searched for. Why is
this so?

For example, the query
http://localhost:8983/edm/collection3/highlight?q=我国

actually matches

"title":["我国1月份的制造业产值同比仅增长0"],


Regards,
Edwin



On 10 June 2015 at 14:40, Alexandre Rafalovitch  wrote:

> You may find this series of articles on CJK analysis/search helpful:
> http://discovery-grindstone.blogspot.com.au/
>
> It's a little out of date, but should be a very solid intro.
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 10 June 2015 at 16:35, Zheng Lin Edwin Yeo 
> wrote:
> > Hi,
> >
> > I'm trying to index rich-text documents that are in Chinese. Currently,
> > there's no problem with indexing, but there's a problem with the searching.
> >
> > Does anyone know what is the best tokenizer and filter factory to use? I'm
> > now using solr.StandardTokenizerFactory, which I heard is not
> > very good for Chinese. Is that true?
> >
> >
> > Regards,
> > Edwin
>


Re: Assign rich-text document's title name from clustering results

2015-06-10 Thread Alessandro Benedetti
Hi Edwin,
let's do this step by step.

Clustering is a problem solved by unsupervised machine learning algorithms.
The purpose of clustering is to group a corpus of documents by similarity,
trying to produce groups that are meaningful to a human being.
Solr currently provides different approaches for *Query Time Clustering*
(also known as online clustering).
There's an out-of-the-box integration that allows you to use clustering at
query time on the query results.
Different algorithms can be selected, mainly provided by Carrot2.

These algorithms also provide a guess at the cluster name.

Given this introduction, let me look at your problem.

1) The first part can be solved with a custom UpdateProcessor that will
process the document and add the automatic new title (see the sketch after
point 2).
Now the problem is, how do we want to extract this new title?
Honestly I cannot understand how clustering can fit here …

2) Index-time clustering is not yet provided in Solr (I remember there was
only an interface ready, but no implementation).
You should cluster the content before indexing it in Solr using a machine
learning library.
Index-time clustering is delicate. What will happen on the next re-index?
Should we cluster everything again?
This topic must be investigated more.
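
For point 1, a minimal sketch of such an update processor (assuming Solr's
UpdateRequestProcessor plugin API; the "title" and "content" field names are
hypothetical, and the fallback is the "first n words" idea from elsewhere in
this thread):

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class AutoTitleProcessorFactory extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // only fill in a title if the client did not supply one
        if (doc.getFieldValue("title") == null) {
          Object content = doc.getFieldValue("content");
          if (content != null) {
            String body = content.toString().trim();
            doc.setField("title", body.substring(0, Math.min(80, body.length())));
          }
        }
        super.processAdd(cmd);
      }
    };
  }
}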

Anyway, let me know, as the original problem may not require the
clustering.

Cheers


2015-06-10 4:13 GMT+01:00 Zheng Lin Edwin Yeo :

> Hi,
>
> I'm currently using Solr 5.1, and I'm thinking of ways to allow the system
> to automatically give the rich-text documents that are being indexed a
> title, instead of the user entering it manually, as we might
> have to index a whole folder of documents together, so it is not wise for
> the user to enter the titles one by one.
>
> I would like to check if it's possible to run the clustering, get the
> results, and use the top-scoring label as the title of the document?
> Apparently, we need to run the clustering prior to the indexing, so I'm not
> sure if that is possible.
>
>
> Regards,
> Edwin
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Date Format Conversion Function Query

2015-06-10 Thread Alessandro Benedetti
Erick will correct me if I am wrong, but I don't think this function query
exists.
But maybe it can be a nice contribution.
It should take as input a date format and a field, and give in response the
newly formatted date.

Then it would be simple to use:

fl=id,persian_date:dateFormat("yyyy/mm/dd",gregorian_Date)

The date format given as input here is just an example.

Cheers

2015-06-10 7:24 GMT+01:00 Ali Nazemian :

> Dear Erick,
> Hi,
> Actually I want to convert the date format from the Gregorian calendar (the
> Solr default) to the Persian calendar. You may ask why I do not do that at
> the client side? Here is why:
>
> I want to provide a way to extract data from Solr in the CSV format. I know
> that Solr has a CSV ResponseWriter that could be used in this case. But my
> problem is that the date format in the Solr index is in the Gregorian
> calendar and I want to put that in the Persian calendar. Therefore I was
> thinking of a function query to do that at query time for me.
>
> Regards.
>
> On Tue, Jun 9, 2015 at 10:55 PM, Erick Erickson 
> wrote:
>
> > I'm not sure what you're asking for, give us an example input/output
> pair?
> >
> > Best,
> > Erick
> >
> > On Tue, Jun 9, 2015 at 8:47 AM, Ali Nazemian 
> > wrote:
> > > Dear all,
> > > Hi,
> > > I was wondering is there any function query for converting date format
> in
> > > Solr? If no, how can I implement such function query myself?
> > >
> > > --
> > > A.Nazemian
> >
>
>
>
> --
> A.Nazemian
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Let me try to help you. First of all, I would like to encourage people to
post more information about their scenario than "This is my log, index
deleted, help me" :)

This kind of info can be really useful :

1) Solr version
2) Solr architecture ( Solr Cloud ? Solr Cloud configuration ? Manual
Sharding ? Manual Replication ? where the problem happened ? )
3) Kind of data source indexed
4) What happened to the datasource ? any change in there ?
5) Was the index actually deleted ? All docs deleted ? Index file segments
deleted ? Index corrupted ?
6) What about system resources ?

These questions are only a few examples of what everyone should always post
along with their mysterious problem!

Hope this helps,

Cheers


2015-06-10 9:15 GMT+01:00 Midas A :

>
> We are running full-import and delta-import crons.
>
> Full index: once a day
>
> Delta index: every 10 mins
>
>
> Last night my index was automatically deleted (numDocs=0).
>
> I am attaching logs for review.
>
> Please suggest how to resolve the issue.
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England