Indexing fails after a few million documents

2014-05-20 Thread Tim Burner
Hi Everyone,

I have installed Solr-4.6 Cloud with external Zookeeper-3.4.5 and Tomcat-7,
the configuration is as mentioned below.

Single-machine cluster setup with 3 shards and 2 replicas, deployed on 3
Tomcats with 3 Zookeepers.

Everything is installed fine. I start indexing, and once I reach some
millions of documents (~1.6M) the indexing stops with "#503 Service
Unavailable", and the Cloud Dashboard log says

"ERROR DistributedUpdateProcessor ClusterState says we are the leader,
but locally we don't think so"


*"ERROR SolrCore org.apache.solr.common.SolrException: ClusterState says we
are the leader (http://host:port1/solr/recollection_shard1_replica1),​ but
locally we don't think so. Request came from
http://host:port2/solr/recollection_shard2_replica1/"*


*"ERROR ZkController Error registering
SolrCore:org.apache.solr.common.SolrException: Error getting leader from zk
for shard shard2"*

Any suggestions/advice would be appreciated!

Thanks!
Tim


Re: Vague Behavior while setting Solr Cloud

2014-05-20 Thread Tim Burner
Thanks Shawn,

I much appreciate your help. I got it fixed: there were some background
Tomcat processes still running which hadn't been stopped by the time I
faced these issues.

Thanks again!
Tim


On Tue, May 20, 2014 at 11:33 PM, Shawn Heisey  wrote:

> On 5/20/2014 7:10 AM, Tim Burner wrote:
> > I am trying to setup Solr Cloud referring to the blog
> > http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html
> >
> > if I complete the setup in one go, then it seems to be going fine.
> >
> > when the setup is complete and I am trying to restart Solr by restarting
> > the Tomcat instance, it does not deploy and moreover the shards and
> > replicas are not up.
>
> You've given us nearly zero information about what the problem is.  All
> we know right now is that you restart tomcat and Solr doesn't deploy.
> See this wiki page:
>
> http://wiki.apache.org/solr/UsingMailingLists
>
> Getting specific, we'll need tomcat logs, Solr logs, versions of
> everything.  We might also need your config and schema, depending on
> what the other information reveals.
>
> Thanks,
> Shawn
>
>


Re: Solr Cloud Shards and Replica not reviving after restarting

2014-05-20 Thread Tim Burner
Thanks Erick,

I much appreciate your help. I got it fixed: there were some background
Tomcat processes still running which hadn't been stopped by the time I
faced these issues.

Thanks again!


On Wed, May 21, 2014 at 8:25 AM, Erick Erickson wrote:

> First thing I'd look at is the log on the server. It's possible that
> you've changed the configuration such that Solr can't start. Shot in
> the dark, but that's where I'd start looking.
>
> Best,
> Erick
>
> On Tue, May 20, 2014 at 4:45 AM, Tim Burner  wrote:
> > Hi Everyone,
> >
> > I have installed Solr Cloud 4.6.2 with external Zookeeper and Tomcat,
> > having 3 shards with 2 replicas each. I tried indexing some documents
> > which went smoothly.
> >
> > After which I restarted my Tomcat, and now the shards are not coming up;
> > it keeps throwing a bunch of exceptions. The first exception was "no
> > servers hosting shard:"
> >
> > All the replicas and the leader are down and not responding; it's even giving
> >
> > RecoveryStrategy Error while trying to recover.
> >
> core=recollection_shard1_replica1:org.apache.solr.client.solrj.SolrServerException:
> > Server refused connection at: http://192.168.2.183:9090/solr
> >
> > It would be great if you could help me solve this issue. Expert advice
> > needed.
> >
> > Thanks in Advance!
>


Re: Solr performance: multiValued field vs separate fields

2014-05-20 Thread rulinma
I think a multiValued field copies multiple values into one field: the index
is bigger and querying is easy, but performance may be worse; it depends on
how you use it.





Re: solr-server refresh index

2014-05-20 Thread Erick Erickson
Well, you can always make documents in Solr visible by issuing a hard
commit, waiting for your hard-commit interval to expire (with
openSearcher=true), or waiting for your soft-commit interval to expire.
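
For example, with SolrJ (a minimal sketch against a 4.x server; the URL and
core name are made up):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class CommitExample {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        server.commit(true, true);       // hard commit: waitFlush, waitSearcher
        server.commit(true, true, true); // soft commit: makes docs visible without
                                         // flushing segments to disk
        server.shutdown();
      }
    }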

But as far as the Cloudera product, you'd get much better answers by
asking in Cloudera-specific forums. Here's a place to start...
https://groups.google.com/a/cloudera.org/forum/#!forum/scm-users

The problem is that Cloudera's product (CDH) uses Solr, but Solr hasn't done
anything special to accommodate Cloudera's usage, so this forum is
relatively ignorant of CDH, particularly things like HBase
integration...

Best,
Erick

On Tue, May 20, 2014 at 8:50 PM, zzz  wrote:
> Hi
>
> I am using Solr on a 4 node CDH5 cluster (1 namenode, 3 datanodes).
>
> I am running the solr-server on the namenode, and the solr-indexer on each
> of the datanodes, alongside the hbase regionservers, for NRT indexing of a
> hbase table.
>
> The basics of the indexing seem to work - when I add records via
> hbase-shell, I can view the records, however *only* after I either restart
> solr-server, or click "optimize" through the Solr Web UI.
>
> Interestingly, after I add some records to hbase, the Solr Web UI displays
> the "current" status as a red stop icon. After I restart/optimize, it turns
> into a green tick, and I can search and get back the new documents.
>
> Is there a way to get solr-server to refresh its view of the index
> automatically? Or would that even be a good idea? Why doesn't the Web UI
> have a clear "refresh index" button available...the "optimize" button is
> usually not available.
>
> TIA


solr-server refresh index

2014-05-20 Thread zzz
Hi

I am using Solr on a 4 node CDH5 cluster (1 namenode, 3 datanodes).

I am running the solr-server on the namenode, and the solr-indexer on each
of the datanodes, alongside the hbase regionservers, for NRT indexing of a
hbase table.

The basics of the indexing seem to work - when I add records via
hbase-shell, I can view the records, however *only* after I either restart
solr-server, or click "optimize" through the Solr Web UI.

Interestingly, after I add some records to hbase, the Solr Web UI displays
the "current" status as a red stop icon. After I restart/optimize, it turns
into a green tick, and I can search and get back the new documents.

Is there a way to get solr-server to refresh its view of the index
automatically? Or would that even be a good idea? Why doesn't the Web UI
have a clear "refresh index" button available...the "optimize" button is
usually not available.

TIA


Re: Odd interaction between {!tag..} and {!field}

2014-05-20 Thread Erick Erickson
Thanks Chris!

The query parsing stuff is something I keep stumbling over, but you
may have noticed that!

Erick

On Tue, May 20, 2014 at 10:06 AM, Chris Hostetter
 wrote:
>
> : when local params are "embedded" in a query being parsed by the
> : LuceneQParser, it applies them using the same scoping as other query
> : operators
> :
> : :   fq: "{!tag=name_name}{!field f=name}United States"
>
>
> Think of that example in the context of this one -- the basics of
> when/what/why the various pieces are parsed are the same...
>
>fq:  "{!tag=name_name}(+{!field f=name}United text:(States))"
>
>
> -Hoss
> http://www.lucidworks.com/


Re: Solr Cloud Shards and Replica not reviving after restarting

2014-05-20 Thread Erick Erickson
First thing I'd look at is the log on the server. It's possible that
you've changed the configuration such that Solr can't start. Shot in
the dark, but that's where I'd start looking.

Best,
Erick

On Tue, May 20, 2014 at 4:45 AM, Tim Burner  wrote:
> Hi Everyone,
>
> I have installed Solr Cloud 4.6.2 with external Zookeeper and Tomcat,
> having 3 shards with 2 replicas each. I tried indexing some documents which
> went smoothly.
>
> After which I restarted my Tomcat, and now the shards are not coming up;
> it keeps throwing a bunch of exceptions. The first exception was "no servers
> hosting shard:"
>
> All the replicas and the leader are down and not responding; it's even giving
>
> RecoveryStrategy Error while trying to recover.
> core=recollection_shard1_replica1:org.apache.solr.client.solrj.SolrServerException:
> Server refused connection at: http://192.168.2.183:9090/solr
>
> It would be great if you could help me solve this issue. Expert advice
> needed.
>
> Thanks in Advance!


Re: How to optimize single shard only?

2014-05-20 Thread Erick Erickson
Marcin is correct. The index size on disk will perhaps double (triple
in the compound-file case). The reason is so you don't lose your index if the
process is interrupted.

Consider the case where you're optimizing to one segment.
1> All the current segments are copied into the new segment
2> The new segment is flushed
3> "control files" that tell Lucene what files constitute the valid
segment(s) are written.
4> the old segments are removed.

So at any point up to <3> if the system is killed, crashes, whatever,
then the old version of the index is intact and you can keep on
working, even optimizing again.

If, on the other hand, after each segment was written to the new
segment the old segment was deleted, interrupting the process (which
may be very long) would leave your index in an inconsistent state.
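
To make the disk-space trade-off concrete, a minimal SolrJ sketch (the URL and
core name are made up; the API is Solr 4.x):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class OptimizeExample {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // waitFlush, waitSearcher, maxSegments: merge down to one segment.
        // Expect the on-disk index to roughly double while this runs.
        server.optimize(true, true, 1);
        server.shutdown();
      }
    }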

FWIW,
Erick

On Tue, May 20, 2014 at 4:14 AM, Marcin Rzewucki  wrote:
As I wrote before, the index is rewritten, so it grows during optimization
and is reduced later. I guess there was an OOM in your case.
>
>
>
> On 20 May 2014 12:11, YouPeng Yang  wrote:
>
>> Hi
>>   My DIH job indeed hangs. I have only four shards, each with a master and a
>> replica. Maybe the JVM memory size is very low: it was 3G, while the size of
>> every one of my cores is almost 16GB.
>>
>>  I have also found that the size of the master increased during the
>> optimization (you can check on the overview page of the core); the
>> phenomenon is very weird. Is it because the collection-wide
>> optimization will compute and copy all the docs of the whole collection?
>>
>>
>>                       Version        Gen    Size
>>  Master (Searching)   1400501330248  98396  29.83 GB
>>  Master (Replicable)  1400501330888  98397  -
>>
>>
>>   After checking the source code, unfortunately, it seems the optimize
>> action is distributed over the whole collection; you can refer to
>> SolrCmdDistributor.distribCommit.
>>
>>
>> 2014-05-20 17:27 GMT+08:00 Marcin Rzewucki :
>>
>> > Well, it should not hang if all is configured fine :) How many shards and
>> > how much memory do you have? Note that optimize rewrites the index, so you
>> > might need additional disk space for this process. Optimizing works fine;
>> > however, I'd like to be able to do it on a single shard as well.
>> >
>> >
>> > On 20 May 2014 11:19, YouPeng Yang  wrote:
>> >
>> > > Hi Marcin
>> > >
>> > >   Thanks to your mail, now I know why my cloud hangs when I just click
>> > > the optimize button on the overview page of the shard.
>> > >
>> > >
>> > > 2014-05-20 15:25 GMT+08:00 Ahmet Arslan :
>> > >
>> > > > Hi Marcin,
>> > > >
>> > > > just a guess, pass distrib=false ?
>> > > >
>> > > >
>> > > >
>> > > > Ahmet
>> > > >
>> > > >
>> > > > On Tuesday, May 20, 2014 10:23 AM, Marcin Rzewucki <
>> > mrzewu...@gmail.com>
>> > > > wrote:
>> > > > Hi,
>> > > >
>> > > > Do you know how to optimize index on a single shard only ? I was
>> trying
>> > > to
>> > > > use "optimize=true&waitFlush=true&shard.keys=myshard" but it does not
>> > > work
>> > > > - it optimizes all shards instead of just one.
>> > > >
>> > > > Kind regards.
>> > > >
>> > > >
>> > >
>> >
>>


Re: Extensibility and code reuse: SOLR vs Lucene

2014-05-20 Thread Yonik Seeley
On Tue, May 20, 2014 at 6:01 PM, Achim Domma  wrote:
> - I found several times code snippets like " if (collector instanceof 
> DelegatingCollector) { ((DelegatingCollector)collector).finish() } ". Such 
> code is considered bad practice in every OO language I know. Am I missing
> something here? Is there a reason why it's solved like this?

In a single code base you would be correct (we would just add a finish
method to the base Collector class).  When you are adding additional
functionality to an existing API/code base however, this is often the
only way to do it.

What type of aggregation are you looking for?  The Heliosearch project
(a Solr fork) also has this:
http://heliosearch.org/solr-facet-functions/

-Yonik
http://heliosearch.org - facet functions, subfacets, off-heap filters&fieldcache


Re: Extensibility and code reuse: SOLR vs Lucene

2014-05-20 Thread Joel Bernstein
Achim,

Solr can be extended to plug in custom analytics. The code snippet you
mention is part of the framework which enables this.

Here is how you do it:

1) Create a QParserPlugin that returns a Query that extends PostFilter.
2) Then implement the PostFilter api and return a DelegatingCollector that
collects whatever you like.
3) DelegatingCollector.finish() signals your collector that the search has
completed.
4)  You can output your analytics directly to the ResponseBuilder. You can
get a reference to the ResponseBuilder through a static call in the
SolrRequestInfo class.
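
A rough sketch of steps 2 and 3 (the class name and the counting logic are
invented for illustration; the interfaces are the Solr 4.x ones):

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.solr.search.DelegatingCollector;
    import org.apache.solr.search.ExtendedQueryBase;
    import org.apache.solr.search.PostFilter;

    public class CountingPostFilter extends ExtendedQueryBase implements PostFilter {
      @Override public boolean getCache() { return false; } // post filters are not cached
      @Override public int getCost() { return 100; }        // cost >= 100 marks a post filter

      @Override
      public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
          private int matches = 0;

          @Override
          public void collect(int doc) throws IOException {
            matches++;          // custom analytics would go here
            super.collect(doc); // pass the doc down the collector chain
          }

          @Override
          public void finish() throws IOException {
            // Search complete: step 4 would write `matches` to the ResponseBuilder,
            // obtained via SolrRequestInfo.getRequestInfo().getResponseBuilder().
            super.finish();
          }
        };
      }
    }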

In Solr 4.9 you'll be able to implement your own MergeStrategy, to merge
the results generated by DelegatingCollectors on the shards (SOLR-5973).
 The pluggable collectors in that ticket are for ranking. The PostFilter
delegating collectors are a better place for doing custom analytics.

Joel Bernstein
Search Engineer at Heliosearch


On Tue, May 20, 2014 at 6:01 PM, Achim Domma  wrote:

> Hi,
>
> I have a project where we need to do aggregations over faceted values.
> The stats component is not powerful enough anymore and the new statistic
> component seems not to be ready yet. I understand that it's not easy to
> create a general purpose component for this task. I decided to check
> whether I can solve my use case by myself, but I'm struggling. Any
> clarification regarding the following points would be very appreciated:
>
> - I assume that some of my use cases could be solved by using a custom
> collector. Lucene seems to be built to be extensible by deriving classes
> and overriding methods. That's how I would expect SOLID code to be. But
> looking at the SOLR code, I see a lot of hard coded types and no way to
> just exchange the collector. This is the case for most of the code parts I
> have read, so I wonder: Is there another way to customize / extend SOLR?
> How is the SOLR code supposed to be reused?
>
> - I found several times code snippets like " if (collector instanceof
> DelegatingCollector) { ((DelegatingCollector)collector).finish() } ". Such
> code is considered bad practice in every OO language I know. Am I missing
> something here? Is there a reason why it's solved like this?
>
> cheers,
> Achim


Extensibility and code reuse: SOLR vs Lucene

2014-05-20 Thread Achim Domma
Hi,

I have a project where we need to do aggregations over faceted values. The
stats component is not powerful enough anymore, and the new statistics
component seems not to be ready yet. I understand that it's not easy to
create a general-purpose component for this task. I decided to check whether
I can solve my use case by myself, but I'm struggling. Any clarification
regarding the following points would be much appreciated:

- I assume that some of my use cases could be solved by using a custom
collector. Lucene seems to be built to be extensible by deriving classes and
overriding methods. That's how I would expect SOLID code to be. But looking at
the SOLR code, I see a lot of hard-coded types and no way to just exchange the
collector. This is the case for most of the code parts I have read, so I
wonder: is there another way to customize / extend SOLR? How is the SOLR code
supposed to be reused?

- I found several times code snippets like " if (collector instanceof
DelegatingCollector) { ((DelegatingCollector)collector).finish() } ". Such code
is considered bad practice in every OO language I know. Am I missing something
here? Is there a reason why it's solved like this?

cheers,
Achim

Stemming for Chinese and Japanese

2014-05-20 Thread Geepalem
Hi,

What is the filter to be used to implement stemming for Chinese and Japanese
language field types?
For English, I have used <...> (filter tag stripped by the archive) and it's
working fine.

Appreciate your help!

Thanks,
G. Naresh Kumar



Re: Issue paging when sorting on a Date field

2014-05-20 Thread Shawn Heisey
On 5/19/2014 2:05 PM, Bryan Bende wrote:
> Using Solr 4.6.1 and in my schema I have a date field storing the time a
> document was added to Solr.
> 
> I have a utility program which:
> - queries for all of the documents in the previous day sorted by create date
> - pages through the results keeping track of the unique document ids
> - compare the total number of unique doc ids to the numFound to see if it
> they match
> 
> I've noticed that if I use a page size larger than the number of documents
> for the given day (aka get everything in one query), then everything works
> as expected (results sorted correctly, unique doc ids size == numFound).
> 
> However, when I use a smaller page say, say 10 rows per page, I randomly
> see cases where the last document of a page will be duplicated as the first
> document of the next page, even though the "start" and "rows" parameters
> increased correctly. So I might see something like numFound=100 but unique
> doc ids is 97, and then I see three occurrences where the last doc id on a
> page was also the first on the next page.

This *sounds* like a situation where you have a sharded index that has
the same uniqueKey value in more than one shard.  This situation will
cause Solr to behave in a way that looks completely unpredictable.

There is no way for Solr to deal with this problem in a way that would
not consume large amounts of real time, CPU time, and RAM ... so Solr
does not do anything to deal with this problem other than removing
duplicates from the actual results returned -- which is exactly how the
discrepancies occur.

If you are absolutely sure that you are not running into the duplicate
document problem I described, then I am not sure what's going on.  It
might be related to the sort, and if that's true, adding a second sort
parameter using your uniqueKey field might be a solution.

Thanks,
Shawn



Re: Issue paging when sorting on a Date field

2014-05-20 Thread Chris Hostetter

: So I think when I was paging through the results, if the query for page N
: was handled by replica1 and page N+1 handled by replica2, and the page
: boundary happened to be where the reversed rows were, this would produce
: the behavior I was seeing where the last row from the previous page was
: also the first row from the next page.

Right, this can actually happen even in a single Solr node.

When 2 docs have identical sort values, the final ordering is
non-deterministic -- they usually come back in "index" order (the order
that they appear in the segments on disk) but that's not guaranteed.  In
particular, if you have concurrent index updates that cause segment merges,
the order of documents can change (even if those updates don't directly
affect the docs being returned).

If you want to ensure that docs with equal sort values are returned in a
consistent order across pagination (in either single or multi-node setups)
you have to have a "tie breaker" sort of some kind -- the uniqueKey can be
useful here.
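
For example (a sketch, assuming the uniqueKey field is named "id" and using
the date field from this thread):

    sort=create_date desc,id asc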



-Hoss
http://www.lucidworks.com/


Re: Vague Behavior while setting Solr Cloud

2014-05-20 Thread Shawn Heisey
On 5/20/2014 7:10 AM, Tim Burner wrote:
> I am trying to setup Solr Cloud referring to the blog
> http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html
> 
> if I complete the setup in one go, then it seems to be going fine.
> 
> when the setup is complete and I am trying to restart Solr by restarting
> the Tomcat instance, it does not deploy and moreover the shards and replicas
> are not up.

You've given us nearly zero information about what the problem is.  All
we know right now is that you restart tomcat and Solr doesn't deploy.
See this wiki page:

http://wiki.apache.org/solr/UsingMailingLists

Getting specific, we'll need tomcat logs, Solr logs, versions of
everything.  We might also need your config and schema, depending on
what the other information reveals.

Thanks,
Shawn



Re: solr-user Digest of: get.100322

2014-05-20 Thread Shawn Heisey
On 5/20/2014 2:01 AM, Jeongseok Son wrote:
> Though it uses only a small amount of memory I'm worried about memory
> usage because I have to store so many documents. (32GB RAM / total 5B
> docs, sum of docs across all cores)

If you've only got 32GB of RAM and there are five billion docs on the
system, Solr performance will be dismal no matter what you do with
docValues.  Your index will be FAR larger than the amount of available
RAM for caching.

http://wiki.apache.org/solr/SolrPerformanceProblems#RAM

With that many documents, even if you don't use RAM-hungry features like
sorting and facets, you'll need a significant heap size, which will
further reduce the amount of RAM on the system that the OS can use to
cache the index.

For good performance, Solr *relies* on the operating system caching a
significant portion of the index.

Thanks,
Shawn



Re: Odd interaction between {!tag..} and {!field}

2014-05-20 Thread Chris Hostetter

: when local params are "embedded" in a query being parsed by the 
: LuceneQParser, it applies them using the same scoping as other query 
: operators 
: 
: :   fq: "{!tag=name_name}{!field f=name}United States"


Think of that example in the context of this one -- the basics of 
when/what/why the various pieces are parsed are the same...

   fq:  "{!tag=name_name}(+{!field f=name}United text:(States))"


-Hoss
http://www.lucidworks.com/


Re: Odd interaction between {!tag..} and {!field}

2014-05-20 Thread Chris Hostetter

: The presence of the {!tag} entry changes the filter query generated by
: the {!field...} tag. Note below that in one case the filter query is a
: phrase query, and in the other it's parsed with one term against the
: specified field and the other against the default field.

I think you are misunderstanding the way the localparams logic works.

when localparams are at the beginning of the param, they apply to the
entire string value

when local params are "embedded" in a query being parsed by the 
LuceneQParser, it applies them using the same scoping as other query 
operators 

:   fq: "{!tag=name_name}{!field f=name}United States"

that says "parse this entire query string using the default parser,, using 
"tag=name_name" on the result.  then he LuceneQParser gets the string 
"{!field f=name}United States" and it parses "United" using the "field" 
Qparser, and "Stats" using itself.

:   fq: "{!field f=name}United States"

that says "parse this entire query string using the "field" parser.

I think what you want is...

fq: "{!field f=name tag=name_name}United States"

or more explicitly w/o shortcut...

fq: "{!tag=name_name type=field f=name}United States"


-Hoss
http://www.lucidworks.com/


Odd interaction between {!tag..} and {!field}

2014-05-20 Thread Erick Erickson
Not sure what to make of this...
The presence of the {!tag} entry changes the filter query generated by
the {!field...} tag. Note below that in one case the filter query is a
phrase query, and in the other it's parsed with one term against the
specified field and the other against the default field.

Using the example data, submitting this:

http://localhost:8983/solr/collection1/select?q=*:*&fq={!tag=name_name}{!field
f=name}United States&wt=json&indent=true&debug=query

generates this response:
{
  responseHeader:
  {
status: 0,
QTime: 10,
params:
{
  indent: "true",
  q: "*:*",
  debug: "query",
  wt: "json",
  fq: "{!tag=name_name}{!field f=name}United States"
}
  },
  response:
  {
numFound: 0,
start: 0,
docs: [ ]
  },
  debug:
  {
rawquerystring: "*:*",
querystring: "*:*",
parsedquery: "MatchAllDocsQuery(*:*)",
parsedquery_toString: "*:*",
QParser: "LuceneQParser",
filter_queries:
[
  "{!tag=name_name}{!field f=name}United States"
],
parsed_filter_queries:
[
  "name:united text:states"
]
  }
}


while this one:
http://localhost:8983/solr/collection1/select?q=*:*&fq={!field
f=name}United States&wt=json&indent=true&debug=query

gives:
{
  responseHeader:
  {
status: 0,
QTime: 3,
params:
{
  indent: "true",
  q: "*:*",
  debug: "query",
  wt: "json",
  fq: "{!field f=name}United States"
}
  },
  response:
  {
numFound: 0,
start: 0,
docs: [ ]
  },
  debug:
  {
rawquerystring: "*:*",
querystring: "*:*",
parsedquery: "MatchAllDocsQuery(*:*)",
parsedquery_toString: "*:*",
QParser: "LuceneQParser",
filter_queries:
[
  "{!field f=name}United States"
],
parsed_filter_queries:
[
  "PhraseQuery(name:"united states")"
]
  }
}

 Of course quoting "United States" works. Escaping the space does
NOT change the behavior when {!tag...} is present.

Is this worth a JIRA or am I just missing the obvious?

Erick


Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-20 Thread Diego Fernandez
Hey Ahmet, 

Yeah, I had missed Shawn's response; I'll have to give that a try as well.  As
for the version, we're using 4.4.  StandardTokenizer sets type for HANGUL, 
HIRAGANA, IDEOGRAPHIC, KATAKANA, and SOUTHEAST_ASIAN and you're right, we're 
using TypeTokenFilter to remove those.

Diego Fernandez - 爱国
Software Engineer
US GSS Supportability - Diagnostics


- Original Message -
> Hi Diego,
> 
> Did you miss Shawn's response? His ICUTokenizerFactory solution is better
> than mine.
> 
> By the way, what solr version are you using? Does StandardTokenizer set type
> attribute for CJK words?
> 
> To filter out given types, you do not need a custom filter. The Type Token
> filter serves exactly that purpose.
> https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-TypeTokenFilter
> 
> 
> 
> On Tuesday, May 20, 2014 5:50 PM, Diego Fernandez 
> wrote:
> Great, thanks for the information!  Right now we're using the
> StandardTokenizer types to filter out CJK characters with a custom filter.
>   I'll test using MappingCharFilters, although I'm a little concerned with
> possible adverse scenarios.
> 
> Diego Fernandez - 爱国
> Software Engineer
> US GSS Supportability - Diagnostics
> 
> 
> 
> - Original Message -
> > Hi Aiguofer,
> > 
> > You mean ClassicTokenizer? Because StandardTokenizer does not set token
> > types
> > (e-mail, url, etc).
> > 
> > 
> > I wouldn't go with the JFlex edit, mainly because of maintenance costs. It
> > will
> > be a burden to maintain a custom tokenizer.
> > 
> > MappingCharFilters could be used to manipulate tokenizer behavior.
> > 
> > Just an example, if you don't want your tokenizer to break on hyphens,
> > replace it with something that your tokenizer does not break. For example
> > under score.
> > 
> > "-" => "_"
> > 
> > 
> > 
> > Plus WDF can be customized too. Please see types attribute :
> > 
> > http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/wdftypes.txt
> > 
> >  
> > Ahmet
> > 
> > 
> > On Friday, May 16, 2014 6:24 PM, aiguofer  wrote:
> > Jack Krupansky-2 wrote
> > 
> > > Typically the white space tokenizer is the best choice when the word
> > > delimiter filter will be used.
> > > 
> > > -- Jack Krupansky
> > 
> > If we wanted to keep the StandardTokenizer (because we make use of the
> > token
> > types) but wanted to use the WDFF to get combinations of words that are
> > split with certain characters (mainly - and /, but possibly others as
> > well),
> > what is the suggested way of accomplishing this? Would we just have to
> > extend the JFlex file for the tokenizer and re-compile it?
> > 
> > 
> > 
> > 
> >
> 


Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-20 Thread Ahmet Arslan
Hi Diego,

Did you miss Shawn's response? His ICUTokenizerFactory solution is better than 
mine. 

By the way, what solr version are you using? Does StandardTokenizer set type 
attribute for CJK words?

To filter out given types, you do not need a custom filter. The Type Token
filter serves exactly that purpose.
https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-TypeTokenFilter
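
(For reference, a declaration along these lines in the analysis chain is the
usual pattern; the file name here is made up, and the file would list one
token type per line, e.g. <HIRAGANA>:

    <filter class="solr.TypeTokenFilterFactory" types="stoptypes.txt" useWhitelist="false"/>

With useWhitelist="false" the listed types are removed; with "true" only the
listed types are kept.)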



On Tuesday, May 20, 2014 5:50 PM, Diego Fernandez  wrote:
Great, thanks for the information!  Right now we're using the StandardTokenizer 
types to filter out CJK characters with a custom filter.  I'll test using 
MappingCharFilters, although I'm a little concerned with possible adverse 
scenarios.  

Diego Fernandez - 爱国
Software Engineer
US GSS Supportability - Diagnostics



- Original Message -
> Hi Aiguofer,
> 
> You mean ClassicTokenizer? Because StandardTokenizer does not set token types
> (e-mail, url, etc).
> 
> 
> I wouldn't go with the JFlex edit, mainly because of maintenance costs. It will
> be a burden to maintain a custom tokenizer.
> 
> MappingCharFilters could be used to manipulate tokenizer behavior.
> 
> Just an example, if you don't want your tokenizer to break on hyphens,
> replace it with something that your tokenizer does not break. For example
> under score.
> 
> "-" => "_"
> 
> 
> 
> Plus WDF can be customized too. Please see types attribute :
> 
> http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/wdftypes.txt
> 
>  
> Ahmet
> 
> 
> On Friday, May 16, 2014 6:24 PM, aiguofer  wrote:
> Jack Krupansky-2 wrote
> 
> > Typically the white space tokenizer is the best choice when the word
> > delimiter filter will be used.
> > 
> > -- Jack Krupansky
> 
> If we wanted to keep the StandardTokenizer (because we make use of the token
> types) but wanted to use the WDFF to get combinations of words that are
> split with certain characters (mainly - and /, but possibly others as well),
> what is the suggested way of accomplishing this? Would we just have to
> extend the JFlex file for the tokenizer and re-compile it?
> 
> 
> 
> 
>


Re: Issue paging when sorting on a Date field

2014-05-20 Thread Bryan Bende
This is using solr.TrieDateField; it is the field type "date" from the
example schema in Solr 4.6.1.


After further testing I was only able to reproduce this in a sharded &
replicated environment (numShards=3, replicationFactor=2) and I think I
have narrowed down the issue, and at this point it may be expected
behavior...

I took a query like q=create_date:[2014-05-19T00:00:00Z TO
2014-05-19T23:59:59Z]&sort=create_date DESC&start=0&rows=1 which should
get all the documents for yesterday sorted by create date, and then added
distrib=false and ran it against shard1_replica1 and shard1_replica2. Then
I diff'd the files and it showed 5 occurrences where two consecutive rows
in one replica were reversed in the other replica, and in all 5 cases the
flip-flopped rows had the exact same create_date value, which happened
to only go down to the minute.

As an example:

shard1_replica1:
...
docX, 2014-05-19T20:15:00Z
docY, 2014-05-19T20:15:00Z
...

shard1_replica2:
...
docY, 2014-05-19T20:15:00Z
docX, 2014-05-19T20:15:00Z
...

So I think when I was paging through the results, if the query for page N
was handled by replica1 and page N+1 handled by replica2, and the page
boundary happened to be where the reversed rows were, this would produce
the behavior I was seeing where the last row from the previous page was
also the first row from the next page.

I guess the obvious solution is to ensure the date field is always more
granular than minutes, or add another field to the sort order to
consistently break ties.


On Mon, May 19, 2014 at 4:19 PM, Chris Hostetter
wrote:

>
> : Using Solr 4.6.1 and in my schema I have a date field storing the time a
> : document was added to Solr.
>
> what *exactly* does your schema look like?  are you using "solr.DateField"
> or "solr.TrieDateField" ? what field options do you have specified?
>
> : I have a utility program which:
> : - queries for all of the documents in the previous day sorted by create
> date
> : - pages through the results keeping track of the unique document ids
> : - compare the total number of unique doc ids to the numFound to see if it
> : they match
>
> what *exactly* do your queries look like?  show us some examples please
> (URL & results).  Are you using distributed searching across multiple
> nodes, or a single node?  do you have concurrent updates going on during
> your test?
>
> : It is not consistent between tests, the number of occurrences changes and
> : the locations of the occurrences can change as well. The larger the
> result
> : set, and smaller the page size, the more frequent the occurrences are.
>
> if you bring up a test instance of Solr using your current configs, can
> you reproduce (even occasionally) with some synthetic data you can share
> with us?  If so please provide your full configs & sample data (ie: create
> a Jira & attach all the necessary files in a ZIP)
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Error initializing QueryElevationComponent

2014-05-20 Thread Geepalem
Hi,

I have changed "&" to "&amp;".
Now the core is getting initialized, but the document added in elevate.xml
is not coming up as the top result.




[elevate.xml snippet stripped by the mailing list archive]



Also, why is the query below not returning any results even though the
document is available in the index?

http://localhost:8080/solr/master/select?q=_uniqueid:"sitecore://master/{450555a7-2cf7-40ec-a4ad-a67926d23c4a}?lang=en&ver=1";


Please advise, as I am stuck on this.


Thanks,
G. Naresh Kumar







Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-20 Thread Diego Fernandez
Great, thanks for the information!  Right now we're using the StandardTokenizer 
types to filter out CJK characters with a custom filter.  I'll test using 
MappingCharFilters, although I'm a little concerned with possible adverse 
scenarios.  

Diego Fernandez - 爱国
Software Engineer
US GSS Supportability - Diagnostics


- Original Message -
> Hi Aiguofer,
> 
> You mean ClassicTokenizer? Because StandardTokenizer does not set token types
> (e-mail, url, etc).
> 
> 
> I wouldn't go with the JFlex edit, mainly because of maintenance costs. It will
> be a burden to maintain a custom tokenizer.
> 
> MappingCharFilters could be used to manipulate tokenizer behavior.
> 
> Just an example, if you don't want your tokenizer to break on hyphens,
> replace it with something that your tokenizer does not break. For example
> under score.
> 
> "-" => "_"
> 
> 
> 
> Plus WDF can be customized too. Please see types attribute :
> 
> http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/wdftypes.txt
> 
>  
> Ahmet
> 
> 
> On Friday, May 16, 2014 6:24 PM, aiguofer  wrote:
> Jack Krupansky-2 wrote
> 
> > Typically the white space tokenizer is the best choice when the word
> > delimiter filter will be used.
> > 
> > -- Jack Krupansky
> 
> If we wanted to keep the StandardTokenizer (because we make use of the token
> types) but wanted to use the WDFF to get combinations of words that are
> split with certain characters (mainly - and /, but possibly others as well),
> what is the suggested way of accomplishing this? Would we just have to
> extend the JFlex file for the tokenizer and re-compile it?
> 
> 
> 
> 
> 


Autoscaling Solr instances in AWS

2014-05-20 Thread Peter Keegan
We are running Solr 4.6.1 in AWS:
- 2 Solr instances (1 shard, 1 leader, 1 replica)
- 1 CloudSolrServer SolrJ client updating the index.
- 3 Zookeepers

The Solr instances are behind a load balancer and also in an auto scaling
group. The ScaleUpPolicy will add up to 9 additional instances (replicas),
1 per minute. Later, the 9 replicas are terminated with the ScaleDownPolicy.

Problem: during the ScaleUpPolicy, when the Solr Leader is under heavy
query load, the SolrJ indexing client issues a commit which hangs and never
returns. Note that the index schema contains 3 ExternalFileFields which slow
down the commit process. Here's the stack trace:

Thread 1959: (state = IN_NATIVE)
 - java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[],
int, int, int) @bci=0 (Compiled frame; information may be imprecise)
 - java.net.SocketInputStream.read(byte[], int, int, int) @bci=79, line=150
(Compiled frame)
 - java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=121
(Compiled frame)
 - org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer() @bci=71,
line=166 (Compiled frame)
 - org.apache.http.impl.io.SocketInputBuffer.fillBuffer() @bci=1, line=90
(Compiled frame)
 -
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(org.apache.http.util.CharArrayBuffer)
@bci=137, line=281 (Compiled frame)
 -
org.apache.http.impl.conn.LoggingSessionInputBuffer.readLine(org.apache.http.util.CharArrayBuffer)
@bci=5, line=115 (Compiled frame)
 -
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(org.apache.http.io.SessionInputBuffer)
@bci=16, line=92 (Compiled frame)
 -
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(org.apache.http.io.SessionInputBuffer)
@bci=2, line=62 (Compiled frame)
 - org.apache.http.impl.io.AbstractMessageParser.parse() @bci=38, line=254
(Compiled frame)
 -
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader()
@bci=8, line=289 (Compiled frame)
 -
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader()
@bci=1, line=252 (Compiled frame)
 -
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader()
@bci=6, line=191 (Compiled frame)
 -
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(org.apache.http.HttpRequest,
org.apache.http.HttpClientConnection, org.apache.http.protocol.HttpContext)
@bci=62, line=300 (Compiled frame)
 -
org.apache.http.protocol.HttpRequestExecutor.execute(org.apache.http.HttpRequest,
org.apache.http.HttpClientConnection, org.apache.http.protocol.HttpContext)
@bci=60, line=127 (Compiled frame)
 -
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(org.apache.http.impl.client.RoutedRequest,
org.apache.http.protocol.HttpContext) @bci=198, line=717 (Compiled frame)
 -
org.apache.http.impl.client.DefaultRequestDirector.execute(org.apache.http.HttpHost,
org.apache.http.HttpRequest, org.apache.http.protocol.HttpContext)
@bci=597, line=522 (Compiled frame)
 -
org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.HttpHost,
org.apache.http.HttpRequest, org.apache.http.protocol.HttpContext)
@bci=344, line=906 (Compiled frame)
 -
org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.client.methods.HttpUriRequest,
org.apache.http.protocol.HttpContext) @bci=21, line=805 (Compiled frame)
 -
org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.client.methods.HttpUriRequest)
@bci=6, line=784 (Compiled frame)
 -
org.apache.solr.client.solrj.impl.HttpSolrServer.request(org.apache.solr.client.solrj.SolrRequest,
org.apache.solr.client.solrj.ResponseParser) @bci=1175, line=395 (Compiled
frame)
 -
org.apache.solr.client.solrj.impl.HttpSolrServer.request(org.apache.solr.client.solrj.SolrRequest)
@bci=17, line=199 (Compiled frame)
 -
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(org.apache.solr.client.solrj.impl.LBHttpSolrServer$Req)
@bci=132, line=285 (Compiled frame)
 -
org.apache.solr.client.solrj.impl.CloudSolrServer.request(org.apache.solr.client.solrj.SolrRequest)
@bci=838, line=640 (Compiled frame)
 -
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(org.apache.solr.client.solrj.SolrServer)
@bci=17, line=117 (Compiled frame)
 - org.apache.solr.client.solrj.SolrServer.commit(boolean, boolean)
@bci=16, line=168 (Interpreted frame)
 - org.apache.solr.client.solrj.SolrServer.commit() @bci=3, line=146
(Interpreted frame)

 The Solr leader log shows many connection timeout exceptions from the
other Solr replicas during this period. Some of these timeouts may have
been caused by replicas disappearing due to the ScaleDownPolicy. From the
search client application's point of view, everything looked fine, but
indexing stopped until I restarted the SolrJ client.

 Does this look like a case where a timeout value needs to be increased
somewhere? If so, which one?
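
For what it's worth, SolrJ's HttpSolrServer exposes client-side timeouts. A
minimal sketch with illustrative values (this alone may not address any
server-side distributed-update timeouts):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class TimeoutExample {
      public static void main(String[] args) {
        HttpSolrServer server = new HttpSolrServer("http://host:8983/solr");
        server.setConnectionTimeout(5000); // ms to establish a connection
        server.setSoTimeout(120000);       // ms socket read timeout; slow commits
                                           // with ExternalFileFields need headroom
        server.shutdown();
      }
    }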

 Thanks,
 Peter


Vague Behavior while setting Solr Cloud

2014-05-20 Thread Tim Burner
Hi Everyone,

I am trying to setup Solr Cloud referring to the blog
http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html

if I complete the setup in one go, then it seems to be going fine.

when the setup is complete and I am trying to restart Solr by restarting
the Tomcat instance, it does not deploy and moreover the shards and replicas
are not up.

Urgent call, let me know if you know anything!

Thanks in Advance!


Re: trigger delete on nested documents

2014-05-20 Thread Thomas Scheffler

Am 20.05.2014 14:11, schrieb Jack Krupansky:

To be clear, you cannot update a single document of a nested document
in place - you must reindex the whole block, parent and all children.
This is because this feature relies on the underlying Lucene block
join feature that requires that the documents be contiguous, and
updating a single child document would make it discontiguous with the
rest of the block of documents.

Just update the block by resending the entire block of documents.

For a previous discussion of this limitation:
http://lucene.472066.n3.nabble.com/block-join-and-atomic-updates-td4117178.html


This is totally clear to me, and I want nested documents not to be
accessible outside of their root context.


There is, it seems, no way to delete the whole block by the id of the root
document. There is no way to update the root document that removes the
stale data from the index. Normal SOLR behavior is to automatically
delete old documents with the same ID. I expect this behavior for the other
documents in the block too.


Anyway, to make things clear, I filed a JIRA issue and tried to explain
it more carefully there:


https://issues.apache.org/jira/browse/SOLR-6096

regards

Thomas


Re: trigger delete on nested documents

2014-05-20 Thread Jack Krupansky
To be clear, you cannot update a single document of a nested document in 
place - you must reindex the whole block, parent and all children. This is 
because this feature relies on the underlying Lucene block join feature that 
requires that the documents be contiguous, and updating a single child 
document would make it discontiguous with the rest of the block of 
documents.


Just update the block by resending the entire block of documents.

For a previous discussion of this limitation:
http://lucene.472066.n3.nabble.com/block-join-and-atomic-updates-td4117178.html

-- Jack Krupansky

-Original Message- 
From: Thomas Scheffler

Sent: Tuesday, May 20, 2014 4:27 AM
To: solr-user@lucene.apache.org
Subject: Re: trigger delete on nested documents

Am 19.05.2014 19:25, schrieb Mikhail Khludnev:

Thomas,

The vanilla way to override a block is to send it with the same unique-key (I
guess it's "id" for your case; btw, don't you have a unique-key defined in the
schema?), but it must have at least one child. It seems like an analysis issue
to me: https://issues.apache.org/jira/browse/SOLR-5211

While the block is indexed, the special field _root_, equal to the unique
key of the root document, is added across the whole block (caveat: it's not
stored by default). At least you can issue

_root_:PK_VAL

to wipe the whole block.


Thank you for your insight. It sure helps a lot in
understanding. The '_root_' field was new to me in this rather poorly
documented feature of SOLR. It helps already if I perform single updates
and deletes from the index. BUT:

If I delete by a query this results in a mess:

1.) request all IDs returned by that query
2.) fire a giant delete query with "id:(id1 OR .. OR idn) _root_:(id1 OR
.. OR idn)"

Before every update of single documents I have to fire a delete request.

This turns into a mess, when updating in batch mode:
1.) remove chunk of 100 documents and nested documents (see above)
2.) index chunk of 100 documents

All information for that is available on SOLR side. Can I configure some
hook that is executed on SOLR-Server so that I do not have to change all
applications? This would at least save these extra network transfers.

After the big effort of migrating from plain Lucene to SOLR, I really require
proper nested document support. Elastic Search seems to support it
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html)
but I am afraid of another migration. Elastic Search even hides the
nested documents in queries, which seems nice, too.

Does anyone have information on how nested document support will evolve in
future releases of SOLR?

kind regards,

Thomas




19.05.2014 10:37 пользователь "Thomas Scheffler" <
thomas.scheff...@uni-jena.de> написал:


Hi,

I plan to use nested documents to group some of my fields


[XML tags stripped by the archive; the example was a parent document
(id art0001, title "My first article") with two nested child documents:
art0001-foo (Smith, John / author) and art0001-bar (Power, Max / reviewer)]


This way I can ask for any documents that are reviewed by Max Power. However,
However

to simplify update and deletes I want to ensure that nested documents are
deleted automatically on update and delete of the parent document.
Does anyone had to deal with this problem and found a solution? 




[ANNOUNCE] Apache Solr 4.8.1 released

2014-05-20 Thread Robert Muir
May 2014, Apache Solr™ 4.8.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.8.1

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 4.8.1 is available for immediate download at:

http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Solr 4.8.1 includes 10 bug fixes, as well as Lucene 4.8.1 and its bug fixes.

See the CHANGES.txt file included with the release for a full list of
changes and further details.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also goes for Maven access.


Re: Howto Search word which contains the character "

2014-05-20 Thread Jack Krupansky
It looks like it was escaped in the query, but the word delimiter filter 
will remove it and treat it as if it were white space.


The "types" attribute for WDF can point to a file containing the types for 
various characters, so you could map a quote to ALPHA.
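
For what it's worth, an untested sketch of such a types-file entry, following
the format of the wdftypes.txt example in the Solr sources (\u0022 is the
double-quote character):

    \u0022 => ALPHA

The file is then referenced via the "types" attribute on the
WordDelimiterFilterFactory declaration.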


The doc is sketchy, but there are some examples in my e-book that show how
to map @ and _ to ALPHA.


-- Jack Krupansky

-Original Message- 
From: Ahmet Arslan

Sent: Tuesday, May 20, 2014 4:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Howto Search word which contains the character "

Hi,

It is special query parser character, so it needs to be escaped.

http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Escaping%20Special%20Characters

Ahmet





On Tuesday, May 20, 2014 10:57 AM, heyyo  wrote:
In Hebrew, words can contain the character "
ex: דו"ח

I would like to know how to configure my schema.xml to be able to index and
search correctly those types of words.

If I search for this character " in the Solr query tool, I get this debug output:

/"debug": {
   "rawquerystring": "\"",
   "querystring": "\"",
   "parsedquery": "(+())/no_coord",
   "parsedquery_toString": "+()",
/

So if I understand correctly, Solr removes the " when the query is parsed.


I'm using this schema:

[schema.xml field type definition stripped by the mailing list archive]










Solr Cloud Shards and Replica not reviving after restarting

2014-05-20 Thread Tim Burner
Hi Everyone,

I have installed Solr Cloud 4.6.2 with external Zookeeper and Tomcat,
having 3 shards with 2 replicas each. I tried indexing some documents which
went smoothly.

After which I restarted my Tomcat, and now the shards are not coming up;
it keeps throwing a bunch of exceptions. The first exception was "no servers
hosting shard:"

All the replicas and the leader are down and not responding; it's even giving

RecoveryStrategy Error while trying to recover.
core=recollection_shard1_replica1:org.apache.solr.client.solrj.SolrServerException:
Server refused connection at: http://192.168.2.183:9090/solr

It would be great if you could help me solve this issue. Expert advice
needed.

Thanks in Advance!


Re: How to optimize single shard only?

2014-05-20 Thread Marcin Rzewucki
As I wrote before, the index is rewritten, so it grows during optimization
and is reduced later. I guess there was an OOM in your case.



On 20 May 2014 12:11, YouPeng Yang  wrote:

> Hi
>   My DIH job indeed hangs. I have only four shards, each with a master and a
> replica. Maybe the JVM memory size is very low: it was 3G, while the size of
> every one of my cores is almost 16GB.
>
>  I have also found that the size of the master increased during the
> optimization (you can check on the overview page of the core); the
> phenomenon is very weird. Is it because the collection-wide
> optimization will compute and copy all the docs of the whole collection?
>
>
>                       Version        Gen    Size
>  Master (Searching)   1400501330248  98396  29.83 GB
>  Master (Replicable)  1400501330888  98397  -
>
>
>   After checking the source code, unfortunately, it seems the optimize
> action is distributed over the whole collection; you can refer to
> SolrCmdDistributor.distribCommit.
>
>
> 2014-05-20 17:27 GMT+08:00 Marcin Rzewucki :
>
> > Well, it should not hang if all is configured fine :) How many shards and
> > how much memory do you have? Note that optimize rewrites the index, so you
> > might need additional disk space for this process. Optimizing works fine;
> > however, I'd like to be able to do it on a single shard as well.
> >
> >
> > On 20 May 2014 11:19, YouPeng Yang  wrote:
> >
> > > Hi Marcin
> > >
> > >   Thanks to your mail, now I know why my cloud hangs when I just click
> > > the optimize button on the overview page of the shard.
> > >
> > >
> > > 2014-05-20 15:25 GMT+08:00 Ahmet Arslan :
> > >
> > > > Hi Marcin,
> > > >
> > > > just a guess, pass distrib=false ?
> > > >
> > > >
> > > >
> > > > Ahmet
> > > >
> > > >
> > > > On Tuesday, May 20, 2014 10:23 AM, Marcin Rzewucki <
> > mrzewu...@gmail.com>
> > > > wrote:
> > > > Hi,
> > > >
> > > > Do you know how to optimize index on a single shard only ? I was
> trying
> > > to
> > > > use "optimize=true&waitFlush=true&shard.keys=myshard" but it does not
> > > work
> > > > - it optimizes all shards instead of just one.
> > > >
> > > > Kind regards.
> > > >
> > > >
> > >
> >
>


Re: How to optimize single shard only?

2014-05-20 Thread YouPeng Yang
Hi
  My DIH job indeed hangs. I have only four shards, each with a master and a
replica. Maybe the JVM memory size is very low: it was 3G, while the size of
every one of my cores is almost 16GB.

 I have also found that the size of the master increased during the
optimization (you can check on the overview page of the core); the
phenomenon is very weird. Is it because the collection-wide
optimization will compute and copy all the docs of the whole collection?


                     Version        Gen    Size
Master (Searching)   1400501330248  98396  29.83 GB
Master (Replicable)  1400501330888  98397  -


  After checking the source code, unfortunately, it seems the optimize
action is distributed over the whole collection; you can refer to
SolrCmdDistributor.distribCommit.


2014-05-20 17:27 GMT+08:00 Marcin Rzewucki :

> Well, it should not hang if all is configured fine :) How many shards and
> how much memory do you have? Note that optimize rewrites the index, so you
> might need additional disk space for this process. Optimizing works fine;
> however, I'd like to be able to do it on a single shard as well.
>
>
> On 20 May 2014 11:19, YouPeng Yang  wrote:
>
> > Hi Marcin
> >
>   Thanks to your mail, now I know why my cloud hangs when I just click the
> > optimize button on the overview page of the shard.
> >
> >
> > 2014-05-20 15:25 GMT+08:00 Ahmet Arslan :
> >
> > > Hi Marcin,
> > >
> > > just a guess, pass distrib=false ?
> > >
> > >
> > >
> > > Ahmet
> > >
> > >
> > > On Tuesday, May 20, 2014 10:23 AM, Marcin Rzewucki <
> mrzewu...@gmail.com>
> > > wrote:
> > > Hi,
> > >
> > > Do you know how to optimize index on a single shard only ? I was trying
> > to
> > > use "optimize=true&waitFlush=true&shard.keys=myshard" but it does not
> > work
> > > - it optimizes all shards instead of just one.
> > >
> > > Kind regards.
> > >
> > >
> >
>


Re: How to optimize single shard only?

2014-05-20 Thread Marcin Rzewucki
Well, it should not hang if all is configured fine :) How many shards and
how much memory do you have? Note that optimize rewrites the index, so you
might need additional disk space for this process. Optimizing works fine;
however, I'd like to be able to do it on a single shard as well.


On 20 May 2014 11:19, YouPeng Yang  wrote:

> Hi Marcin
>
>   Thanks to your mail, now I know why my cloud hangs when I just click the
> optimize button on the overview page of the shard.
>
>
> 2014-05-20 15:25 GMT+08:00 Ahmet Arslan :
>
> > Hi Marcin,
> >
> > just a guess, pass distrib=false ?
> >
> >
> >
> > Ahmet
> >
> >
> > On Tuesday, May 20, 2014 10:23 AM, Marcin Rzewucki 
> > wrote:
> > Hi,
> >
> > Do you know how to optimize index on a single shard only ? I was trying
> to
> > use "optimize=true&waitFlush=true&shard.keys=myshard" but it does not
> work
> > - it optimizes all shards instead of just one.
> >
> > Kind regards.
> >
> >
>


Re: How to optimize single shard only?

2014-05-20 Thread YouPeng Yang
Hi
 Maybe you can try _router_=myshard? I will check the source code and let
you know later.


2014-05-20 17:19 GMT+08:00 YouPeng Yang :

> Hi Marcin
>
>   Thanks to your mail, now I know why my cloud hangs when I just click the
> optimize button on the overview page of the shard.
>
>
> 2014-05-20 15:25 GMT+08:00 Ahmet Arslan :
>
> Hi Marcin,
>>
>> just a guess, pass distrib=false ?
>>
>>
>>
>> Ahmet
>>
>>
>> On Tuesday, May 20, 2014 10:23 AM, Marcin Rzewucki 
>> wrote:
>> Hi,
>>
>> Do you know how to optimize index on a single shard only ? I was trying to
>> use "optimize=true&waitFlush=true&shard.keys=myshard" but it does not work
>> - it optimizes all shards instead of just one.
>>
>> Kind regards.
>>
>>
>


Re: How to optimize single shard only?

2014-05-20 Thread YouPeng Yang
Hi Marcin

  Thanks to your mail, now I know why my cloud hangs when I just click the
optimize button on the overview page of the shard.


2014-05-20 15:25 GMT+08:00 Ahmet Arslan :

> Hi Marcin,
>
> just a guess, pass distrib=false ?
>
>
>
> Ahmet
>
>
> On Tuesday, May 20, 2014 10:23 AM, Marcin Rzewucki 
> wrote:
> Hi,
>
> Do you know how to optimize index on a single shard only ? I was trying to
> use "optimize=true&waitFlush=true&shard.keys=myshard" but it does not work
> - it optimizes all shards instead of just one.
>
> Kind regards.
>
>


Re: Howto Search word which contains the character "

2014-05-20 Thread Ahmet Arslan
Hi,

It is special query parser character, so it needs to be escaped. 

http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Escaping%20Special%20Characters

Ahmet





On Tuesday, May 20, 2014 10:57 AM, heyyo  wrote:
In Hebrew, words can contain the character "
ex: דו"ח

I would like to know how to configure my schema.xml to be able to index and
search correctly those types of words.

If I search for this character *"* in the Solr query tool, I get this debug
output:

"debug": {
    "rawquerystring": "\"",
    "querystring": "\"",
    "parsedquery": "(+())/no_coord",
    "parsedquery_toString": "+()",

So if I understand correctly, Solr removes the " when the query is parsed.


I'm using this schema:

(the schema.xml field definitions were stripped by the mailing list archive)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Howto-Search-word-which-contains-the-character-tp4137083.html
Sent from the Solr - User mailing list archive at Nabble.com.


the whole web instance hangs when optimize one core.

2014-05-20 Thread YouPeng Yang
Hi.

   I am using Solr 4.6. One of my cores contains 50 million docs, and when I
just click the optimize button on the overview page of that core, the whole
web instance hangs; one symptom is that the DIH on another core hangs as well.

  Is this a known problem, or is something wrong with my environment?


Regards


Re: trigger delete on nested documents

2014-05-20 Thread Thomas Scheffler

On 19.05.2014 19:25, Mikhail Khludnev wrote:

Thomas,

The vanilla way to override a block is to send it with the same unique key (I
guess it's "id" in your case; by the way, don't you have a unique key defined
in the schema?), but it must have at least one child. It seems like an
analysis issue to me: https://issues.apache.org/jira/browse/SOLR-5211

While the block is indexed, the special field _root_, set to the parent's
unique key, is added across the whole block (caveat: it's not stored by
default). At least you can issue

_root_:PK_VAL

to wipe the whole block.
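
For example, with SolrJ that delete could be issued like this (a minimal
sketch; the core URL and the parent key art0001 are assumptions):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class WipeBlock {
      public static void main(String[] args) throws Exception {
        HttpSolrServer solr =
            new HttpSolrServer("http://localhost:8983/solr/collection1");
        // _root_ carries the parent's unique key for every document of the
        // block, so this removes the parent and all nested children at once
        solr.deleteByQuery("_root_:art0001");
        solr.commit();
      }
    }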


Thank you for your insightful reply. It sure helps a lot in understanding.
The '_root_' field was new to me on this rather poorly documented feature
of Solr. It already helps when I perform single updates and deletes on the
index. BUT:


If I delete by a query, this results in a mess:

1.) request all IDs returned by that query
2.) fire a giant delete query with "id:(id1 OR .. OR idn) _root_:(id1 OR 
.. OR idn)"


Before every update of a single document I have to fire a delete request.

This turns into a mess when updating in batch mode:
1.) remove a chunk of 100 documents and their nested documents (see above)
2.) index the chunk of 100 documents

All the information for this is available on the Solr side. Can I configure
some hook that is executed on the Solr server so that I do not have to change
all applications? This would at least save these extra network transfers.
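
Such a hook could be written as a custom update request processor. A rough
sketch against the Solr 4.x API (untested; the class name is made up, and
real code should escape the id value before putting it into the query):

    import java.io.IOException;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.DeleteUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class DeleteBlockProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(final SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
            // wipe any previously indexed block sharing this parent id
            Object id = cmd.getSolrInputDocument().getFieldValue("id");
            if (id != null) {
              DeleteUpdateCommand del = new DeleteUpdateCommand(req);
              del.query = "_root_:" + id;
              super.processDelete(del);
            }
            super.processAdd(cmd); // then index the new block as usual
          }
        };
      }
    }

The factory would then be registered in an updateRequestProcessorChain in
solrconfig.xml, ahead of RunUpdateProcessorFactory.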


After the big work of migrating from plain Lucene to Solr, I really require
proper nested document support. Elasticsearch seems to support it
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html)
but I am afraid of another migration. Elasticsearch even hides the
nested documents in queries, which seems nice, too.


Does anyone have information on how nested document support will evolve in
future releases of Solr?


kind regards,

Thomas




On 19.05.2014 10:37, "Thomas Scheffler" <thomas.scheff...@uni-jena.de> wrote:


Hi,

I plan to use nested documents to group some of my fields


<doc>
  <!-- markup reconstructed; the list archive stripped the original XML
       tags, and all field names except "id" are guesses -->
  <field name="id">art0001</field>
  <field name="title">My first article</field>
  <doc>
    <field name="id">art0001-foo</field>
    <field name="name">Smith, John</field>
    <field name="role">author</field>
  </doc>
  <doc>
    <field name="id">art0001-bar</field>
    <field name="name">Power, Max</field>
    <field name="role">reviewer</field>
  </doc>
</doc>


This way I can ask for any documents that are reviewed by Max Power. However,
to simplify updates and deletes, I want to ensure that nested documents are
deleted automatically on update and delete of the parent document.
Has anyone had to deal with this problem and found a solution?
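
For the query side, Solr 4.5+ ships the {!parent} block-join query parser. A
sketch of the "reviewed by Max Power" lookup (field names as in the example
above, plus a hypothetical type:parent marker field, since the which= filter
must match all parent documents and nothing else):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class FindByReviewer {
      public static void main(String[] args) throws Exception {
        HttpSolrServer solr =
            new HttpSolrServer("http://localhost:8983/solr/collection1");
        // returns parent docs whose children match the embedded child query
        SolrQuery q = new SolrQuery(
            "{!parent which=\"type:parent\"}name:\"Power, Max\" AND role:reviewer");
        System.out.println(solr.query(q).getResults());
      }
    }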


Re: solr-user Digest of: get.100322

2014-05-20 Thread Jeongseok Son
Thank you for your reply! I also found docValues after sending my
email, and your suggestion seems like the best solution for me.

Now I'm configuring schema.xml to use docValues and have a question
about docValuesFormat.

According to this thread
(http://lucene.472066.n3.nabble.com/Trade-offs-in-choosing-DocValuesFormat-td4114758.html),
Solr 4.6 only holds some hash structures in memory with the default
docValuesFormat configuration.

Though it uses only a small amount of memory, I'm worried about memory
usage because I have to store so many documents (32GB RAM / 5B docs in
total, summed over all cores).

Which docValuesFormat is more appropriate in my case? (Default or
Disk?) Can I change it later without re-indexing?
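
For what it's worth, the per-field choice is made in schema.xml. A sketch for
Solr 4.6 (field and type names are placeholders; docValuesFormat="Disk"
additionally requires <codecFactory class="solr.SchemaCodecFactory"/> in
solrconfig.xml):

    <!-- default (mostly in-memory) docValues format -->
    <fieldType name="string_dv" class="solr.StrField"/>
    <!-- on-disk variant -->
    <fieldType name="string_dv_disk" class="solr.StrField" docValuesFormat="Disk"/>

    <field name="category" type="string_dv" indexed="true" stored="false"
           docValues="true"/>
    <field name="category_disk" type="string_dv_disk" indexed="true"
           stored="false" docValues="true"/>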

On Sat, May 17, 2014 at 9:45 PM,   wrote:
>
> solr-user Digest of: get.100322
>
> Topics (messages 100322 through 100322)
>
> Re: Sorting problem in Solr due to Lucene Field Cache
> 100322 by: Joel Bernstein
>
> -- Forwarded message --
> From: Joel Bernstein 
> To: solr-user@lucene.apache.org
> Cc:
> Date: Fri, 16 May 2014 17:49:51 -0400
> Subject: Re: Sorting problem in Solr due to Lucene Field Cache
> Take a look at Solr's use of DocValues:
> https://cwiki.apache.org/confluence/display/solr/DocValues.
>
> There are docValues options that use less memory then the FieldCache.
>
> Joel Bernstein
> Sear

Howto Search word which contains the character "

2014-05-20 Thread heyyo
In Hebrew, words can contain the character *"*,
e.g. דו"ח.

I would like to know how to configure my schema.xml to be able to index and
search those kinds of words correctly.

If I search for this character *"* in the Solr query tool, I get this debug
output:

"debug": {
    "rawquerystring": "\"",
    "querystring": "\"",
    "parsedquery": "(+())/no_coord",
    "parsedquery_toString": "+()",

So if I understand correctly, Solr removes the " when the query is parsed.


I'm using this schema:

(the schema.xml field definitions were stripped by the mailing list archive)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Howto-Search-word-which-contains-the-character-tp4137083.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to optimize single shard only?

2014-05-20 Thread Ahmet Arslan
Hi Marcin,

just a guess: pass distrib=false?
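
For example, sent straight to one shard's core with SolrJ (a sketch; whether
distrib=false really confines the optimize is exactly the guess above, and
the core name/URL are assumptions):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.UpdateRequest;

    public class OptimizeOneShard {
      public static void main(String[] args) throws Exception {
        // talk to the shard core directly, not to the collection
        HttpSolrServer core = new HttpSolrServer(
            "http://localhost:8983/solr/collection1_shard1_replica1");
        UpdateRequest req = new UpdateRequest();
        req.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, true, true);
        req.setParam("distrib", "false"); // keep the request local to this core
        req.process(core);
      }
    }

The same thing over plain HTTP would be
http://host:port/solr/collection1_shard1_replica1/update?optimize=true&distrib=false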



Ahmet


On Tuesday, May 20, 2014 10:23 AM, Marcin Rzewucki  wrote:
Hi,

Do you know how to optimize the index on a single shard only? I was trying to
use "optimize=true&waitFlush=true&shard.keys=myshard" but it does not work
- it optimizes all shards instead of just one.

Kind regards.



How to optimize single shard only?

2014-05-20 Thread Marcin Rzewucki
Hi,

Do you know how to optimize the index on a single shard only? I was trying to
use "optimize=true&waitFlush=true&shard.keys=myshard" but it does not work
- it optimizes all shards instead of just one.

Kind regards.