A tool to quickly browse Solr documents?

2017-01-23 Thread Fengtan
Hi All,

I am looking for a tool to quickly browse/investigate documents indexed in
a Solr core.

The default web admin interface already offers this, but you need to know
the Solr query syntax if you want to list/filter/sort documents.

I have started to build my own tool (https://github.com/fengtan/sophie) but
I don't want to reinvent the wheel -- does anyone know if something similar
already exists?

Thanks


Re: Latest advice on G1 collector?

2017-01-23 Thread Walter Underwood
I’m running a two-hour benchmark using production log replay right now.

The CMS hosts have between 3% and 7% GC overhead (the portion of their CPU time 
spent in GC).

The G1 host has around 1% GC overhead.

I accidentally started one host with the throughput collector. That has 2%
overhead and a clearly faster response time. The other hosts are averaging
from 460 to 660 ms; the host with the throughput collector is at 390 ms. The
95th percentile is also faster, and it is using less CPU.
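
For reference, these are the standard Java 8 flags that select each of the
collectors compared above (the full GC_TUNE settings used on these hosts are
not shown here; this is just the selector flag for each collector):

    -XX:+UseConcMarkSweepGC   # CMS, the Solr 6.x default
    -XX:+UseG1GC              # G1
    -XX:+UseParallelGC        # throughput (parallel) collector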

Isn’t GC tuning fun?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jan 23, 2017, at 2:49 PM, Pushkar Raste  wrote:
> 
> Hi Walter,
> We have been using G1GC for more than a year now and are very happy with
> it.
> 
> The only flag we have enabled is 'ParallelRefProcEnabled'
> 
> On Jan 23, 2017 3:00 PM, "Walter Underwood"  wrote:
> 
>> We have a workload with very long queries, and that can drive the CMS
>> collector into using about 20% of the CPU time. So I’m ready to try G1 on a
>> couple of replicas and see what happens. I’ve already upgraded to Java 8
>> update 121.
>> 
>> I’ve read these pages:
>> 
>> https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_First.29_Collector
>> https://gist.github.com/rockagen/e6d28244e1d540c05144370d6a64ba66
>> 
>> Any updates on recommended settings?
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>> 



Re: Latest advice on G1 collector?

2017-01-23 Thread Pushkar Raste
Hi Walter,
We have been using G1GC for more than a year now and are very happy with
it.

The only flag we have enabled is 'ParallelRefProcEnabled'
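
For anyone who wants to try the same setup, a minimal sketch of what that
looks like in solr.in.sh (the heap size is a placeholder, not our actual
value):

    # Select G1 and enable parallel processing of reference objects
    SOLR_HEAP="8g"   # placeholder -- size for your own workload
    GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled"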

On Jan 23, 2017 3:00 PM, "Walter Underwood"  wrote:

> We have a workload with very long queries, and that can drive the CMS
> collector into using about 20% of the CPU time. So I’m ready to try G1 on a
> couple of replicas and see what happens. I’ve already upgraded to Java 8
> update 121.
>
> I’ve read these pages:
>
> https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_First.29_Collector
> https://gist.github.com/rockagen/e6d28244e1d540c05144370d6a64ba66
>
> Any updates on recommended settings?
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>


Re: Information on classifier based key word suggestion

2017-01-23 Thread shamik
Anyone?





Latest advice on G1 collector?

2017-01-23 Thread Walter Underwood
We have a workload with very long queries, and that can drive the CMS collector 
into using about 20% of the CPU time. So I’m ready to try G1 on a couple of 
replicas and see what happens. I’ve already upgraded to Java 8 update 121.

I’ve read these pages:

https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_First.29_Collector 

https://gist.github.com/rockagen/e6d28244e1d540c05144370d6a64ba66 


Any updates on recommended settings?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: NPE when using timeAllowed in the /export handler

2017-01-23 Thread Joel Bernstein
I'd have to put some thought into this. The problem with timeAllowed is
that it won't return all the results, so if you're using timeAllowed while
performing a join or aggregation it will just give incorrect answers. I'm
not sure we want that.

Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, Jan 21, 2017 at 9:51 PM, radha krishnan 
wrote:

> Can you give some estimate of when they will be compatible? Without this,
> we cannot use timeAllowed with the map-reduce mode of the /sql handler, right?
>
> Thanks,
> Radhakrishnan D
>
>
> On Sat, Jan 21, 2017 at 6:03 PM, Joel Bernstein 
> wrote:
>
> > I'm pretty sure that time allowed and the /export handler are not
> currently
> > compatible.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Fri, Jan 20, 2017 at 8:57 PM, radha krishnan <
> > dradhakrishna...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I am trying to query a core with 60 million docs, specifying timeAllowed
> > > as 100 ms just to test the timeAllowed feature.
> > >
> > > This is the query:
> > >
> > > http://10.20.132.162:8983/solr/large_core/export?indent=on&q=*:*&distrib=false&fl=logtype&timeAllowed=100&sort=logtype+asc&wt=json&version=2.2
> > >
> > > When I ran it in the browser, I got the NPE below.
> > >
> > > The /export query has 17 million hits, but the NPE was thrown after
> > > /export was called.
> > >
> > > Can you tell if anything is wrong in the query, or is there a known bug
> > > causing the NPE?
> > >
> > > HTTP ERROR 500
> > >
> > > Problem accessing /solr/logs_core_new/export. Reason:
> > >
> > > {trace=java.lang.NullPointerException
> > >     at org.apache.lucene.util.BitSetIterator.<init>(BitSetIterator.java:61)
> > >     at org.apache.solr.response.SortingResponseWriter.write(SortingResponseWriter.java:176)
> > >     at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
> > >     at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:728)
> > >     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469)
> > >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:303)
> > >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
> > >     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
> > >     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
> > >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> > >     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> > >     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> > >     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
> > >     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
> > >     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> > >     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
> > >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> > >     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> > >     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> > >     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> > >     at org.eclipse.jetty.server.Server.handle(Server.java:518)
> > >     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
> > >     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
> > >     at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
> > >     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> > >     at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> > >     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
> > >     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
> > >     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
> > >     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
> > >     at java.lang.Thread.run(Thread.java:745)
> > > ,code=500}
> > >
> > > RELATED SERVER LOGS
> > >
> > > 2017-01-21 01:55:12.257 WARN  (qtp1397616

Re: OuterHashJoin doesn't return values

2017-01-23 Thread Joel Bernstein
When you work with relational algebra operations you'll need to specify the
/export handler in the search expressions so that all of the tuples are
operated on by the join.

search(ParentDocuments, q=DocId:1042, fl="Id,DocId,SubDocId", sort="Id asc",
qt="/export")


Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Jan 22, 2017 at 10:17 PM, Sadheera Vithanage 
wrote:

> Hello,
>
> When I issue an outerHashJoin (stream expression) as below, it doesn't return
> any results from the right stream.
>
> outerHashJoin(
> search(ChildDocuments, q=*:* , fl="ContentId", sort="ContentId asc"),
> hashed=
> search(ParentDocuments, q=*:*, fl="Id,DocId,SubDocId", sort="Id asc"),
> on="ContentId=Id"
> )
>
> However, when I filter the right stream as below, it gives me the results
> from both streams.
>
>
> outerHashJoin(
> search(ChildDocuments, q=*:* , fl="ContentId", sort="ContentId asc"),
> hashed=
> search(ParentDocuments, q=DocId:1042, fl="Id,DocId,SubDocId", sort="Id
> asc"),
> on="ContentId=Id"
> )
>
>
> What am I doing wrong here?
>
> Thank you.
>
> --
> Regards
>
> Sadheera Vithanage
>


Multivalued Fields queries for Occurrences.

2017-01-23 Thread slee
Hi Everyone,

Assume we have the following documents stored, where "animals" is a
multivalued field:

  id: someGuid1    animals: [cats, dogs, wolf]
  id: someGuid2    animals: [cats, cats, wolf]
  id: someGuid3    animals: [cats, kangaroo, wolf]

How do I query for documents where the field "animals" contains "cats" with
occurrences > 1? This query should return the doc with "id":someGuid2.

Is this possible?





Suggest does not work on SolrCloud

2017-01-23 Thread Noriyuki TAKEI
Hi all,

We are running a SolrCloud cluster (version 6.3) with 3 shards and 2
replicas, coordinated by ZooKeeper (version 3.4.9).

We have a problem: suggestions from the SpellCheck component do not work.
I confirmed they work well on a single node.

When not using SolrCloud, I can get the expected result by sending the
query below:

http://hostname:8983/solr/collection/suggest_ja?spellcheck.q=a&wt=json&indent=true&spellcheck.collate=true

But when using SolrCloud, I get no suggestions.

My solrconfig.xml is as below:

  <searchComponent name="suggest_ja" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">suggest_ja</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingLookupFactory</str>
      <str name="storeDir">suggest_ja</str>
      <str name="buildOnCommit">true</str>
      <str name="buildOnStartup">true</str>
      <str name="comparatorClass">freq</str>
      <str name="field">suggest</str>
      <str name="suggestAnalyzerFieldType">text_ja_romaji</str>
      <str name="exactMatchFirst">true</str>
    </lst>
    <str name="queryAnalyzerFieldType">text_ja_romaji</str>
  </searchComponent>

  <requestHandler name="/suggest_ja" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest_ja</str>
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>
      <str name="spellcheck.build">false</str>
      <str name="df">suggest</str>
    </lst>
    <arr name="components">
      <str>suggest_ja</str>
      <str>terms</str>
    </arr>
  </requestHandler>


How do I solve this?


Re: DIH does not work. Child entity cannot refer to parent's id.

2017-01-23 Thread Keiichi MORITA
Resolved. My problem occurred because of case sensitivity.
I read the source code of Solr 6.3 and found the code that reads database
metadata, which is how I noticed that Oracle Database returns *UPPERCASE*
column names in its metadata.

The correct setting is to write the where clause of the query called by the
child entity as follows.

## data-config.xml

  <entity name="books" query="...">
    <entity ...
-     query="... where book_id = '${books.book_id}'">
+     query="... where book_id = '${books.BOOK_ID}'">
    </entity>
  </entity>
The only change is '${books.book_id}' to '${books.BOOK_ID}'.
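
For anyone hitting the same thing, here is a minimal sketch of the whole
parent/child setup (the child entity, table, and column names are
illustrative, not from my actual config):

<document>
  <entity name="books" query="SELECT BOOK_ID, TITLE FROM BOOKS">
    <!-- Oracle reports column names in UPPERCASE, so the parent id must
         be referenced as ${books.BOOK_ID}, not ${books.book_id} -->
    <entity name="chapters"
            query="SELECT CHAPTER_ID, NAME FROM CHAPTERS
                   WHERE BOOK_ID = '${books.BOOK_ID}'"/>
  </entity>
</document>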

In my first Solr challenge, it took a while to solve because I overlooked a
small but important detail. I hope this will be helpful for someone else.


Kind regards,
Keiichi






[ANNOUNCE] Apache Solr 6.4.0 released

2017-01-23 Thread jim ferenczi
23 January 2017 - Apache Solr™ 6.4.0 Available
The Lucene PMC is pleased to announce the release of Apache Solr 6.4.0.

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search and analytics, rich document
parsing, geospatial search, extensive REST APIs as well as parallel SQL.
Solr is enterprise grade, secure and highly scalable, providing fault
tolerant distributed search and indexing, and powers the search and
navigation features of many of the world's largest internet sites.

Solr 6.4.0 is available for immediate download at:
http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Highlights of this Solr release include:

Streaming:
  * Addition of a HavingStream to Streaming API and Streaming Expressions
(sketched below this list)
  * Addition of a priority Streaming Expression
  * Streaming expressions now support collection aliases
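
As an illustration of the new HavingStream, a sketch of a having expression
filtering rollup buckets (the collection and field names are made up, and
the exact set of boolean operators has varied across 6.x releases):

  having(
    rollup(
      search(mycollection, q="*:*", fl="dept_s,amount_i",
             sort="dept_s asc", qt="/export"),
      over="dept_s",
      sum(amount_i)
    ),
    gt(sum(amount_i), 100)
  )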

Machine Learning:
  * Configurable Learning-To-Rank (LTR) support: upload feature
definitions, extract feature values, upload your own machine learnt models
and use them to rerank search results.
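
As a hedged example of how this is used once features and a model have been
uploaded (model name, collection, and rerank depth are placeholders):

  # upload a model definition to the model store
  curl -XPUT 'http://localhost:8983/solr/mycollection/schema/model-store' \
       --data-binary @my-model.json -H 'Content-type:application/json'

  # rerank the top 100 results with the uploaded model
  http://localhost:8983/solr/mycollection/select?q=test&rq={!ltr model=myModel reRankDocs=100}&fl=id,score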

Faceting:
  * Added "param" query type to facet domain filter specification to obtain
filters via query parameters
  * Any facet command can be filtered using a new parameter filter.
Example: { type:terms, field:category, filter:"user:yonik" }
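
A sketch of the new "param" filter type, which lets the domain filter come
from a regular request parameter (field and parameter names are made up):

  json.facet={ categories: { type: terms, field: category,
                             domain: { filter: [{ param: catfilter }] } } }
  &catfilter=inStock:true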

Scripts / Command line:
  * A new command-line tool to manage the snapshots functionality
  * bin/solr and bin/solr.cmd now use mkroot command

SolrCloud / SolrJ
  * LukeResponse now supports dynamic fields
  * Solrj client now supports hierarchical clusters and other topics marker
  * Collection backup/restore are extensible.

Security:
  * Support Secure Impersonation / Proxy User for Solr authentication
  * Key Store type can be specified in solr.in.sh file for SSL
  * New generic authentication plugins: 'HadoopAuthPlugin' and
'ConfigurableInternodeAuthHadoopPlugin' that delegate all functionality to
Hadoop authentication framework

Query / QueryParser / Highlighting:
  * A new highlighter: The Unified Highlighter. Try it via
hl.method=unified; many popular highlighting parameters / features are
supported. It's the highest performing highlighter, especially for large
documents. Highlighting phrase queries and exotic queries are supported
equally as well as the Original Highlighter (aka the default/standard one).
Please use this new highlighter and report issues since it will likely
become the default one day (a sample request follows this list).
  * Leading wildcards in the complexphrase query parser are now accepted and
optimized with the ReversedWildcardFilterFactory when it's provided
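
A minimal request trying the unified highlighter mentioned above (collection
and field names are placeholders):

  http://localhost:8983/solr/mycollection/select?q=text:solr&hl=true&hl.method=unified&hl.fl=text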

Metrics:
  * Use metrics-jvm library to instrument jvm internals such as GC, memory
usage and others.
  * A lot of metrics have been added to the collection: index merges, index
store I/Os, query, update, core admin, core load thread pools, shard
replication, tlog replay and replicas
  * A new /admin/metrics API to return all metrics collected by Solr.
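
For example, the new endpoint can be queried directly; the group parameter
shown here (one of several supported filters) narrows the output to JVM
metrics:

  curl 'http://localhost:8983/solr/admin/metrics?wt=json&group=jvm'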

Misc changes:
  * The new config parameter 'maxRamMB' can now limit the memory consumed by
the FastLRUCache
  * A new document processor 'SkipExistingDocumentsProcessor' that skips
duplicate inserts and ignores updates to missing docs
  * FieldCache information fetched via the mbeans handler or seen via the
UI now displays the total size used.
  * A new config flag 'enable' allows enabling/disabling any cache

Please note, this release cannot be built from source with Java 8 update
121; use an earlier version instead! This is caused by a bug introduced
into the Javadocs tool shipped with that update. The workaround came too
late for this Lucene release. Of course, you can still use the binary
artifacts.

See the Solr CHANGES.txt files included with the release for a full list of
details.

Thanks,
Jim Ferenczi