Re: Update to solr 5 - custom phrase query implementation issue

2016-02-03 Thread Gerald Reinhart

On 02/02/2016 03:20 PM, Erik Hatcher wrote:

On Feb 2, 2016, at 8:57 AM, Elodie Sannier  wrote:

Hello,

We are using solr 4.10.4 and we want to update to 5.4.1.

With solr 4.10.4:
- we extend PhraseQuery with a custom class in order to remove some
terms from phrase queries with phrase slop (update of add(Term term, int
position) method)
- in order to use our implementation, we extend ExtendedSolrQueryParser
with a custom class and we override the method newPhraseQuery but with
solr 5 this method does not exist anymore

How can we do this with solr 5.4.1 ?


You’ll want to override this method, it looks like:

protected Query getFieldQuery(String field, String queryText, int slop)



Hi Erik,

To change the behavior of the PhraseQuery, either:
   - we change it after its natural cycle. The PhraseQuery is supposed
to be immutable and setSlop(int s) is deprecated... we don't really
want to do this.
   - we override the code that actually builds it:
org.apache.solr.search.ExtendedDismaxQParser.getFieldQuery(String field,
String val, int slop) PROTECTED
  which uses getAliasedQuery() PROTECTED
which uses getQuery() PRIVATE, which uses new PhraseQuery.Builder()
to create the query...

so it is not easy to override the behavior: we would need to
override/duplicate the getAliasedQuery() and getQuery() methods. We don't
really want to do this either.

So we don't really know where to go.

Thanks,


Gerald (I'm working with Elodie on the subject)



Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 158 Ter Rue du Temple 75003 Paris
425 093 069 RCS Paris

Ce message et les pièces jointes sont confidentiels et établis à l'attention 
exclusive de leurs destinataires. Si vous n'êtes pas le destinataire de ce 
message, merci de le détruire et d'en avertir l'expéditeur.


Re: Solr segment merging in different replica

2016-02-03 Thread Emir Arnautovic

Hi Edwin,
Master-Slave's main (maybe only) advantage is simpler infrastructure - 
it does not use ZK. It also assumes you don't need NRT search, since 
there have to be longer periods between replicating master changes to slaves.


Regards,
Emir

On 03.02.2016 04:48, Zheng Lin Edwin Yeo wrote:

Hi Emir,

Thanks for your reply.

As currently both my main and replica are on the same server, and as I
am using the SolrCloud setup, both replicas are doing the merging
concurrently, which causes the memory usage of the server to be very high
and affects other functions like querying. This issue should be
eliminated when I shift my replica to another server.

Would like to check, will there be any advantage if I change to the
Master-Slave setup, as compared to the SolrCloud setup which I am currently
using?

Regards,
Edwin



On 2 February 2016 at 21:23, Emir Arnautovic 
wrote:


Hi Edwin,
Do you see any signs of network being bottleneck that would justify such
setup? I would suggest you monitor your cluster before deciding if you need
separate interfaces for external and internal communication. Sematext's SPM
(http://sematext.com/spm) allows you to monitor SolrCloud, hosts and
network and identify bottlenecks in your cluster.

Regards,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



On 02.02.2016 00:50, Zheng Lin Edwin Yeo wrote:


Hi Emir,

My setup is SolrCloud.

Also, will it be good to use a separate network interface to connect the
two node with the interface that is used to connect to the network for
searching?

Regards,
Edwin


On 1 February 2016 at 19:01, Emir Arnautovic <
emir.arnauto...@sematext.com>
wrote:

Hi Edwin,

What is your setup - SolrCloud or Master-Slave? If it is SolrCloud, then
under normal index updates, each core behaves as an independent index. In
theory, if all changes happen at the same time on all nodes, merges will
happen at the same time. But that is not realistic, and they are expected
to happen at slightly different times.
If you are running Master-Slave, then new segments will be copied from
master to slave.

Regards,
Emir





On 01.02.2016 11:56, Zheng Lin Edwin Yeo wrote:

Hi,

I would like to check: during segment merging, how does the replica node
do the merging?
Will it do the merging concurrently, or will the replica node delete the
old segment and replace it with the new one?

Also, is it possible to separate the network interface for inter-node
communication from the network interface for update/search requests?
If so I could put two network cards in each machine and route the index
and
search traffic over the first interface and the traffic for the
inter-node
communication (sending documents to replicas) over the second interface.

I'm using Solr 5.4.0

Regards,
Edwin








Re: plugging an analyzer

2016-02-03 Thread Roxana Danger
Hi Scott,
I have everything well configured but no parameters can be passed to an
Analyzer (FieldTypePluginLoader just calls the constructor without
parameters).
So, the quick solution is to create a TokenizerFactory instead.
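For reference, a minimal shape of that workaround (a hypothetical sketch against the Lucene 5.x factory API; MyTokenizer and the myParam parameter are made up): unlike a bare Analyzer, a TokenizerFactory receives the init args from the schema through its Map constructor.

```java
import java.util.Map;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.util.TokenizerFactory;
import org.apache.lucene.util.AttributeFactory;

public class MyTokenizerFactory extends TokenizerFactory {
    private final String myParam;

    public MyTokenizerFactory(Map<String, String> args) {
        super(args);
        // require() consumes the arg; anything left over is a config error
        myParam = require(args, "myParam");
        if (!args.isEmpty()) {
            throw new IllegalArgumentException("Unknown parameters: " + args);
        }
    }

    @Override
    public Tokenizer create(AttributeFactory factory) {
        // MyTokenizer is the hypothetical tokenizer holding the former analyzer logic
        return new MyTokenizer(factory, myParam);
    }
}
```

It would then be wired up in the field type's analysis chain, e.g. `<tokenizer class="com.example.MyTokenizerFactory" myParam="..."/>`.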

Thanks,
Roxana



On 2 February 2016 at 16:29, Scott Stults  wrote:

> There are a lot of things that can go wrong when you're wiring up a custom
> analyzer. I'd first check the simple things:
>
> * Custom jar is in Solr's classpath
> * Not using the custom factory in a field type's analysis chain
> * Not declaring a field with that type
> * Not using that field in a document
> * Assuming the tokenizer/filter will be instantiated directly rather than
> through the factory interfaces.
>
> Hope that helps!
>
> k/r,
> Scott
>
>
> On Tue, Feb 2, 2016 at 3:04 AM, Roxana Danger <
> roxana.dan...@reedonline.co.uk> wrote:
>
> > Hello,
> > I would like to use some code embedded in an analyser. The problem is
> > that I need to pass some parameters to initialize it. My thought was to
> > create a plugin and initialize the parameters with the
> > init( Map<String,String> args ) or init( NamedList args ) methods as
> > explained in http://wiki.apache.org/solr/SolrPlugins.
> > But none of these methods are called when the schema is read and the
> > analyser constructed. I have also tried implementing the
> > ResourceLoaderAware interface, but the inform() method is not called
> > either.
> > Am I missing something to get my analyser running? When are these init
> > methods called, and how can I trigger their call? Any suggestion that
> > does not imply splitting the code into Tokenizers/Filters?
> >
> > Thank you very much in advance,
> > Roxana
> >
>
>
>
> --
> Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
> | 434.409.2780
> http://www.opensourceconnections.com
>



-- 
Roxana Danger | Data Scientist
Dragon Court, 27-29 Macklin Street, London, WC2B 5LX
Tel: 020 7067 4568


RE: Using Tika that comes with Solr 5.2

2016-02-03 Thread Allison, Timothy B.
Right.  Thank you for reporting the solution.  

Be aware, though, that some parser dependencies are not included with the Solr 
distribution, and, because of the way that Tika currently works, you'll 
silently get no text/metadata from those file types (e.g. sqlite files and 
others).  See [1] for some discussion of this.  If you want the full Tika (with 
all of its messiness) and you are already using SolrJ, use the tika-app.jar.

Your code will correctly extract content from embedded documents, but it will 
not extract metadata from embedded documents/attachments (SOLR-7229).  If you 
want to be able to process metadata from embedded docs, you might consider the 
RecursiveParserWrapper.
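For completeness, a sketch of the RecursiveParserWrapper route, based on the Tika 1.7-era API (the file path and handler choices are placeholders, not from the thread): it collects one Metadata entry, with the extracted text under X-TIKA:content, for the container document and for each embedded document.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.RecursiveParserWrapper;
import org.apache.tika.sax.BasicContentHandlerFactory;
import org.xml.sax.helpers.DefaultHandler;

public class RecursiveTikaSketch {
    public static void main(String[] args) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();
        // One handler (and one Metadata) per document, embedded docs included
        RecursiveParserWrapper wrapper = new RecursiveParserWrapper(parser,
                new BasicContentHandlerFactory(
                        BasicContentHandlerFactory.HANDLER_TYPE.TEXT, -1));
        Metadata metadata = new Metadata();
        try (InputStream in = Files.newInputStream(Paths.get("C:\\temp\\test.pdf"))) {
            wrapper.parse(in, new DefaultHandler(), metadata, new ParseContext());
        }
        // Entry 0 is the container; later entries are embedded docs/attachments
        for (Metadata m : wrapper.getMetadata()) {
            System.out.println(m.get("X-TIKA:content"));
        }
    }
}
```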

Note, too, that if you send in a ParseContext (SOLR-7189) in your call to 
parse, make sure to add the AutoDetectParser or else you will get no content 
from embedded docs.

Both of these will get embedded content:

parser.parse(in, contentHandler, metadata);

Or

ParseContext context = new ParseContext();
context.set(Parser.class, parser);
parser.parse(in, contentHandler, metadata, context);

This will not:
ParseContext context = new ParseContext();
parser.parse(in, contentHandler, metadata, context);


As you've already done, feel free to ask more Tika-specific questions over on 
tika-user.

Cheers,

   Tim

[1] 
https://issues.apache.org/jira/browse/TIKA-1511?focusedCommentId=14385803=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14385803

-Original Message-
From: Steven White [mailto:swhite4...@gmail.com] 
Sent: Tuesday, February 02, 2016 7:39 PM
To: solr-user@lucene.apache.org
Subject: Re: Using Tika that comes with Solr 5.2

I found my issue.  I need to include the JARs from \solr\contrib\extraction\lib\

Steve

On Tue, Feb 2, 2016 at 4:24 PM, Steven White  wrote:

> I'm not using solr-app.jar.  I need to stick with Tika JARs that come 
> with Solr 5.2 and yet get the full text extraction feature of Tika 
> (all file types it supports).
>
> At first, I started to include Tika JARs as needed; I now have all 
> Tika related JARs that come with Solr and yet it is not working.  Here 
> is the
> list: tika-core-1.7.jar, tika-java7-1.7.jar, tika-parsers-1.7.jar, 
> tika-xmp-1.7.jar, vorbis-java-tika-0.6.jar, 
> kite-morphlines-tika-core-0.12.1.jar
> and kite-morphlines-tika-decompress-0.12.1.jar.  As part of my 
> program, I also have SolrJ JARs and their dependency: 
> solr-solrj-5.2.1.jar, solr-core-5.2.1.jar, etc.
>
> You said "Might not have the parsers on your path within your Solr 
> framework?".  I'm using Tika outside the Solr framework.  I'm trying to 
> use Tika from my own crawler application that uses SolrJ to send the 
> raw text to Solr for indexing.
>
> What is it that I am missing?!
>
> Steve
>
> On Tue, Feb 2, 2016 at 3:03 PM, Allison, Timothy B. 
> 
> wrote:
>
>> Might not have the parsers on your path within your Solr framework?
>>
>> Which tika jars are on your path?
>>
>> If you want the functionality of all of Tika, use the standalone 
>> tika-app.jar, but do not use the app in the same JVM as 
>> Solr...without a custom class loader.  The Solr team carefully prunes 
>> the dependencies when integrating Tika and makes sure that the main parsers 
>> _just work_.
>>
>>
>> -Original Message-
>> From: Steven White [mailto:swhite4...@gmail.com]
>> Sent: Tuesday, February 02, 2016 2:53 PM
>> To: solr-user@lucene.apache.org
>> Subject: Using Tika that comes with Solr 5.2
>>
>> Hi,
>>
>> I'm trying to use Tika that comes with Solr 5.2.  The following code 
>> is not
>> working:
>>
>> public static void parseWithTika() throws Exception {
>> File file = new File("C:\\temp\\test.pdf");
>>
>> FileInputStream in = new FileInputStream(file);
>> AutoDetectParser parser = new AutoDetectParser();
>> Metadata metadata = new Metadata();
>> metadata.add(Metadata.RESOURCE_NAME_KEY, file.getName());
>> BodyContentHandler contentHandler = new BodyContentHandler();
>>
>> parser.parse(in, contentHandler, metadata);
>>
>> String content = contentHandler.toString();   <=== 'content' is always
>> empty
>>
>> in.close();
>> }
>>
>> 'content' is always empty string unless when the file I pass to Tika 
>> is a text file.  Any idea what's the issue?
>>
>> I have also tried sample codes off
>> https://tika.apache.org/1.8/examples.html
>> with the same result.
>>
>>
>> Thanks !!
>>
>> Steve
>>
>
>


Re: Data Import Handler takes different time on different machines

2016-02-03 Thread Troy Edwards
While researching the space on the servers, I found that log files from
Sept 2015 are still there. These are solr_gc_log_datetime and
solr_log_datetime.

Is the default logging for Solr ok for production systems or does it need
to be changed/tuned?
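(For context on those files: the dated solr_log_* and solr_gc_log_* copies come from the bin/solr start script, which archives the previous run's logs on restart but, in Solr 5.x, never deletes the old copies, so they accumulate until removed by hand or by a cron job. Rotation of the live solr.log is controlled from server/resources/log4j.properties; a sketch of the relevant knobs, with illustrative values rather than the shipped defaults:)

```
# server/resources/log4j.properties -- illustrative values, not the shipped defaults
log4j.rootLogger=INFO, file, CONSOLE
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=${solr.log}/solr.log
# size-based rotation of the live log: at most 10 x 50MB on disk
log4j.appender.file.MaxFileSize=50MB
log4j.appender.file.MaxBackupIndex=9
```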

Thanks,

On Tue, Feb 2, 2016 at 2:04 PM, Troy Edwards 
wrote:

> That is help!
>
> Thank you for the thoughts.
>
>
> On Tue, Feb 2, 2016 at 12:17 PM, Erick Erickson 
> wrote:
>
>> Scratch that installation and start over?
>>
>> Really, it sounds like something is fundamentally messed up with the
>> Linux install. Perhaps something as simple as file paths, or you have
>> old jars hanging around that are mis-matched. Or someone manually
>> deleted files from the Solr install. Or your disk filled up. Or
>>
>> How sure are you that the linux setup was done properly?
>>
>> Not much help I know,
>> Erick
>>
>> On Tue, Feb 2, 2016 at 10:11 AM, Troy Edwards 
>> wrote:
>> > Rerunning the Data Import Handler again on the linux machine has
>> > started producing some errors and warnings:
>> >
>> > On the node on which DIH was started:
>> >
>> > WARN SolrWriter Error creating document : SolrInputDocument
>> >
>> > org.apache.solr.common.SolrException: No registered leader was found
>> > after waiting for 4000ms , collection: collectionmain slice: shard1
>> >
>> >
>> >
>> > On the second node:
>> >
>> > WARN ReplicationHandler Exception while writing response for params:
>> > command=filecontent=true=1047=/replication=filestream=_1oo_Lucene50_0.tip
>> >
>> > java.nio.file.NoSuchFileException:
>> > /var/solr/data/collectionmain_shard2_replica1/data/index/_1oo_Lucene50_0.tip
>> >
>> >
>> > ERROR
>> >
>> > Index fetch failed :org.apache.solr.common.SolrException: Unable to
>> > download _169.si completely. Downloaded 0!=466
>> >
>> >
>> > ReplicationHandler Index fetch failed
>> > :org.apache.solr.common.SolrException: Unable to download _169.si
>> > completely. Downloaded 0!=466
>> >
>> > WARN
>> > IndexFetcher File _1pd_Lucene50_0.tim did not match. expected checksum is
>> > 3549855722 and actual is checksum 2062372352. expected length is 72522
>> > and actual length is 39227
>> >
>> > WARN UpdateLog Log replay finished. recoveryInfo=RecoveryInfo{adds=840638
>> > deletes=0 deleteByQuery=0 errors=0 positionOfStart=554264}
>> >
>> >
>> > Any suggestions about this?
>> >
>> > Thanks
>> >
>> > On Mon, Feb 1, 2016 at 10:03 PM, Erick Erickson <
>> erickerick...@gmail.com>
>> > wrote:
>> >
>> >> The first thing I'd be looking at is how the JDBC batch size compares
>> >> between the two machines.
>> >>
>> >> AFAIK, Solr shouldn't notice the difference, and since a large majority
>> >> of the development is done on Linux-based systems, I'd be surprised if
>> >> this was worse than Windows, which would lead me to the one thing that
>> >> is definitely different between the two: Your JDBC driver and its
>> settings.
>> >> At least that's where I'd look first.
>> >>
>> >> If nothing immediate pops up, I'd probably write a small driver
>> program to
>> >> just access the database from the two machines and process your 10M
>> >> records _without_ sending them to Solr and see what the comparison is.
>> >>
>> >> You can also forgo DIH and do a simple import program via SolrJ. The
>> >> advantage here is that the comparison I'm talking about above is
>> >> really simple, just comment out the call that sends data to Solr.
>> Here's an
>> >> example...
>> >>
>> >> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Mon, Feb 1, 2016 at 7:34 PM, Troy Edwards > >
>> >> wrote:
>> >> > Sorry, I should explain further. The Data Import Handler had been
>> >> > running for a while, retrieving only about 15 records from the
>> >> > database. Both in the development env (windows) and on the linux
>> >> > machine it took about 3 mins.
>> >> >
>> >> > The query has been changed and we are now trying to retrieve about 10
>> >> > million records. We do expect the time to increase.
>> >> >
>> >> > With the new query the time taken on the windows machine is
>> >> > consistently around 40 mins. While the DIH is running, queries slow
>> >> > down, i.e. a query that typically took 60 msec takes 100 msec.
>> >> >
>> >> > The time taken on the linux machine is consistently around 2.5
>> >> > hours. While the DIH is running, queries take about 200 to 400 msec.
>> >> >
>> >> > Thanks!
>> >> >
>> >> > On Mon, Feb 1, 2016 at 8:45 PM, Erick Erickson <
>> erickerick...@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> What happens if you run just the SQL query from the
>> >> >> windows box and from the linux box? Is there any chance
>> >> >> that somehow the connection from the linux box is
>> >> >> just slower?
>> >> >>
>> >> >> Best,
>> >> >> Erick
>> >> >>
>> >> >> On Mon, Feb 1, 2016 at 6:36 PM, Alexandre Rafalovitch
>> >> >> 

Re: Tutorial or Code Samples to explain how to Write Solr Plugins

2016-02-03 Thread Charlie Hull
Here's one we wrote recently for indexing ontologies with Solr as part of
the BioSolr project:
https://github.com/flaxsearch/BioSolr/tree/master/ontology/solr and a
presentation on how it works (explained in the second half of the talk)
https://www.youtube.com/watch?v=v1qKNX_axdI - hope this helps!

Cheers

Charlie

On 3 February 2016 at 18:45, Upayavira  wrote:

> Not a tutorial as such, but here's some simple infrastructure for
> building Solr components alongside Solr:
>
> https://github.com/upayavira/custom-solr-components
>
> I suspect you're past that stage already though.
>
> Upayavira
>
> On Wed, Feb 3, 2016, at 04:45 PM, Binoy Dalal wrote:
> > Here's a couple of links you can follow to get started:
> >
> https://www.slideshare.net/mobile/searchbox-com/tutorial-on-developin-a-solr-search-component-plugin
> >
> https://www.slideshare.net/mobile/searchbox-com/develop-a-solr-request-handler-plugin
> > These are to write a search component and a request handler respectively.
> > They are on older solr versions but they should work with 5.x as well.
> > I used these to get started when I was trying to write my first plugin.
> > Once you get a hang of how it's to be done it's really not that
> > difficult.
> >
> > On Wed, 3 Feb 2016, 21:59 Gian Maria Ricci - aka Alkampfer <
> > alkamp...@nablasoft.com> wrote:
> >
> > > Hi,
> > >
> > >
> > >
> > > I wonder if there is some code samples or tutorial (updated to work
> with
> > > version 5) to help users writing plugins.
> > >
> > >
> > >
> > > I’ve found lots of difficulties in the past to find such kind of
> > > information when I needed to write some plugins, and I wonder if I
> missed
> > > some site or link that does a better explanation than official page
> > > http://wiki.apache.org/solr/SolrPlugins that is really old.
> > >
> > >
> > >
> > > Thanks in advance.
> > >
> > >
> > >
> > > --
> > > Gian Maria Ricci
> > > Cell: +39 320 0136949
> > >
> > >
> > >
> > >
> > --
> > Regards,
> > Binoy Dalal
>


Re: "I was asked to wait on state recovering for shard.... but I still do not see the request state"

2016-02-03 Thread hawk
I have a similar issue on 4.10.1

160131 21:07:36.932 http-bio-8082-exec-2802
org.apache.solr.common.SolrException: I was asked to wait on state
recovering for shard2 in my_cases on localhost2:8080_solr but I still do not
see the requested state. I see state: recovering live:true leader from ZK:
http://localhost1:8080/solr/my_cases/
at
org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:999)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:245)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:258)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/I-was-asked-to-wait-on-state-recovering-for-shard-but-I-still-do-not-see-the-request-state-tp4204348p4255043.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Math Function floats

2016-02-03 Thread Curtis Fehr
Hey there guys and gals,

I'm trying to devise a mathematical checksum in solr, so that I can quickly 
check the records against the authoritative source.  The problem I'm running 
into, is that all the math functions I want to use in solr are 32 bit floats, 
and my checksum values are 64 bit longs, so I cannot get a matching or valid 
checksum.  I've tried to workaround this by trying to reduce the number down 
with log, ln, pow, etc, but then it doesn't detect close but not matching data 
scenarios.  Is there any reason that the math functions are floats instead of 
doubles?  How much effort would be required to either change the existing math 
functions to use doubles, or to add a complement of 64 bit math functions?
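The mismatch is easy to demonstrate outside Solr; this standalone snippet (not from the thread) round-trips a 64-bit value through float and through double:

```java
public class FloatChecksumDemo {
    public static void main(String[] args) {
        long checksum = 3549855722L;  // a 64-bit checksum value

        // float has a 24-bit significand, so longs this large get rounded
        long viaFloat = (long) (float) checksum;
        // double has a 53-bit significand, exact for any long below 2^53
        long viaDouble = (long) (double) checksum;

        System.out.println(viaFloat);   // 3549855744 -- no longer matches
        System.out.println(viaDouble);  // 3549855722 -- exact
    }
}
```

So until the function-query math is widened, any checksum comparison done through float-based functions can collide or mismatch once values exceed 2^24.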

Thanks





I am sending this e-mail to specific persons. From time to time, I may 
inadvertenly include a recipient by mistake. You will likely recognize the 
mistake from the subject line or from the list of other recipients. In that 
case, please do not read, copy, forward, or store the message and do let me 
know about my mistake after you have deleted it from your e-mail system. In any 
case, please also note that this message may contain information that is 
confidential or legally privileged, and you should treat the message and its 
content appropriately.


Re: Update to solr 5 - custom phrase query implementation issue

2016-02-03 Thread Erik Hatcher
Gerald - I don’t quite understand, sorry - perhaps best if you could post your 
code (or some test version you can work with and share here) so we can see what 
exactly you’re trying to do. Maybe there are other ways to achieve what you 
want, maybe by somehow leveraging a StopFilter-like facility to remove phrase 
terms. edismax has some stop word (inclusion, rather than exclusion, though) 
magic with the pf2, pf3 and stopwords parameters - maybe worth leveraging 
something like how that works, or maybe adding some options or pluggability to 
the edismax phrase/stopword facility?

Erik



> On Feb 3, 2016, at 6:05 AM, Gerald Reinhart  
> wrote:
> 
> On 02/02/2016 03:20 PM, Erik Hatcher wrote:
>>> On Feb 2, 2016, at 8:57 AM, Elodie Sannier  wrote:
>>> 
>>> Hello,
>>> 
>>> We are using solr 4.10.4 and we want to update to 5.4.1.
>>> 
>>> With solr 4.10.4:
>>> - we extend PhraseQuery with a custom class in order to remove some
>>> terms from phrase queries with phrase slop (update of add(Term term, int
>>> position) method)
>>> - in order to use our implementation, we extend ExtendedSolrQueryParser
>>> with a custom class and we override the method newPhraseQuery but with
>>> solr 5 this method does not exist anymore
>>> 
>>> How can we do this with solr 5.4.1 ?
>> 
>> You’ll want to override this method, it looks like:
>> 
>>protected Query getFieldQuery(String field, String queryText, int slop)
>> 
>> 
> Hi Erik,
> 
> To change the behavior of the PhraseQuery, either:
>   - we change it after its natural cycle. The PhraseQuery is supposed
> to be immutable and setSlop(int s) is deprecated... we don't really
> want to do this.
>   - we override the code that actually builds it:
> org.apache.solr.search.ExtendedDismaxQParser.getFieldQuery(String field,
> String val, int slop) PROTECTED
>  which uses getAliasedQuery() PROTECTED
>    which uses getQuery() PRIVATE, which uses new PhraseQuery.Builder()
> to create the query...
> 
>    so it is not easy to override the behavior: we would need to
> override/duplicate the getAliasedQuery() and getQuery() methods. We don't
> really want to do this either.
> 
> So we don't really know where to go.
> 
> Thanks,
> 
> 
> Gerald (I'm working with Elodie on the subject)
> 
> 
> 



Re: Update to solr 5 - custom phrase query implementation issue

2016-02-03 Thread Gerald Reinhart

Erik, here is some context :

   - migration from solr 4.10.4 to 5.4.1.
   - we have our own synonym implementation that does not use the solr
synonym mechanism: at the time, we needed to manage multi-token synonyms
and that wasn't covered by the Lucene features. So basically:
        - let's say that "playstation portable" is a synonym of
"psp". We identify "psp" and "playstation portable" as .
        - at index time, on every document with "psp" we replace it
by "psp "
        - at query time:
           - By extending SearchHandler, we replace the query
"playstation portable" by "((playstation AND portable) OR )".
           - (1) By extending ExtendedSolrQueryParser, we do
not add synonym ids to the PhraseQuery. We need advice for the migration to
Solr 5.4.1.
           - By extending BooleanQuery, we adjust the
coordination factor. We need advice for the migration to Solr 5.4.1 (see
the other question on the mailing list).

   Hope it's clearer

Thanks

Gérald


(1)
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

public class MyPhraseQuery extends PhraseQuery {
   private int deltaPosition = 0;

   @Override
   public void add(Term term, int position) {
      String termText = term.text();
      if (!termText.matches(Constants.SYNONYM_PIVOT_ID_REGEX)) {
         // keep the term, shifting its position left by the number of
         // synonym ids skipped so far
         super.add(term, position - deltaPosition);
      } else {
         // drop synonym pivot ids from the phrase
         deltaPosition++;
      }
   }
}
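Not from the thread, but one possible direction for (1) on 5.4.1: since newPhraseQuery() is gone and PhraseQuery is now immutable, the filtering could happen after the fact instead, letting super.getFieldQuery() build the phrase query and then rebuilding it with PhraseQuery.Builder minus the synonym-id terms. An untested sketch, reusing Constants.SYNONYM_PIVOT_ID_REGEX from the code above:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.solr.search.SyntaxError;

// inside a subclass of ExtendedSolrQueryParser:
@Override
protected Query getFieldQuery(String field, String val, int slop) throws SyntaxError {
    Query q = super.getFieldQuery(field, val, slop);
    if (!(q instanceof PhraseQuery)) {
        return q;
    }
    PhraseQuery pq = (PhraseQuery) q;
    PhraseQuery.Builder builder = new PhraseQuery.Builder();
    builder.setSlop(pq.getSlop());
    Term[] terms = pq.getTerms();
    int[] positions = pq.getPositions();
    int delta = 0;
    for (int i = 0; i < terms.length; i++) {
        if (terms[i].text().matches(Constants.SYNONYM_PIVOT_ID_REGEX)) {
            delta++;  // skip synonym ids, as MyPhraseQuery.add() did
        } else {
            builder.add(terms[i], positions[i] - delta);
        }
    }
    return builder.build();
}
```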


On 02/03/2016 03:05 PM, Erik Hatcher wrote:

Gerald - I don’t quite understand, sorry - perhaps best if you could post your 
code (or some test version you can work with and share here) so we can see what 
exactly you’re trying to do. Maybe there are other ways to achieve what you 
want, maybe by somehow leveraging a StopFilter-like facility to remove phrase 
terms. edismax has some stop word (inclusion, rather than exclusion, though) 
magic with the pf2, pf3 and stopwords parameters - maybe worth leveraging 
something like how that works, or maybe adding some options or pluggability to 
the edismax phrase/stopword facility?

  Erik




On Feb 3, 2016, at 6:05 AM, Gerald Reinhart  wrote:

On 02/02/2016 03:20 PM, Erik Hatcher wrote:

On Feb 2, 2016, at 8:57 AM, Elodie Sannier  wrote:

Hello,

We are using solr 4.10.4 and we want to update to 5.4.1.

With solr 4.10.4:
- we extend PhraseQuery with a custom class in order to remove some
terms from phrase queries with phrase slop (update of add(Term term, int
position) method)
- in order to use our implementation, we extend ExtendedSolrQueryParser
with a custom class and we override the method newPhraseQuery but with
solr 5 this method does not exist anymore

How can we do this with solr 5.4.1 ?

You’ll want to override this method, it looks like:

protected Query getFieldQuery(String field, String queryText, int slop)



Hi Erik,

To change the behavior of the PhraseQuery, either:
   - we change it after its natural cycle. The PhraseQuery is supposed
to be immutable and setSlop(int s) is deprecated... we don't really
want to do this.
   - we override the code that actually builds it:
org.apache.solr.search.ExtendedDismaxQParser.getFieldQuery(String field,
String val, int slop) PROTECTED
  which uses getAliasedQuery() PROTECTED
which uses getQuery() PRIVATE, which uses new PhraseQuery.Builder()
to create the query...

so it is not easy to override the behavior: we would need to
override/duplicate the getAliasedQuery() and getQuery() methods. We don't
really want to do this either.

So we don't really know where to go.

Thanks,


Gerald (I'm working with Elodie on the subject)






--
Kelkoo

Gérald Reinhart | Software engineer
E: gerald.reinh...@kelkoo.com
Y!Messenger: gerald.reinhart
A: Rue des Méridiens, 38130 Echirolles








Re: Tutorial or Code Samples to explain how to Write Solr Plugins

2016-02-03 Thread Alexandre Rafalovitch
There is a framework to help write them:
https://github.com/leonardofoderaro/alba

Also, some recent plugins were released at the Revolution conference,
maybe they have something useful:
https://github.com/DiceTechJobs/SolrPlugins

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 4 February 2016 at 08:25, Charlie Hull  wrote:
> Here's one we wrote recently for indexing ontologies with Solr as part of
> the BioSolr project:
> https://github.com/flaxsearch/BioSolr/tree/master/ontology/solr and a
> presentation on how it works (explained in the second half of the talk)
> https://www.youtube.com/watch?v=v1qKNX_axdI - hope this helps!
>
> Cheers
>
> Charlie
>
> On 3 February 2016 at 18:45, Upayavira  wrote:
>
>> Not a tutorial as such, but here's some simple infrastructure for
>> building Solr components alongside Solr:
>>
>> https://github.com/upayavira/custom-solr-components
>>
>> I suspect you're past that stage already though.
>>
>> Upayavira
>>
>> On Wed, Feb 3, 2016, at 04:45 PM, Binoy Dalal wrote:
>> > Here's a couple of links you can follow to get started:
>> >
>> https://www.slideshare.net/mobile/searchbox-com/tutorial-on-developin-a-solr-search-component-plugin
>> >
>> https://www.slideshare.net/mobile/searchbox-com/develop-a-solr-request-handler-plugin
>> > These are to write a search component and a request handler respectively.
>> > They are on older solr versions but they should work with 5.x as well.
>> > I used these to get started when I was trying to write my first plugin.
>> > Once you get a hang of how it's to be done it's really not that
>> > difficult.
>> >
>> > On Wed, 3 Feb 2016, 21:59 Gian Maria Ricci - aka Alkampfer <
>> > alkamp...@nablasoft.com> wrote:
>> >
>> > > Hi,
>> > >
>> > >
>> > >
>> > > I wonder if there is some code samples or tutorial (updated to work
>> with
>> > > version 5) to help users writing plugins.
>> > >
>> > >
>> > >
>> > > I’ve found lots of difficulties on the past to find such kind of
>> > > information when I needed to write some plugins, and I wonder if I
>> missed
>> > > some site or link that does a better explanation than official page
>> > > http://wiki.apache.org/solr/SolrPlugins that is really old.
>> > >
>> > >
>> > >
>> > > Thanks in advance.
>> > >
>> > >
>> > >
>> > > --
>> > > Gian Maria Ricci
>> > > Cell: +39 320 0136949
>> > >
>> > >
>> > >
>> > >
>> > --
>> > Regards,
>> > Binoy Dalal
>>


Re: "I was asked to wait on state recovering for shard.... but I still do not see the request state"

2016-02-03 Thread Mark Miller
You get this when the Overseer is either bogged down or not processing
events generally.

The Overseer is way, way faster at processing events in 5x.

If you search your logs for .Overseer you can see what it's doing. Either
nothing at the time, or bogged down processing state updates probably.

Along with 5x Overseer processing being much more efficient, SOLR-7281 is
going to take out a lot of state publishing on shutdown that can end up
getting processed on the next startup.

- Mark

On Wed, Feb 3, 2016 at 6:39 PM hawk  wrote:

> Here are more details around the event.
>
> 160201 11:57:22.272 http-bio-8082-exec-18 [] webapp=/solr path=/update
> params={waitSearcher=true=http://x:x
> /solr//=FROMLEADER=true=true=javabin=false_end_point=true=2=false}
> {commit=} 0 134
>
> 160201 11:57:25.993 RecoveryThread Error while trying to recover.
> core=x
> java.util.concurrent.ExecutionException:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was
> asked to wait on state recovering for shard2 in xxx on xxx:xx_solr but I
> still do not see the requested state. I see state: recovering live:true
> leader from ZK: http://x:x/solr//
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:188)
> at
>
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:615)
> at
>
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:371)
> at
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
> Caused by:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was
> asked to wait on state recovering for shard2 in xxx on xxx:xx_solr but I
> still do not see the requested state. I see state: recovering live:true
> leader from ZK: http://x:x/solr//
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:550)
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:245)
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:241)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
> 160201 11:57:25.993 RecoveryThread Recovery failed - trying again... (7)
> core=
>
> 160201 11:57:25.994 RecoveryThread Wait 256.0 seconds before trying to
> recover again (8)
>
> 160201 11:57:30.370 http-bio-8082-exec-3
> org.apache.solr.common.SolrException: no servers hosting shard:
> at
>
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
> at
>
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/I-was-asked-to-wait-on-state-recovering-for-shard-but-I-still-do-not-see-the-request-state-tp4204348p4255073.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
- Mark
about.me/markrmiller


Re: "I was asked to wait on state recovering for shard.... but I still do not see the request state"

2016-02-03 Thread hawk
Here are more details around the event.

160201 11:57:22.272 http-bio-8082-exec-18 [] webapp=/solr path=/update
params={waitSearcher=true=http://x:x/solr//=FROMLEADER=true=true=javabin=false_end_point=true=2=false}
{commit=} 0 134

160201 11:57:25.993 RecoveryThread Error while trying to recover. core=x
java.util.concurrent.ExecutionException:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was
asked to wait on state recovering for shard2 in xxx on xxx:xx_solr but I
still do not see the requested state. I see state: recovering live:true
leader from ZK: http://x:x/solr//
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:615)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:371)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
Caused by:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was
asked to wait on state recovering for shard2 in xxx on xxx:xx_solr but I
still do not see the requested state. I see state: recovering live:true
leader from ZK: http://x:x/solr//
at
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:550)
at
org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:245)
at
org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:241)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

160201 11:57:25.993 RecoveryThread Recovery failed - trying again... (7)
core=

160201 11:57:25.994 RecoveryThread Wait 256.0 seconds before trying to
recover again (8)

160201 11:57:30.370 http-bio-8082-exec-3
org.apache.solr.common.SolrException: no servers hosting shard: 
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/I-was-asked-to-wait-on-state-recovering-for-shard-but-I-still-do-not-see-the-request-state-tp4204348p4255073.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: "I was asked to wait on state recovering for shard.... but I still do not see the request state"

2016-02-03 Thread hawk
Thanks Mark.

I was able to search "Overseer" in the solr logs around the time frame of
the condition. This particular message was from the leader node of the
shard.

160201 11:26:36.380 localhost-startStop-1 Overseer (id=null) closing

Also I found this message in the zookeeper logs.

11:26:35,218 [myid:02] - INFO [ProcessThread(sid:2
cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when
processing sessionid:0x15297c0fe2e3f2d type:create cxid:0x3 zxid:0xf0001be48
txntype:-1 reqpath:n/a Error Path:/overseer Error:KeeperErrorCode =
NodeExists for /overseer

Any thoughts what these messages suggest?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/I-was-asked-to-wait-on-state-recovering-for-shard-but-I-still-do-not-see-the-request-state-tp4204348p4255105.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tutorial or Code Samples to explain how to Write Solr Plugins

2016-02-03 Thread Binoy Dalal
Here's a couple of links you can follow to get started:
https://www.slideshare.net/mobile/searchbox-com/tutorial-on-developin-a-solr-search-component-plugin
https://www.slideshare.net/mobile/searchbox-com/develop-a-solr-request-handler-plugin
These are to write a search component and a request handler respectively.
They are on older solr versions but they should work with 5.x as well.
I used these to get started when I was trying to write my first plugin.
Once you get a hang of how it's to be done it's really not that difficult.

On Wed, 3 Feb 2016, 21:59 Gian Maria Ricci - aka Alkampfer <
alkamp...@nablasoft.com> wrote:

> Hi,
>
>
>
> I wonder if there is some code samples or tutorial (updated to work with
> version 5) to help users writing plugins.
>
>
>
> I’ve found lots of difficulties on the past to find such kind of
> information when I needed to write some plugins, and I wonder if I missed
> some site or link that does a better explanation than official page
> http://wiki.apache.org/solr/SolrPlugins that is really old.
>
>
>
> Thanks in advance.
>
>
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
>
>
-- 
Regards,
Binoy Dalal


Tutorial or Code Samples to explain how to Write Solr Plugins

2016-02-03 Thread Gian Maria Ricci - aka Alkampfer
Hi, 

 

I wonder if there is some code samples or tutorial (updated to work with
version 5) to help users writing plugins.

 

I've had a lot of difficulty in the past finding this kind of information
when I needed to write plugins, and I wonder if I missed some site or link
that explains it better than the official page
http://wiki.apache.org/solr/SolrPlugins, which is really old.

 

Thanks in advance. 

 

--
Gian Maria Ricci
Cell: +39 320 0136949

 

   


 



Re: Handling fields used for both display & index

2016-02-03 Thread Jay Potharaju
Thanks for the response Sameer & Binoy.
Jay

On Sun, Jan 31, 2016 at 6:13 PM, Binoy Dalal  wrote:

> Adding to sameer's answer, use string types when you want exact matches,
> both in terms of query and case.
> In case you want to perform additional operations on the input, like
> tokenization and applying filters, then you're better off with one of the
> other text types.
> You should take a look at the field type definition in the schema file to
> see if a predefined type fits your need, else create a custom type based on
> your requirements.
>
> On Mon, 1 Feb 2016, 07:36 Sameer Maggon  wrote:
>
> > Hi Jay,
> >
> > You could use one field for both unless there is a specific requirement
> you
> > are looking for that is not being met by that one field (e.g. faceting,
> > etc). Typically, if you have a field that is marked as both "indexed" and
> > "stored", the value that is passed while indexing to that field is stored
> > as is. However, it's indexed based on the field type that you've
> specified
> > for that field.
> >
> > e.g. a description field with the field type of "text_en" would be
> indexed
> > per the pipeline in the text_en fieldtype and the text as is will be
> stored
> > (which is what is returned in your response in the results).
> >
> > Thanks,
> > --
> > *Sameer Maggon*
> > Measured Search | Solr-as-a-Service | Solr Monitoring | Search Analytics
> > www.measuredsearch.com
> > <
> >
> https://mailtrack.io/trace/link/dca98638f8114f38d1ff30ed04feb547877c848e?url=http%3A%2F%2Fmeasuredsearch.com%2F=797ba5008ecc48b8
> > >
> >
> > On Sun, Jan 31, 2016 at 5:56 PM, Jay Potharaju 
> > wrote:
> >
> > > Hi,
> > > I am trying to decide if I should use text_en or string as my field
> type.
> > > The fields have to be both indexed and stored for display. One solution
> > is
> > > to duplicate fields, one for indexing other for display.One of the
> field
> > > happens to be a description field which I would like to avoid
> > duplicating.
> > > Solr should return results when someone searches for John or john.Is
> > > storing a copy of the field the best way to go about this problem?
> > >
> > >
> > > Thanks
> > >
> >
> --
> Regards,
> Binoy Dalal
>



-- 
Thanks
Jay Potharaju
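The advice in this thread can be sketched as a schema fragment. This is a
minimal illustration, not from the thread itself: the field names and the
optional string copy are assumptions.

```xml
<!-- One field serves both search and display: indexed per the text_en
     analysis chain, while the stored value is returned as-is in results. -->
<field name="description" type="text_en" indexed="true" stored="true"/>

<!-- Optional: only if exact (untokenized, case-sensitive) matching is also
     needed; the copy does not have to be stored. -->
<field name="description_exact" type="string" indexed="true" stored="false"/>
<copyField source="description" dest="description_exact"/>
```

Since text_en typically lower-cases tokens, a search for "John" or "john"
matches the same documents, so a single stored-and-indexed field can cover
this case without duplication.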


Re: Solr for real time analytics system

2016-02-03 Thread CKReddy Bhimavarapu
Hello Rohit,

You can use the Banana project which was forked from Kibana
, and works with all kinds of time
series (and non-time series) data stored in Apache Solr
. It uses Kibana's powerful dashboard
configuration capabilities, ports key panels to work with Solr, and
provides significant additional capabilities, including new panels that
leverage D3.js 

> I would need mostly aggregation queries like sum/average/groupby etc, but
> data set is quite huge. The aggregation queries should be very fast.


All of your requirements can be served by Banana, but I'm not sure how fast
Solr is compared to the ELK stack.

On Thu, Feb 4, 2016 at 10:51 AM, Rohit Kumar 
wrote:

> Hi
>
> I am quite new to Solr. I have to build a real time analytics system which
> displays metrics based on multiple filters over a huge data set (~50million
> documents with ~100 fields).  I would need mostly aggregation queries like
> sum/average/groupby etc, but data set is quite huge. The aggregation
> queries should be very fast.
>
> Is Solr suitable for such use cases?
>
> Thanks
> Rohit
>



-- 
ckreddybh. 
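For the sum/average/group-by aggregations described above, Solr's JSON Facet
API (available in the 5.x line) is worth a look alongside Banana. A hedged
sketch of such a request body; the field names (category, price) are made up
for illustration:

```json
{
  "query": "*:*",
  "limit": 0,
  "facet": {
    "by_category": {
      "type": "terms",
      "field": "category",
      "facet": {
        "total_sales": "sum(price)",
        "avg_price": "avg(price)"
      }
    }
  }
}
```

This is POSTed as the JSON body of a search request; at ~50 million
documents, enabling docValues on the faceted/aggregated fields matters a lot
for speed.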


AW: Hard commits, soft commits and transaction logs

2016-02-03 Thread Clemens Wyss DEV
Sorry for coming back to this topic:
You (Erick) mention "By and large, do not issue commits from any client 
indexing to Solr"

In order to achieve NRT, I for example test:

<autoCommit>
  <maxTime>18</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1</maxTime>
</autoSoftCommit>

For (unit)testing purposes
 
<autoCommit>
  <maxTime>1000</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>500</maxTime>
</autoSoftCommit>

Suggestions are re-built on commit:
...
<str name="buildOnCommit">true</str>
...

(Almost) all my unit tests pass, except for my docDeletion test: it looks like
expungeDeletes is never "issued" and suggestions for deleted docs are returned.
When I explicitly issue an "expunging-soft-commit"

UpdateRequest rq = new UpdateRequest();
rq.setAction( UpdateRequest.ACTION.COMMIT, false, false, 100, true, true );
rq.process( solrClient );

the test passes and no false suggestions are returned. What am I facing?

-Ursprüngliche Nachricht-
Von: Erick Erickson [mailto:erickerick...@gmail.com] 
Gesendet: Montag, 4. Januar 2016 17:36
An: solr-user
Betreff: Re: Hard commits, soft commits and transaction logs

As far as I know. If you see anything different, let me know and we'll see if 
we can update it.

Best,
Erick

On Mon, Jan 4, 2016 at 1:34 AM, Clemens Wyss DEV  wrote:
> [Happy New Year to all]
>
> Is all herein
> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-
> softcommit-and-commit-in-sorlcloud/
> mentioned/recommended still valid for Solr 5.x?
>
> - Clemens


Out of memory error during full import

2016-02-03 Thread Srinivas Kashyap
Hello,

I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the
child entities in data-config.xml. When I try to do a full import, I get an
OutOfMemory error (Java heap space). I have increased the heap allocation to
the maximum extent possible. Is there a workaround to do the initial data load
without running into this error?

I found that the 'batchSize=-1' parameter needs to be specified in the data
source for MySQL; is there a way to specify this for other databases as well?
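For reference, a hedged data-config.xml sketch of this kind of setup; the
table, column, and driver names are placeholders, not from the message. Note
that for non-MySQL drivers, batchSize maps to the JDBC fetch size, so a small
positive value (e.g. batchSize="500") usually keeps the driver from loading
the whole result set into memory; batchSize="-1" is the MySQL-specific
streaming trick:

```xml
<dataConfig>
  <!-- batchSize="-1" enables row streaming on MySQL; on other databases,
       use a small positive value instead (JDBC fetch size). -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" batchSize="-1"/>
  <document>
    <entity name="parent" query="SELECT id FROM parent">
      <entity name="child" query="SELECT parent_id, val FROM child"
              processor="SqlEntityProcessor"
              cacheImpl="SortedMapBackedCache"
              cacheKey="parent_id" cacheLookup="parent.id"/>
    </entity>
  </document>
</dataConfig>
```

Be aware that SortedMapBackedCache holds the entire child result set on the
heap, which is often the actual cause of the OOM; dropping the cache (or
using a per-parent WHERE clause) trades memory for more SQL round trips.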

Thanks and Regards,
Srinivas Kashyap

DISCLAIMER: 
E-mails and attachments from TradeStone Software, Inc. are confidential.
If you are not the intended recipient, please notify the sender immediately by
replying to the e-mail, and then delete it without making copies or using it
in any way. No representation is made that this email or any attachments are
free of viruses. Virus scanning is recommended and is the responsibility of
the recipient.

Solr for real time analytics system

2016-02-03 Thread Rohit Kumar
Hi

I am quite new to Solr. I have to build a real time analytics system which
displays metrics based on multiple filters over a huge data set (~50million
documents with ~100 fields).  I would need mostly aggregation queries like
sum/average/groupby etc, but data set is quite huge. The aggregation
queries should be very fast.

Is Solr suitable for such use cases?

Thanks
Rohit


Re: sorry, no dataimport-handler defined!

2016-02-03 Thread kostali hassan
In solrconfig.xml, define a data import request handler:

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">tika-data-config.xml</str>
  </lst>
</requestHandler>

Then define your tika-data-config.xml file and put it in your core's conf
directory.

2016-02-02 17:35 GMT+00:00 Jean-Jacques Monot :

> Exact. Newbie user !
>
> OK i have seen what is missing ...
>
> Le 2 févr. 2016 15:40, "Davis, Daniel (NIH/NLM) [C]" 
> a écrit :
> >
> > It sounds a bit like you are just exploring Solr for the first time.
> To use the Data Import Handler, you need to create an XML file that
> configures it, data-config.xml by default.
> >
> > But before we go into details, what are you trying to accomplish with
> Solr?
> >
> > -Original Message-
> > From: Jean-Jacques MONOT [mailto:jj_mo...@yahoo.fr]
> > Sent: Monday, February 01, 2016 2:31 PM
> > To: solr-user@lucene.apache.org
> > Subject: Potential SPAM:sorry, no dataimport-handler defined!
> >
> > Hello
> >
> > I am using SOLR 5.4.1 and the graphical admin UI.
> >
> > I successfully created multiples cores and indexed various documents,
> using in line commands : (create -c) and (post.jar) on W10.
> >
> > But in the GUI, when I click on "Dataimport", I get the following
> message : "sorry, no dataimport-handler defined!"
> >
> > I get the same message even on 5.3.1 or for different cores.
> >
> > What is wrong ?
> >
> > JJM
> >
> > ---
> > L'absence de virus dans ce courrier électronique a été vérifiée par le
> logiciel antivirus Avast.
> > https://www.avast.com/antivirus
> >
> >
>


Re: Solr segment merging in different replica

2016-02-03 Thread Alessandro Benedetti
Master/Slave is the old legacy way to obtain a resilient system.
It's easier to setup, but if you are already on SolrCloud I can not see any
advantage in moving back.
Regarding the networking part, I am not a network expert.
The only thing I can tell you is that inter-node communication happens on the
same REST endpoints and handlers that are used for search/updates (same Java
process).
I really doubt it is possible to have them running across different
physical network interfaces.

Cheers


On 3 February 2016 at 10:41, Emir Arnautovic 
wrote:

> Hi Edwin,
> Master-Slave's main (maybe only) advantage is simpler infrastructure - it
> does not use ZK. Also, it does assume you don't need NRT search since there
> has to be longer periods between replicating master changes to slaves.
>
> Regards,
> Emir
>
>
> On 03.02.2016 04:48, Zheng Lin Edwin Yeo wrote:
>
>> Hi Emir,
>>
>> Thanks for your reply.
>>
>> As both my main core and replica are currently on the same server, and as I
>> am using the SolrCloud setup, both replicas do the merging concurrently,
>> which causes the server's memory usage to be very high and affects other
>> functions like querying. This issue should be
>> eliminated when I shift my replica to another server.
>>
>> Would like to check, will there be any advantage if I change to the
>> Master-Slave setup, as compared to the SolrCloud setup which I am
>> currently
>> using?
>>
>> Regards,
>> Edwin
>>
>>
>>
>> On 2 February 2016 at 21:23, Emir Arnautovic <
>> emir.arnauto...@sematext.com>
>> wrote:
>>
>> Hi Edwin,
>>> Do you see any signs of network being bottleneck that would justify such
>>> setup? I would suggest you monitor your cluster before deciding if you
>>> need
>>> separate interfaces for external and internal communication. Sematext's
>>> SPM
>>> (http://sematext.com/spm) allows you to monitor SolrCloud, hosts and
>>> network and identify bottlenecks in your cluster.
>>>
>>> Regards,
>>> Emir
>>>
>>> --
>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>>> Solr & Elasticsearch Support * http://sematext.com/
>>>
>>>
>>>
>>> On 02.02.2016 00:50, Zheng Lin Edwin Yeo wrote:
>>>
>>> Hi Emir,

 My setup is SolrCloud.

 Also, will it be good to use a separate network interface to connect the
 two node with the interface that is used to connect to the network for
 searching?

 Regards,
 Edwin


 On 1 February 2016 at 19:01, Emir Arnautovic <
 emir.arnauto...@sematext.com>
 wrote:

 Hi Edwin,

> What is your setup - SolrCloud or Master-Slave? If it si SolrCloud,
> then
> under normal index updates, each core is behaving as independent index.
> In
> theory, if all changes happen at the same time on all nodes, merges
> will
> happen at the same time. But that is not realistic and it is expected
> to
> happen in slightly different time.
> If you are running Master-Slave, then new segments will be copied from
> master to slave.
>
> Regards,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
>
> On 01.02.2016 11:56, Zheng Lin Edwin Yeo wrote:
>
> Hi,
>
>> I would like to check, during segment merging, how did the replical
>> node
>> do
>> the merging?
>> Will it do the merging concurrently, or will the replica node delete
>> the
>> old segment and replace the new one?
>>
>> Also, is it possible to separate the network interface for inter-node
>> communication from the network interface for update/search requests?
>> If so I could put two network cards in each machine and route the
>> index
>> and
>> search traffic over the first interface and the traffic for the
>> inter-node
>> communication (sending documents to replicas) over the second
>> interface.
>>
>> I'm using Solr 5.4.0
>>
>> Regards,
>> Edwin
>>
>>
>>
>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
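For readers weighing the two modes discussed above: in legacy Master-Slave
mode, merged segments are copied wholesale from master to slaves via the
ReplicationHandler rather than re-merged on each node. A minimal sketch; the
core name, host, and poll interval are assumptions:

```xml
<!-- On the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- On each slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

The poll interval is the trade-off Emir mentions: longer intervals mean less
replication traffic but a longer delay before slaves see master's changes, so
this setup is not a fit for NRT search.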


RE: Using Tika that comes with Solr 5.2

2016-02-03 Thread Allison, Timothy B.
> Be aware, though, that some parser dependencies are not included with the Solr
> distribution, and, because of the way that Tika currently works, you'll
> silently get no text/metadata from those file types (e.g. sqlite files and
> others).  See [1] for some discussion of this.  If you want the full Tika
> (with all of its messiness) and you are already using SolrJ, use the
> tika-app.jar.

Correction: I just realized that is only mostly true. We aren't packaging the
sqlite jar in tika-app any more (for the same reason that Solr doesn't:
native libs), so you'll have to grab it and add it to your classpath. :)

See also, very recently: 
https://mail-archives.apache.org/mod_mbox/tika-user/201602.mbox/%3C027601d15ea8%2443ffcf90%24cbff6eb0%24%40thetaphi.de%3E
 

-Original Message-
From: Allison, Timothy B. [mailto:talli...@mitre.org] 
Sent: Wednesday, February 03, 2016 7:35 AM
To: solr-user@lucene.apache.org
Subject: RE: Using Tika that comes with Solr 5.2

Right.  Thank you for reporting the solution.  

Be aware, though, that some parser dependencies are not included with the Solr 
distribution, and, because of the way that Tika currently works, you'll 
silently get no text/metadata from those file types (e.g. sqlite files and 
others).  See [1] for some discussion of this.  If you want the full Tika (with 
all of its messiness) and you are already using SolrJ, use the tika-app.jar.

Your code will correctly extract content from embedded documents, but it will 
not extract metadata from embedded documents/attachments (SOLR-7229).  If you 
want to be able to process metadata from embedded docs, you might consider the 
RecursiveParserWrapper.

Note, too, that if you send in a ParseContext (SOLR-7189) in your call to 
parse, make sure to add the AutoDetectParser or else you will get no content 
from embedded docs.

Both of these will get embedded content:

parser.parse(in, contentHandler, metadata);

Or:

ParseContext context = new ParseContext();
context.set(Parser.class, parser);
parser.parse(in, contentHandler, metadata, context);

This will not:

ParseContext context = new ParseContext();
parser.parse(in, contentHandler, metadata, context);


As you've already done, feel free to ask more Tika-specific questions over on 
tika-user.

Cheers,

   Tim

[1] 
https://issues.apache.org/jira/browse/TIKA-1511?focusedCommentId=14385803=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14385803

-Original Message-
From: Steven White [mailto:swhite4...@gmail.com]
Sent: Tuesday, February 02, 2016 7:39 PM
To: solr-user@lucene.apache.org
Subject: Re: Using Tika that comes with Solr 5.2

I found my issue.  I need to include the JARs from \solr\contrib\extraction\lib\.

Steve

On Tue, Feb 2, 2016 at 4:24 PM, Steven White  wrote:

> I'm not using solr-app.jar.  I need to stick with Tika JARs that come 
> with Solr 5.2 and yet get the full text extraction feature of Tika 
> (all file types it supports).
>
> At first, I started to include Tika JARs as needed; I now have all 
> Tika related JARs that come with Solr and yet it is not working.  Here 
> is the
> list: tika-core-1.7.jar, tika-java7-1.7.jar, tika-parsers-1.7.jar, 
> tika-xmp-1.7.jar, vorbis-java-tika-0.6.jar, 
> kite-morphlines-tika-core-0.12.1.jar
> and kite-morphlines-tika-decompress-0.12.1.jar.  As part of my 
> program, I also have SolrJ JARs and their dependency:
> solr-solrj-5.2.1.jar, solr-core-5.2.1.jar, etc.
>
> You said "Might not have the parsers on your path within your Solr 
>> framework?".  I'm using Tika outside the Solr framework.  I'm trying to
> use Tika from my own crawler application that uses SojrJ to send the 
> raw text to Solr for indexing.
>
> What is it that I am missing?!
>
> Steve
>
> On Tue, Feb 2, 2016 at 3:03 PM, Allison, Timothy B. 
> 
> wrote:
>
>> Might not have the parsers on your path within your Solr framework?
>>
>> Which tika jars are on your path?
>>
>> If you want the functionality of all of Tika, use the standalone 
>> tika-app.jar, but do not use the app in the same JVM as 
>> Solr...without a custom class loader.  The Solr team carefully prunes 
>> the dependencies when integrating Tika and makes sure that the main parsers 
>> _just work_.
>>
>>
>> -Original Message-
>> From: Steven White [mailto:swhite4...@gmail.com]
>> Sent: Tuesday, February 02, 2016 2:53 PM
>> To: solr-user@lucene.apache.org
>> Subject: Using Tika that comes with Solr 5.2
>>
>> Hi,
>>
>> I'm trying to use Tika that comes with Solr 5.2.  The following code 
>> is not
>> working:
>>
>> public static void parseWithTika() throws Exception {
>> File file = new File("C:\\temp\\test.pdf");
>>
>> FileInputStream in = new FileInputStream(file);
>> AutoDetectParser parser = new AutoDetectParser();
>> Metadata metadata = new Metadata();
>> metadata.add(Metadata.RESOURCE_NAME_KEY, file.getName());
>> BodyContentHandler contentHandler = new BodyContentHandler(-1);
>> parser.parse(in, contentHandler, metadata);
>> }

Re: possible to dump "routing table" from a single Solr node?

2016-02-03 Thread Shawn Heisey
On 2/3/2016 11:49 AM, Ian Rose wrote:
> I'm having a situation where our SolrCloud cluster often gets into a bad
> state where our solr nodes frequently respond with "no servers hosting
> shard" even though the node that hosts that shard is clearly up.  We
> suspect that this is a state bug where some servers are somehow ending up
> with an incorrect view of the network (e.g. which nodes are up/down, which
> shards are hosted on which nodes).  Is it possible to somehow get a "dump"
> of the current "routing table" (i.e. documents with prefixes in this range
> in this collection are stored in this shard on this node)?  That would help
> immensely when debugging.

The clusterstate (in zookeeper) has this information.  You can see the
zookeeper database within the Solr admin UI by clicking on Cloud and
then Tree.

In 4.x Solr versions (and when upgrading from 4.x to 5.x) the
clusterstate is maintained in a single "clusterstate.json" file at the
root of the SolrCloud information in zookeeper.  In 5.x, each collection
has its own "state.json" file.

It is possible to gather this information programmatically with HTTP.  I
do not remember the exact URLs, but you can find them by watching what
the admin UI does in your browser's dev console.

Thanks,
Shawn
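For reference, one HTTP endpoint that exposes this state programmatically is
the Collections API CLUSTERSTATUS action (available since Solr 4.8); verify
the exact form against your version, and substitute your own host and port:

```text
# Cluster status: shard hash ranges, replica states, leaders, live nodes
http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json
```

Its output includes each shard's hash range and which node hosts each
replica, which is effectively the "routing table" dump asked for above.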



Re: Tutorial or Code Samples to explain how to Write Solr Plugins

2016-02-03 Thread Upayavira
Not a tutorial as such, but here's some simple infrastructure for
building Solr components alongside Solr:

https://github.com/upayavira/custom-solr-components

I suspect you're past that stage already though.

Upayavira

On Wed, Feb 3, 2016, at 04:45 PM, Binoy Dalal wrote:
> Here are a couple of links you can follow to get started:
> https://www.slideshare.net/mobile/searchbox-com/tutorial-on-developin-a-solr-search-component-plugin
> https://www.slideshare.net/mobile/searchbox-com/develop-a-solr-request-handler-plugin
> These cover writing a search component and a request handler, respectively.
> They target older Solr versions, but they should work with 5.x as well.
> I used these to get started when I was writing my first plugin.
> Once you get the hang of how it's done, it's really not that
> difficult.
> 
> On Wed, 3 Feb 2016, 21:59 Gian Maria Ricci - aka Alkampfer <
> alkamp...@nablasoft.com> wrote:
> 
> > Hi,
> >
> >
> >
> > I wonder if there are some code samples or a tutorial (updated to work with
> > version 5) to help users write plugins.
> >
> >
> >
> > In the past I had a lot of difficulty finding this kind of
> > information when I needed to write some plugins, and I wonder if I missed
> > some site or link that explains it better than the official page
> > http://wiki.apache.org/solr/SolrPlugins, which is really old.
> >
> >
> >
> > Thanks in advance.
> >
> >
> >
> > --
> > Gian Maria Ricci
> > Cell: +39 320 0136949
> >
> >
> -- 
> Regards,
> Binoy Dalal


possible to dump "routing table" from a single Solr node?

2016-02-03 Thread Ian Rose
Hi all,

I'm having a situation where our SolrCloud cluster often gets into a bad
state where our solr nodes frequently respond with "no servers hosting
shard" even though the node that hosts that shard is clearly up.  We
suspect that this is a state bug where some servers are somehow ending up
with an incorrect view of the network (e.g. which nodes are up/down, which
shards are hosted on which nodes).  Is it possible to somehow get a "dump"
of the current "routing table" (i.e. documents with prefixes in this range
in this collection are stored in this shard on this node)?  That would help
immensely when debugging.

Thanks!
- Ian