Re: Get one fragment of text of field

2016-02-17 Thread Binoy Dalal
Indexing will happen on everything.
Just use the copy fields to return data. That's it. They do not need to be
indexed, just stored.

On Thu, 18 Feb 2016, 12:52 Anil  wrote:

> Thanks Binoy.
>
> Indexing should happen on everything.
>
> But retrieval/fetch should limit the characters. Is that possible?
>
> Regards,
> Anil
>
> On 18 February 2016 at 12:24, Binoy Dalal  wrote:
>
> > If you are not particular about what part of the field is returned you
> can
> > create copy fields and set a limit on those to store only the number of
> > characters you want.
> >
> > <copyField source="SRC" dest="dest" maxChars="500"/>
> >
> > This will copy over the first 500 chars of the contents of your SRC field
> > to your dest field.
> > Anything beyond this will be truncated.
> >
> > On Thu, 18 Feb 2016, 12:00 Anil  wrote:
> >
> > > Hi ,
> > >
> > > we have around 30 fields in solr document. and we search for text in
> all
> > > fields (by creating a record field with copy field).
> > >
> > > few fields have huge text .. order of mbs. how i can get only a
> fragment
> > > of fields in a configurable way.
> > >
> > > we have to display each field content on UI. so its must to get content
> > of
> > > each field.
> > >
> > > for now, i am fetching the content from solr and truncating it in my
> > code.
> > > but it has poor performance.
> > >
> > > is there any way to achieve fragmentation (not a highlight
> fragmentation)
> > > in solr ? please advice.
> > >
> > > Regards,
> > > Anil
> > >
> > --
> > Regards,
> > Binoy Dalal
> >
>
-- 
Regards,
Binoy Dalal


Querying data based on field type

2016-02-17 Thread Salman Ansari
Hi,

Due to some misconfiguration issues, I have a field that has values both as a
single string and as an array of strings. It looks like some old values got
indexed as an array of strings, while anything new is a single-valued string.
I have checked the configuration, and multiValued for that field is set to
false. What I want is to remove all the occurrences of the field as an array
(multi-valued), where it shows as <arr> instead of <str> in the response. Is
there a way to query the field so it returns only those documents that have
the field as an array and not as a single string?

Appreciate your comments/feedback.

Regards,
Salman


Re: Get one fragment of text of field

2016-02-17 Thread Anil
Thanks Binoy.

Indexing should happen on everything.

But retrieval/fetch should limit the characters. Is that possible?

Regards,
Anil

On 18 February 2016 at 12:24, Binoy Dalal  wrote:

> If you are not particular about what part of the field is returned you can
> create copy fields and set a limit on those to store only the number of
> characters you want.
>
> <copyField source="SRC" dest="dest" maxChars="500"/>
>
> This will copy over the first 500 chars of the contents of your SRC field
> to your dest field.
> Anything beyond this will be truncated.
>
> On Thu, 18 Feb 2016, 12:00 Anil  wrote:
>
> > Hi ,
> >
> > we have around 30 fields in solr document. and we search for text in all
> > fields (by creating a record field with copy field).
> >
> > few fields have huge text .. order of mbs. how i can get only a  fragment
> > of fields in a configurable way.
> >
> > we have to display each field content on UI. so its must to get content
> of
> > each field.
> >
> > for now, i am fetching the content from solr and truncating it in my
> code.
> > but it has poor performance.
> >
> > is there any way to achieve fragmentation (not a highlight fragmentation)
> > in solr ? please advice.
> >
> > Regards,
> > Anil
> >
> --
> Regards,
> Binoy Dalal
>


Re: solr-4.3.1 docValues usage

2016-02-17 Thread Neeraj Lajpal
Can someone please help me with this?
I have been stuck for the past few days.



> On 15-Feb-2016, at 6:39 PM, Neeraj Lajpal  wrote:
> 
> Hi,
> 
> I recently asked this question on stackoverflow:
> 
> I am trying to access a field in custom request handler. I am accessing it 
> like this for each document:
> 
> Document doc;
> doc = reader.document(id);
> String[] docFields = doc.getValues("state");
> There are around 600,000 documents in Solr. For a query running on all 
> the docs, it is taking more than 65 seconds.
> 
> I have also tried the SolrIndexSearcher.doc method, but it is also taking around 
> 60 seconds.
> 
> Removing the above lines of code brings the QTime down to milliseconds. But I 
> need to access that field for my algorithm.
> 
> Is there a more optimised way to do this?
> 
> 
> 
> 
> In reply, I got a suggestion to use docValues. I read about it and it seems 
> to be useful for my case. But, I am unable to find/figure out how to use it 
> in my custom request handler.
> 
> Please tell me if there is some function/API to access a docValues field from a 
> custom request handler, given the doc id and field name.
> 
> Thanks,
> Neeraj Lajpal
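
In the Lucene 4.x API, docValues are read per index segment rather than through
the stored-document API, so a handler typically resolves the leaf reader that
holds the document and reads the value from there. A rough sketch of such a
lookup, assuming Lucene/Solr 4.3.x, a single-valued string field declared with
docValues="true" in the schema, and the field name "state" from the question
(exact method names may differ slightly between 4.x releases):

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.util.BytesRef;

public class DocValuesLookup {
  // Returns the "state" value for a global Lucene doc id, or null if the
  // segment has no docValues for the field or the document has no value.
  public static String stateFor(IndexReader reader, int globalDocId) throws Exception {
    for (AtomicReaderContext leaf : reader.leaves()) {
      int local = globalDocId - leaf.docBase;
      if (local < 0 || local >= leaf.reader().maxDoc()) continue; // doc lives in another segment
      SortedDocValues dv = leaf.reader().getSortedDocValues("state");
      if (dv == null) return null;
      BytesRef value = new BytesRef();
      dv.get(local, value); // column-stride read, no stored-field decompression
      return value.length == 0 ? null : value.utf8ToString();
    }
    return null;
  }
}

Note that the field would have to be reindexed with docValues enabled before
such a lookup returns anything.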



Re: Do all SolrCloud nodes communicate with the database when indexing a collection?

2016-02-17 Thread Anshum Gupta
Hi Colin,

As of when I last checked, DIH works with SolrCloud but has its
limitations. It was designed for the non-cloud mode and is single-threaded.
It runs on whatever node you set it up on, and that node might not host the
leader for the shard a document belongs to, adding an extra hop for those
documents.

SolrCloud is designed for multi-threaded indexing, and I'd highly recommend
using SolrJ to speed up your indexing. Yes, that would involve writing
some code, but it would speed things up considerably.
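
As an illustration, a minimal SolrJ sketch along those lines, assuming
Solr/SolrJ 5.x (in 4.x the class is CloudSolrServer); the ZooKeeper hosts,
collection name, JDBC URL, and SQL query below are placeholders. Running
several copies of this over disjoint id ranges is the usual way to get the
multi-threaded indexing mentioned above, and CloudSolrClient routes each
document to the correct shard leader, avoiding the extra hop:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class JdbcIndexer {
  public static void main(String[] args) throws Exception {
    CloudSolrClient solr = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181"); // placeholder ensemble
    solr.setDefaultCollection("mycollection");                                // placeholder collection
    try (Connection db = DriverManager.getConnection("jdbc:sqlserver://dbhost;databaseName=mydb");
         Statement st = db.createStatement();
         ResultSet rs = st.executeQuery("SELECT id, name FROM products")) {   // placeholder query
      List<SolrInputDocument> batch = new ArrayList<>();
      while (rs.next()) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", rs.getString("id"));
        doc.addField("name", rs.getString("name"));
        batch.add(doc);
        if (batch.size() == 1000) { // send documents in batches, not one at a time
          solr.add(batch);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) solr.add(batch);
      solr.commit();
    } finally {
      solr.close();
    }
  }
}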


On Wed, Feb 17, 2016 at 10:51 PM, Colin Freas  wrote:

>
> I just set up a SolrCloud instance with 2 Solr nodes & another machine
> running zookeeper.
>
> I’ve imported 200M records from a SQL Server database, and those records
> are split nicely between the 2 nodes.  Everything seems ok.
>
> I did the data import via the admin ui.  It took not quite 8 hours, which
> I guess is fine.  So, in the middle of the import I checked to see what was
> connected to the SQL Server machine.  It turned out that only the node that
> I had started the import on was actually connected to my database server.
>
> Is that the expected behavior?  Is there any way to have all nodes of a
> SolrCloud index communicate with the database during the indexing?  Would
> that speed up indexing?  Maybe this isn’t a bottleneck I should be worried
> about.
>
> Thanks,
> -Colin
>



-- 
Anshum Gupta


Re: Get one fragment of text of field

2016-02-17 Thread Binoy Dalal
If you are not particular about what part of the field is returned, you can
create copy fields and set a limit on those to store only the number of
characters you want.

<copyField source="SRC" dest="dest" maxChars="500"/>

This will copy over the first 500 chars of the contents of your SRC field
to your dest field.
Anything beyond this will be truncated.

On Thu, 18 Feb 2016, 12:00 Anil  wrote:

> Hi ,
>
> we have around 30 fields in solr document. and we search for text in all
> fields (by creating a record field with copy field).
>
> few fields have huge text .. order of mbs. how i can get only a  fragment
> of fields in a configurable way.
>
> we have to display each field content on UI. so its must to get content of
> each field.
>
> for now, i am fetching the content from solr and truncating it in my code.
> but it has poor performance.
>
> is there any way to achieve fragmentation (not a highlight fragmentation)
> in solr ? please advice.
>
> Regards,
> Anil
>
-- 
Regards,
Binoy Dalal


Do all SolrCloud nodes communicate with the database when indexing a collection?

2016-02-17 Thread Colin Freas

I just set up a SolrCloud instance with 2 Solr nodes & another machine running 
zookeeper.

I’ve imported 200M records from a SQL Server database, and those records are 
split nicely between the 2 nodes.  Everything seems ok.

I did the data import via the admin ui.  It took not quite 8 hours, which I 
guess is fine.  So, in the middle of the import I checked to see what was 
connected to the SQL Server machine.  It turned out that only the node that I 
had started the import on was actually connected to my database server.

Is that the expected behavior?  Is there any way to have all nodes of a 
SolrCloud index communicate with the database during the indexing?  Would that 
speed up indexing?  Maybe this isn’t a bottleneck I should be worried about.

Thanks,
-Colin


Get one fragment of text of field

2016-02-17 Thread Anil
Hi,

We have around 30 fields in a Solr document, and we search for text across all
fields (by creating a record field with a copyField).

A few fields have huge text, on the order of MBs. How can I get only a fragment
of those fields in a configurable way?

We have to display each field's content on the UI, so we must fetch the content
of each field.

For now, I am fetching the content from Solr and truncating it in my code,
but it has poor performance.

Is there any way to achieve this kind of fragmenting (not highlight
fragmentation) in Solr? Please advise.

Regards,
Anil


Re: Best practices for Solr (how to update jar files safely)

2016-02-17 Thread Shawn Heisey
On 2/17/2016 10:38 PM, Brian Wright wrote:
> We have a new project to use Solr. Our Solr instance will use Jetty
> rather than Tomcat. We plan to extend the Solr core system by adding
> additional classes (jar files) to the
> /opt/solr/server/solr-webapp/webapp/WEB-INF/lib directory to extend
> features. We also plan to run two instances of Solr on each physical
> server preferably from a single installed Solr instance. I've read the
> best practices doc on running two Solr instances, and while it's
> detailed about how to set up two instances, it doesn't cover our
> specific use case.

Why do you want to run multiple instances on one server?  Unless you
have a REALLY good reason to have more than one instance per server,
don't do it.  One instance can handle many indexes with no problem.

The only valid reason I can think of to run more than one instance per
machine is when a single instance requires a VERY large heap.  In that
case, it *might* be better to run two instances that each have a smaller
heap, so that garbage collection times are lower.  I personally would
add more machines, rather than run multiple instances.

Generally the best way to load custom jars (and contrib components like
the dataimport handler) in Solr is to create a "lib" directory in the
solr home (where solr.xml lives) and place all extra jars there.  They
will be loaded once when Solr starts, and all cores will have access to
them.

The rest of your email was concerned with running multiple instances. 
If you *REALLY* want to go against advice and do this, here's the
recommended way:

https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production#TakingSolrtoProduction-RunningmultipleSolrnodesperhost

It is very likely possible to run multiple instances out of the same
installation directory, but I am not sure how to do it.

Thanks,
Shawn



Re: Display entire string containing query string

2016-02-17 Thread Binoy Dalal
Append =

On Thu, 18 Feb 2016, 11:35 Tom Running  wrote:

> Hello,
>
> I am working on a project using Solr to search data retrieved from
> Nutch.
>
> I have successfully integrated Nutch with Solr, and Solr is able to search
> Nutch's data.
>
> However I am having a bit of a problem. If I query Solr, it will bring back
> the numfound and which document the query string was found in, but it will
> not display the string that contains the query string.
>
> Can anyone help on how to display the entire string that contains the
> query.
>
>
> I appreciate your time and guidance. Thank you so much!
>
> -T
>
-- 
Regards,
Binoy Dalal


Display entire string containing query string

2016-02-17 Thread Tom Running
Hello,

I am working on a project using Solr to search data retrieved from
Nutch.

I have successfully integrated Nutch with Solr, and Solr is able to search
Nutch's data.

However, I am having a bit of a problem. If I query Solr, it will bring back
the numFound and which document the query string was found in, but it will
not display the string that contains the query string.

Can anyone help on how to display the entire string that contains the query?


I appreciate your time and guidance. Thank you so much!

-T


Best practices for Solr (how to update jar files safely)

2016-02-17 Thread Brian Wright

Hello,

We have a new project that will use Solr. Our Solr instance will use Jetty 
rather than Tomcat. We plan to extend the Solr core system by adding 
additional classes (jar files) to the 
/opt/solr/server/solr-webapp/webapp/WEB-INF/lib directory to extend 
features. We also plan to run two instances of Solr on each physical 
server, preferably from a single installed Solr instance. I've read the 
best practices doc on running two Solr instances, and while it's 
detailed about how to set up two instances, it doesn't cover our 
specific use case.


For ease of our custom jar deployments, we would prefer to have only one 
install of Solr. However, I have read that the JVM's default class 
loader provides lazy class loading. This means that should we replace a 
jar file (during new release deployments) and the JVM goes looking for 
the old jar after replacement, the JVM could crash and dump. Is lazy 
class loading a concern with Solr and specifically when custom built 
jars are used?


Would it be better to use two separate installs of Solr and manage each 
independently or can we safely get away with using a single Solr install 
and then update jars in this instance for both running instances? I'm 
trying to find the best safety practices for release deployment with the 
way we intend to install and use our Solr in our environment. I should 
also mention that running Solr instances cannot be down concurrently due 
to cluster sharding.


Is there any other way through CLASSPATH or other config/runtime methods 
where we could use a single install, but separate the WEB-INF directory 
into two to support each separate running instance? Are there any other 
ideas?


Any advice is appreciated.

Thanks.

--
Brian Wright
Sr. Systems Engineer
901 Mariners Island Blvd Suite 200
San Mateo, CA 94404 USA
Email: bri...@marketo.com
Phone: +1.650.539.3530
www.marketo.com




Re: Highlight brings the content from the first pages of pdf

2016-02-17 Thread Anil
Thanks Philippe.

I am using hl.fl=*. When a field is available in the highlight section, is it
possible to skip that field in the main response? Please clarify.

Regards,
Anil

On 18 February 2016 at 08:42, Philippe Soares 
wrote:

> You can put fields that you want to retrieve without highlighting in the
> "fl" parameter, and the large fields in the "hl.fl" parameter. Those will
> go in the highlight section only. It may also be a good idea to add
> hl.requireFieldMatch=true.
>
> E.g.: fl=id&hl=true&hl.fl=field1,field2&hl.requireFieldMatch=true
>
> Note that you can also use wildcards in fl or hl.fl. So if you want to
> highlight all the fields, just set hl.fl=*
> On Feb 17, 2016 21:49, "Anil"  wrote:
>
> > Thanks Binoy. But this may not help my usecase.
> >
> > I am storing and indexing huge documents in solr. when no search text
> > matches with that filed text, i should skip that field of the document.
> > when match exists, it should be part of highlight section.
> >
> > fl may not be right option in my case.
> >
> > Any suggestions would be appreciated.
> >
> > Regards,
> > Anil
> >
> > On 16 February 2016 at 13:43, Binoy Dalal 
> wrote:
> >
> > > Yeah.
> > > Under  an entry like so:
> > > fields
> > >
> > > On Tue, 16 Feb 2016, 13:00 Anil  wrote:
> > >
> > > > you mean default fl ?
> > > >
> > > > On 16 February 2016 at 12:57, Binoy Dalal 
> > > wrote:
> > > >
> > > > > Oh wait. We don't append the fl parameter to the query.
> > > > > We've configured it in the request handler in solrconfig.xml
> > > > > Maybe that is something that you can do.
> > > > >
> > > > > On Tue, 16 Feb 2016, 12:39 Anil  wrote:
> > > > >
> > > > > > Thanks for your response Binoy.
> > > > > >
> > > > > > Yes.I am looking for any alternative to this. With long number of
> > > > fileds,
> > > > > > url will become long and might lead to "url too long exception"
> > when
> > > > > using
> > > > > > http request.
> > > > > >
> > > > > > On 16 February 2016 at 11:01, Binoy Dalal <
> binoydala...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Filling in the fl parameter with all the required fields is
> what
> > we
> > > > do
> > > > > at
> > > > > > > my project as well, and I don't think there is any alternative
> to
> > > > this.
> > > > > > >
> > > > > > > Maybe somebody else can advise on this?
> > > > > > >
> > > > > > > On Tue, 16 Feb 2016, 10:30 Anil  wrote:
> > > > > > >
> > > > > > > > Any help on this ? Thanks.
> > > > > > > >
> > > > > > > > On 15 February 2016 at 19:06, Anil 
> wrote:
> > > > > > > >
> > > > > > > > > Yes. But i have long list of fields.
> > > > > > > > >
> > > > > > > > > i feel adding all the fileds in fl is not good practice
> > unless
> > > > one
> > > > > > > > > interested in few fields. In my case, i am interested in
> all
> > > > fields
> > > > > > > > except
> > > > > > > > > the one .
> > > > > > > > >
> > > > > > > > > is there any alternative approach ? Thanks in advance.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 15 February 2016 at 17:27, Binoy Dalal <
> > > > binoydala...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> If I understand correctly, you have already highlighted
> the
> > > > field
> > > > > > and
> > > > > > > > only
> > > > > > > > >> want to return the highlights and not the field itself.
> > > > > > > > >> Well in that case, simply remove the field name from your
> fl
> > > > list.
> > > > > > > > >>
> > > > > > > > >> On Mon, 15 Feb 2016, 17:04 Anil 
> wrote:
> > > > > > > > >>
> > > > > > > > >> > HOw can highlighted field excluded in the main result ?
> as
> > > it
> > > > is
> > > > > > > > >> available
> > > > > > > > >> > in the highlight section.
> > > > > > > > >> >
> > > > > > > > >> > In my scenario, One filed (lets say commands) of the
> each
> > > solr
> > > > > > > > document
> > > > > > > > >> > would be around 10 mg. I dont want to fetch that filed
> in
> > > > > response
> > > > > > > > when
> > > > > > > > >> its
> > > > > > > > >> > highlight snippets available in the response.
> > > > > > > > >> >
> > > > > > > > >> > Please advice.
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > On 15 February 2016 at 15:36, Evert R. <
> > > evert.ra...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >> >
> > > > > > > > >> > > Hello Mark,
> > > > > > > > >> > >
> > > > > > > > >> > > Thanks for you reply.
> > > > > > > > >> > >
> > > > > > > > >> > > All text is indexed (1 pdf file). It works now.
> > > > > > > > >> > >
> > > > > > > > >> > > Best regard,
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> > > *--Evert*
> > > > > > > > >> > >
> > > > > > > > >> > > 2016-02-14 23:47 GMT-02:00 Mark Ehle <
> > marke...@gmail.com
> > > >:
> > > > > > > > >> > >
> > > > > > > > >> > > > is 

Re: Highlight brings the content from the first pages of pdf

2016-02-17 Thread Philippe Soares
You can put fields that you want to retrieve without highlighting in the
"fl" parameter, and the large fields in the "hl.fl" parameter. Those will
go in the highlight section only. It may also be a good idea to add
hl.requireFieldMatch=true.

E.g.: fl=id&hl=true&hl.fl=field1,field2&hl.requireFieldMatch=true

Note that you can also use wildcards in fl or hl.fl. So if you want to
highlight all the fields, just set hl.fl=*
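
The same request expressed in SolrJ might look roughly like this (SolrJ 5.x
class names assumed; the core URL, query text, and field names are
placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightOnlyLargeFields {
  public static void main(String[] args) throws Exception {
    HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/mycollection"); // placeholder
    SolrQuery q = new SolrQuery("some search text");  // placeholder query
    q.setFields("id");                                // small fields returned normally
    q.setHighlight(true);
    q.set("hl.fl", "field1,field2");                  // large fields come back only as snippets
    q.set("hl.requireFieldMatch", true);
    QueryResponse rsp = solr.query(q);
    System.out.println(rsp.getHighlighting());        // snippets keyed by document id, then field
    solr.close();
  }
}
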
On Feb 17, 2016 21:49, "Anil"  wrote:

> Thanks Binoy. But this may not help my usecase.
>
> I am storing and indexing huge documents in solr. when no search text
> matches with that filed text, i should skip that field of the document.
> when match exists, it should be part of highlight section.
>
> fl may not be right option in my case.
>
> Any suggestions would be appreciated.
>
> Regards,
> Anil
>
> On 16 February 2016 at 13:43, Binoy Dalal  wrote:
>
> > Yeah.
> > Under  an entry like so:
> > fields
> >
> > On Tue, 16 Feb 2016, 13:00 Anil  wrote:
> >
> > > you mean default fl ?
> > >
> > > On 16 February 2016 at 12:57, Binoy Dalal 
> > wrote:
> > >
> > > > Oh wait. We don't append the fl parameter to the query.
> > > > We've configured it in the request handler in solrconfig.xml
> > > > Maybe that is something that you can do.
> > > >
> > > > On Tue, 16 Feb 2016, 12:39 Anil  wrote:
> > > >
> > > > > Thanks for your response Binoy.
> > > > >
> > > > > Yes.I am looking for any alternative to this. With long number of
> > > fileds,
> > > > > url will become long and might lead to "url too long exception"
> when
> > > > using
> > > > > http request.
> > > > >
> > > > > On 16 February 2016 at 11:01, Binoy Dalal 
> > > > wrote:
> > > > >
> > > > > > Filling in the fl parameter with all the required fields is what
> we
> > > do
> > > > at
> > > > > > my project as well, and I don't think there is any alternative to
> > > this.
> > > > > >
> > > > > > Maybe somebody else can advise on this?
> > > > > >
> > > > > > On Tue, 16 Feb 2016, 10:30 Anil  wrote:
> > > > > >
> > > > > > > Any help on this ? Thanks.
> > > > > > >
> > > > > > > On 15 February 2016 at 19:06, Anil  wrote:
> > > > > > >
> > > > > > > > Yes. But i have long list of fields.
> > > > > > > >
> > > > > > > > i feel adding all the fileds in fl is not good practice
> unless
> > > one
> > > > > > > > interested in few fields. In my case, i am interested in all
> > > fields
> > > > > > > except
> > > > > > > > the one .
> > > > > > > >
> > > > > > > > is there any alternative approach ? Thanks in advance.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On 15 February 2016 at 17:27, Binoy Dalal <
> > > binoydala...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >> If I understand correctly, you have already highlighted the
> > > field
> > > > > and
> > > > > > > only
> > > > > > > >> want to return the highlights and not the field itself.
> > > > > > > >> Well in that case, simply remove the field name from your fl
> > > list.
> > > > > > > >>
> > > > > > > >> On Mon, 15 Feb 2016, 17:04 Anil  wrote:
> > > > > > > >>
> > > > > > > >> > HOw can highlighted field excluded in the main result ? as
> > it
> > > is
> > > > > > > >> available
> > > > > > > >> > in the highlight section.
> > > > > > > >> >
> > > > > > > >> > In my scenario, One filed (lets say commands) of the each
> > solr
> > > > > > > document
> > > > > > > >> > would be around 10 mg. I dont want to fetch that filed in
> > > > response
> > > > > > > when
> > > > > > > >> its
> > > > > > > >> > highlight snippets available in the response.
> > > > > > > >> >
> > > > > > > >> > Please advice.
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > On 15 February 2016 at 15:36, Evert R. <
> > evert.ra...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >> >
> > > > > > > >> > > Hello Mark,
> > > > > > > >> > >
> > > > > > > >> > > Thanks for you reply.
> > > > > > > >> > >
> > > > > > > >> > > All text is indexed (1 pdf file). It works now.
> > > > > > > >> > >
> > > > > > > >> > > Best regard,
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > *--Evert*
> > > > > > > >> > >
> > > > > > > >> > > 2016-02-14 23:47 GMT-02:00 Mark Ehle <
> marke...@gmail.com
> > >:
> > > > > > > >> > >
> > > > > > > >> > > > is all the text being indexed? Check to make sure that
> > > > there's
> > > > > > > >> actually
> > > > > > > >> > > the
> > > > > > > >> > > > data you are looking for in the index. Is there a
> > setting
> > > in
> > > > > > tika
> > > > > > > >> that
> > > > > > > >> > > > limits how much is indexed? I seem to remember
> > confronting
> > > > > this
> > > > > > > >> problem
> > > > > > > >> > > > myself once, and the data that I wanted just wasn't in
> > the
> > > > > index
> > > > > > > >> > because
> > > > > > > >> > > it
> > > > > > > >> > > > 

Re: Highlight brings the content from the first pages of pdf

2016-02-17 Thread Anil
Thanks Binoy, but this may not help my use case.

I am storing and indexing huge documents in Solr. When no search text
matches a field's text, I should skip that field of the document.
When a match exists, it should be part of the highlight section.

fl may not be the right option in my case.

Any suggestions would be appreciated.

Regards,
Anil

On 16 February 2016 at 13:43, Binoy Dalal  wrote:

> Yeah.
> Under <lst name="defaults"> in the request handler, an entry like so:
> <str name="fl">fields</str>
>
> On Tue, 16 Feb 2016, 13:00 Anil  wrote:
>
> > you mean default fl ?
> >
> > On 16 February 2016 at 12:57, Binoy Dalal 
> wrote:
> >
> > > Oh wait. We don't append the fl parameter to the query.
> > > We've configured it in the request handler in solrconfig.xml
> > > Maybe that is something that you can do.
> > >
> > > On Tue, 16 Feb 2016, 12:39 Anil  wrote:
> > >
> > > > Thanks for your response Binoy.
> > > >
> > > > Yes.I am looking for any alternative to this. With long number of
> > fileds,
> > > > url will become long and might lead to "url too long exception" when
> > > using
> > > > http request.
> > > >
> > > > On 16 February 2016 at 11:01, Binoy Dalal 
> > > wrote:
> > > >
> > > > > Filling in the fl parameter with all the required fields is what we
> > do
> > > at
> > > > > my project as well, and I don't think there is any alternative to
> > this.
> > > > >
> > > > > Maybe somebody else can advise on this?
> > > > >
> > > > > On Tue, 16 Feb 2016, 10:30 Anil  wrote:
> > > > >
> > > > > > Any help on this ? Thanks.
> > > > > >
> > > > > > On 15 February 2016 at 19:06, Anil  wrote:
> > > > > >
> > > > > > > Yes. But i have long list of fields.
> > > > > > >
> > > > > > > i feel adding all the fileds in fl is not good practice unless
> > one
> > > > > > > interested in few fields. In my case, i am interested in all
> > fields
> > > > > > except
> > > > > > > the one .
> > > > > > >
> > > > > > > is there any alternative approach ? Thanks in advance.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 15 February 2016 at 17:27, Binoy Dalal <
> > binoydala...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > >> If I understand correctly, you have already highlighted the
> > field
> > > > and
> > > > > > only
> > > > > > >> want to return the highlights and not the field itself.
> > > > > > >> Well in that case, simply remove the field name from your fl
> > list.
> > > > > > >>
> > > > > > >> On Mon, 15 Feb 2016, 17:04 Anil  wrote:
> > > > > > >>
> > > > > > >> > HOw can highlighted field excluded in the main result ? as
> it
> > is
> > > > > > >> available
> > > > > > >> > in the highlight section.
> > > > > > >> >
> > > > > > >> > In my scenario, One filed (lets say commands) of the each
> solr
> > > > > > document
> > > > > > >> > would be around 10 mg. I dont want to fetch that filed in
> > > response
> > > > > > when
> > > > > > >> its
> > > > > > >> > highlight snippets available in the response.
> > > > > > >> >
> > > > > > >> > Please advice.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On 15 February 2016 at 15:36, Evert R. <
> evert.ra...@gmail.com
> > >
> > > > > wrote:
> > > > > > >> >
> > > > > > >> > > Hello Mark,
> > > > > > >> > >
> > > > > > >> > > Thanks for you reply.
> > > > > > >> > >
> > > > > > >> > > All text is indexed (1 pdf file). It works now.
> > > > > > >> > >
> > > > > > >> > > Best regard,
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > *--Evert*
> > > > > > >> > >
> > > > > > >> > > 2016-02-14 23:47 GMT-02:00 Mark Ehle  >:
> > > > > > >> > >
> > > > > > >> > > > is all the text being indexed? Check to make sure that
> > > there's
> > > > > > >> actually
> > > > > > >> > > the
> > > > > > >> > > > data you are looking for in the index. Is there a
> setting
> > in
> > > > > tika
> > > > > > >> that
> > > > > > >> > > > limits how much is indexed? I seem to remember
> confronting
> > > > this
> > > > > > >> problem
> > > > > > >> > > > myself once, and the data that I wanted just wasn't in
> the
> > > > index
> > > > > > >> > because
> > > > > > >> > > it
> > > > > > >> > > > was never put there in the first place.Something about
> > > > > > >> > setMaxStringLength
> > > > > > >> > > > orsomething.
> > > > > > >> > > >
> > > > > > >> > > > On Sun, Feb 14, 2016 at 8:28 PM, Binoy Dalal <
> > > > > > >> binoydala...@gmail.com>
> > > > > > >> > > > wrote:
> > > > > > >> > > >
> > > > > > >> > > > > What you've done so far will highlight every instance
> of
> > > > > > "nietava"
> > > > > > >> > > found
> > > > > > >> > > > in
> > > > > > >> > > > > the field, and return it, i.e., your entire field will
> > > > return
> > > > > > with
> > > > > > >> > all
> > > > > > >> > > > the
> > > > > > >> > > > > "nietava"s in  tags.
> > > > > > >> > > > > If you do not want the entire field, only portions of
> > your

Re: Running solr as a service vs. Running it as a process

2016-02-17 Thread Binoy Dalal
That's a bummer.
Anyhow I'll give it a shot and update this thread if I get anywhere.

Thanks for your help.

On Thu, 18 Feb 2016, 04:30 Shawn Heisey  wrote:

> On 2/17/2016 11:37 AM, Binoy Dalal wrote:
> > At my project, we aren't that big on directory and user set up but the
> fact
> > that services can be started and stopped automatically on server reboots
> > and ensuring single running copies of the service is of significance.
> > Now currently we are running Solr 4.4 but pretty soon we're going to
> > upgrade to Solr 4.10.4.
> > If I'm not wrong, the install scripts that set everything up ship with
> Solr
> > 5.x. In this case, how do I set up my solr instances to behave like
> > services created by the scripts in 5.x?
> > I understand that this would entail setting up the init scripts and
> > environment variables, and I do not have a lot of experience with those,
> so
> > it'll be great if you can just walk me through it.
>
> The "bin/solr" script was introduced in version 4.10, but I think the
> service install script did not appear until 5.0.
>
> Between 4.10 and 5.0, Solr's directory structure and the start/install
> scripts underwent a fairly extensive reorganization.  Further
> reorganization and improvements have happened in later 5.x releases.
> Getting the init script working properly with 4.x will involve a lot of
> work to figure out what is different in 4.x and adjust the script.
> There's no formal documentation that you can look at to describe what
> those differences are.  The work was handled in multiple Jira issues,
> and some of it may not be documented at all, other than commits to the
> source code repository.
>
> Thanks,
> Shawn
>
> --
Regards,
Binoy Dalal


Re: Retrieving 1000 records at a time

2016-02-17 Thread Shawn Heisey
On 2/17/2016 3:49 PM, Mark Robinson wrote:
> I have around 121 fields out of which 12 of them are indexed and almost all
> 121 are stored.
> Average size of a doc is 10KB.
>
> I was checking for start=0, rows=1000.
> We were querying a Solr instance which was on another server and I think
> network lag might have come into the picture also.
>
> I did not go for any caching as I wanted good response time in the first
> time querying itself.

Stored fields, which contain the data that is returned to the client in
the response, are compressed on disk.  Uncompressing this data can
contribute to the time on a slow query, but I do not think it can
explain 30 seconds of delay.  Very large documents can be particularly
slow to decompress, but you have indicated that each entire document is
about 10K in size, which is not huge.

It is more likely that the delay is caused by one of two things,
possibly both:

* Extremely long garbage collection pauses due to a heap that is too
small or VERY huge (beyond 32GB) with inadequate GC tuning.
* Not enough system memory to effectively cache the index.

Some additional info that may be helpful in tracking this down further:

* For each core on one machine, the size on disk of the data directory.
* For each core, the number of documents and the number of deleted
documents.
* The max heap size for the Solr JVM.
* Whether there is more than one Solr instance per server.
* The total installed memory size in the server.
* Whether or not the server is used for other applications.
* What operating system the server is running.
* Whether the index is distributed or contained in a single core.
* Whether Solr is in SolrCloud mode or not.
* Solr version.

Thanks,
Shawn



Hitting complex multilevel pivot queries in solr

2016-02-17 Thread Lewin Joy (TMS)
Hi,

Is there an efficient way to hit Solr for complex, time-consuming queries?
I have a requirement where I need to pivot on 4 fields. Two fields contain 
facet values close to 50, and the other 2 fields have 5000 and 8000 values. 
Pivoting on the 4 fields would crash the server.

Is there a better way to get the data?

Example query params look like this:
facet.pivot=country,state,part_num,part_code

Thanks,
Lewin





Re: Running solr as a service vs. Running it as a process

2016-02-17 Thread Shawn Heisey
On 2/17/2016 11:37 AM, Binoy Dalal wrote:
> At my project, we aren't that big on directory and user set up but the fact
> that services can be started and stopped automatically on server reboots
> and ensuring single running copies of the service is of significance.
> Now currently we are running Solr 4.4 but pretty soon we're going to
> upgrade to Solr 4.10.4.
> If I'm not wrong, the install scripts that set everything up ship with Solr
> 5.x. In this case, how do I set up my solr instances to behave like
> services created by the scripts in 5.x?
> I understand that this would entail setting up the init scripts and
> environment variables, and I do not have a lot of experience with those, so
> it'll be great if you can just walk me through it.

The "bin/solr" script was introduced in version 4.10, but I think the
service install script did not appear until 5.0.

Between 4.10 and 5.0, Solr's directory structure and the start/install
scripts underwent a fairly extensive reorganization.  Further
reorganization and improvements have happened in later 5.x releases. 
Getting the init script working properly with 4.x will involve a lot of
work to figure out what is different in 4.x and adjust the script. 
There's no formal documentation that you can look at to describe what
those differences are.  The work was handled in multiple Jira issues,
and some of it may not be documented at all, other than commits to the
source code repository.

Thanks,
Shawn



Re: Errors on master after upgrading to 4.10.3

2016-02-17 Thread Joseph Hagerty
Ahh, makes sense. I did have a feeling I was barking up the wrong tree
since it's an Extraction issue, but I thought I'd throw it out there,
anyway.

Thanks so much for the information!

On Wed, Feb 17, 2016 at 4:49 PM, Rachel Lynn Underwood <
r.lynn.underw...@gmail.com> wrote:

> This is an error being thrown by Apache PDFBox/Tika. You're seeing it now
> because Solr 4.x uses a different Tika version than Solr 3.x.
>
> It looks like this error is thrown when you parse a PDF with Tika, and a
> font in that PDF doesn't have a ToUnicode mapping.
> https://issues.apache.org/jira/browse/PDFBOX-1408
>
> Another user reported that this might be related to special characters, but
> PDFBox developers haven't been able to reproduce the bug.
> https://issues.apache.org/jira/browse/PDFBOX-1706
>
> Since this isn't an issue in the Solr code, if you're concerned about it,
> you'll probably have better luck asking the PDFBox developers directly, via
> Jira or their mailing list.
>
>
> On Tue, Feb 16, 2016 at 12:08 PM, Joseph Hagerty  wrote:
>
> > Does literally nobody else see this error in their logs? I see this error
> > hundreds of times per day, in occasional bursts. Should I file this as a
> > bug?
> >
> > On Mon, Feb 15, 2016 at 4:56 PM, Joseph Hagerty 
> wrote:
> >
> > > After migrating from 3.5 to 4.10.3, I'm seeing the following error with
> > > alarming regularity in the master's error log:
> > >
> > > 2/15/2016, 4:32:22 PM ERROR PDSimpleFont Can't determine the width of
> the
> > > space character using 250 as default
> > > I can't seem to glean much information about this one from the web. Has
> > > anyone else fought this error?
> > >
> > > In case this helps, here's some technical/miscellaneous info:
> > >
> > > - I'm running a master-slave set-up.
> > >
> > > - I rely on the ERH (tika/solr-cell/whatever) for extracting plaintext
> > > from .docs and .pdfs. I'm guessing that PDSimpleFont is a component of
> > > this, but I don't know the first thing about it.
> > >
> > > - I have the clients specifying 'autocommit=6s' in their requests,
> which
> > I
> > > realize is a pretty aggressive commit interval, but so far that hasn't
> > > caused any problems I couldn't surmount.
> > >
> > > - There are north of 11 million docs in my index, which is 36 gigs
> thick.
> > > The storage volume is only 10% full.
> > >
> > > - When I migrated from 3.5 to 4.10.3, I correctly performed a reindex
> due
> > > to incompatibility between versions.
> > >
> > > - Both master and slave are running on AWS instances, C4.4XL's (16
> cores,
> > > 30 gigs of RAM).
> > >
> > > So far, I have been unable to reproduce this error on my own: I can
> only
> > > observe it in the logs. I haven't been able to tie it to any specific
> > > document.
> > >
> > > Let me know if further information would be helpful.
> > >
> > >
> > >
> > >
> >
> >
> > --
> > - Joe
> >
>



-- 
- Joe


Re: Retrieving 1000 records at a time

2016-02-17 Thread Mark Robinson
Thanks Joel and Chris!

I have around 121 fields, of which 12 are indexed and almost all
121 are stored.
Average size of a doc is 10KB.

I was checking for start=0, rows=1000.
We were querying a Solr instance which was on another server and I think
network lag might have come into the picture also.

I did not go for any caching, as I wanted good response time on the first
query itself.

Thanks much for the links and suggestions. I will go thru each of them.

Best,
Mark.

On Wed, Feb 17, 2016 at 5:26 PM, Chris Hostetter 
wrote:

>
> : I have a requirement where I need to retrieve 1 to 15000 records at a
> : time from SOLR.
> : With 20 or 100 records everything happens in milliseconds.
> : When it goes to 1000, 1  it is taking more time... like even 30
> seconds.
>
> so far all you've really told us about your setup is that some
> queries with "rows=1000" are slow -- but you haven't really told us
> anything else we can help you with -- for example it's not obvious if you
> mean that you are using start=0 in all of those queries and they are slow,
> or if you mean you are paginating through results (ie: increasing start
> param) 1000 at a time and it starts getting slow as you page deeply.
>
> you also haven't told us anything about the fields you are returning --
> how many are there?, what data types are they? are they large string
> values?
>
> how are you measuring the time? are you sure network lag, or client side
> processing of the data as solr returns it isn't the bulk of the time you
> are measuring?  what does the QTime in the solr responses for these slow
> queries say?
>
> my best guesses are that either: you are doing deep paging and conflating
> the increased response time for deep results with an increase in response
> time for large rows params (because you are getting "deeper" faster with a
> large rows#) or you are seeing an increase in processing time on the
> client due to the large volume of data being returned -- possibly even
> with SolrJ which is designed to parse the entire response into java
> data structures by default before returning to the client.
>
> w/o more concrete information, it's hard to give you advice beyond
> guesses.
>
>
> potentially helpful links...
>
> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
>
> https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
>
> https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
>
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>
> https://lucene.apache.org/solr/5_4_0/solr-solrj/org/apache/solr/client/solrj/io/stream/expr/StreamFactory.html
>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Retrieving 1000 records at a time

2016-02-17 Thread Chris Hostetter

: I have a requirement where I need to retrieve 1 to 15000 records at a
: time from SOLR.
: With 20 or 100 records everything happens in milliseconds.
: When it goes to 1000, 1  it is taking more time... like even 30 seconds.

so far all you've really told us about your setup is that some 
queries with "rows=1000" are slow -- but you haven't really told us 
anything else we can help you with -- for example it's not obvious if you 
mean that you are using start=0 in all of those queries and they are slow, 
or if you mean you are paginating through results (ie: increasing start 
param) 1000 at a time and it starts getting slow as you page deeply.

you also haven't told us anything about the fields you are returning -- 
how many are there?, what data types are they? are they large string 
values?

how are you measuring the time? are you sure network lag, or client side 
processing of the data as solr returns it isn't the bulk of the time you 
are measuring?  what does the QTime in the solr responses for these slow 
queries say?

my best guesses are that either: you are doing deep paging and conflating 
the increased response time for deep results with an increase in response 
time for large rows params (because you are getting "deeper" faster with a 
large rows#) or you are seeing an increase in processing time on the 
client due to the large volume of data being returned -- possibly even 
with SolrJ which is designed to parse the entire response into java 
data structures by default before returning to the client.

w/o more concrete information, it's hard to give you advice beyond 
guesses.


potentially helpful links...

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets

https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
https://lucene.apache.org/solr/5_4_0/solr-solrj/org/apache/solr/client/solrj/io/stream/expr/StreamFactory.html
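
As a concrete illustration of the cursor approach described in the second link,
a minimal SolrJ sketch (SolrJ 5.x class names assumed; the core URL is a
placeholder, and cursorMark needs Solr 4.7+ plus a sort that includes the
uniqueKey field):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorWalk {
  public static void main(String[] args) throws Exception {
    HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/mycollection"); // placeholder
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(1000);
    q.setSort("id", SolrQuery.ORDER.asc); // sort must include the uniqueKey for cursorMark
    String cursor = CursorMarkParams.CURSOR_MARK_START;
    while (true) {
      q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
      QueryResponse rsp = solr.query(q);
      // process rsp.getResults() here ...
      String next = rsp.getNextCursorMark();
      if (cursor.equals(next)) break; // getting the same cursor back means the walk is done
      cursor = next;
    }
    solr.close();
  }
}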



-Hoss
http://www.lucidworks.com/


Re: Errors on master after upgrading to 4.10.3

2016-02-17 Thread Rachel Lynn Underwood
This is an error being thrown by Apache PDFBox/Tika. You're seeing it now
because Solr 4.x uses a different Tika version than Solr 3.x.

It looks like this error is thrown when you parse a PDF with Tika, and a
font in that PDF doesn't have a ToUnicode mapping.
https://issues.apache.org/jira/browse/PDFBOX-1408

Another user reported that this might be related to special characters, but
PDFBox developers haven't been able to reproduce the bug.
https://issues.apache.org/jira/browse/PDFBOX-1706

Since this isn't an issue in the Solr code, if you're concerned about it,
you'll probably have better luck asking the PDFBox developers directly, via
Jira or their mailing list.


On Tue, Feb 16, 2016 at 12:08 PM, Joseph Hagerty  wrote:

> Does literally nobody else see this error in their logs? I see this error
> hundreds of times per day, in occasional bursts. Should I file this as a
> bug?
>
> On Mon, Feb 15, 2016 at 4:56 PM, Joseph Hagerty  wrote:
>
> > After migrating from 3.5 to 4.10.3, I'm seeing the following error with
> > alarming regularity in the master's error log:
> >
> > 2/15/2016, 4:32:22 PM ERROR PDSimpleFont Can't determine the width of the
> > space character using 250 as default
> > I can't seem to glean much information about this one from the web. Has
> > anyone else fought this error?
> >
> > In case this helps, here's some technical/miscellaneous info:
> >
> > - I'm running a master-slave set-up.
> >
> > - I rely on the ERH (tika/solr-cell/whatever) for extracting plaintext
> > from .docs and .pdfs. I'm guessing that PDSimpleFont is a component of
> > this, but I don't know the first thing about it.
> >
> > - I have the clients specifying 'autocommit=6s' in their requests, which
> I
> > realize is a pretty aggressive commit interval, but so far that hasn't
> > caused any problems I couldn't surmount.
> >
> > - There are north of 11 million docs in my index, which is 36 gigs thick.
> > The storage volume is only 10% full.
> >
> > - When I migrated from 3.5 to 4.10.3, I correctly performed a reindex due
> > to incompatibility between versions.
> >
> > - Both master and slave are running on AWS instances, C4.4XL's (16 cores,
> > 30 gigs of RAM).
> >
> > So far, I have been unable to reproduce this error on my own: I can only
> > observe it in the logs. I haven't been able to tie it to any specific
> > document.
> >
> > Let me know if further information would be helpful.
> >
> >
> >
> >
>
>
> --
> - Joe
>


Re: Negating multiple array fileds

2016-02-17 Thread Jack Krupansky
I actually thought seriously about whether to mention wildcard vs. range,
but... it annoys me that the Lucene and query parser folks won't fix either
PrefixQuery or the query parsers to do the right/optimal thing for
single-asterisk query. I wrote up a Jira for it years ago, but for whatever
reason the difficulty persists. At one point one of the Lucene guys told me
that there was a filter query that could do both * and -* very efficiently,
but then later that was disputed, not to mention that filter query is now
gone. In any case, with the newer AutomatonQuery the single-asterisk
PrefixQuery case should always perform at least semi-reasonably no matter
what, especially since it is now a constant-score query, which it wasn't
many years ago.

Whether [* TO *] is actually a lot more (or less) efficient than
PrefixQuery for an empty prefix these days is... unknown to me, but I won't
give anybody grief for using it as a way of compensating for the
brain-damaged way that Lucene and Solr handle single-asterisk and negated
single-asterisk queries.


-- Jack Krupansky

On Tue, Feb 16, 2016 at 8:17 PM, Shawn Heisey  wrote:

> On 2/15/2016 9:22 AM, Jack Krupansky wrote:
> > I should also have noted that your full query:
> >
> > (-persons:*)AND(-places:*)AND(-orgs:*)
> >
> > can be written as:
> >
> > -persons:* -places:* -orgs:*
> >
> > Which may work as is, or can also be written as:
> >
> > *:* -persons:* -places:* -orgs:*
>
> Salman,
>
> One fact of Lucene operation is that purely negative queries do not
> work.  A negative query clause is like a subtraction.  If you make a
> query that only says "subtract these values", then you aren't going to
> get anything, because you did not start with anything.
>
> Adding the "*:*" clause at the beginning of the query says "start with
> everything."
>
> You might ask why a query of -field:value works, when I just said that
> it *won't* work.  This is because Solr has detected the problem and
> fixed it.  When the query is very simple (a single negated clause), Solr
> is able to detect the unworkable situation and implicitly add the "*:*"
> starting point, producing the expected results.  With more complex
> queries, like the one you are trying, this detection fails, and the
> query is executed as-is.
>
> Jack is an awesome member of this community.  I do not want to disparage
> him at all when I tell you that the rewritten query he provided will
> work, but is not optimal.  It can be optimized as the following:
>
> *:* -persons:[* TO *] -places:[* TO *] -orgs:[* TO *]
>
> A query clause of the format "field:*" is a wildcard query.  Behind the
> scenes, Solr will interpret this as "all possible values for field" --
> which sounds like it would be exactly what you're looking for, except
> that if there are ten million possible values in the field you're
> searching, the constructed Lucene query will quite literally include all
> ten million values.  Wildcard queries tend to use a lot of memory and
> run slowly.
>
> The [* TO *] syntax is an all-inclusive range query, which will usually
> be much faster than a wildcard query.
>
> Thanks,
> Shawn
>
>
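
For completeness, the optimized form above issued through SolrJ might look
roughly like this (SolrJ 5.x class names assumed; the core URL is a
placeholder):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class MissingFieldsQuery {
  public static void main(String[] args) throws Exception {
    HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/mycollection"); // placeholder
    // Start from everything, then subtract documents that have any of the three fields.
    SolrQuery q = new SolrQuery("*:* -persons:[* TO *] -places:[* TO *] -orgs:[* TO *]");
    System.out.println("matches: " + solr.query(q).getResults().getNumFound());
    solr.close();
  }
}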


Sell more tickets to your next event!

2016-02-17 Thread Cody Rasmus

Hey NYC Apache Lucene/Solr Meetup,

Hope all is well. I wanted to follow up on my last email with some
interesting data we’ve collected from event organizers similar to NYC
Apache Lucene/Solr Meetup. Over the last 6 months more than 70% of
payments for events on SquadUP have come from a mobile device. Is your
ticketing company helping you take advantage of the shift from web to
mobile commerce? Our best guess is that they are not.

The shift from web to mobile ticket purchasing happened almost overnight
and most ticketing companies have not been able to keep up. As such, the
ticketing industry has become notorious for high rates of shopping cart
abandonment. The result is simple and it’s crushing your business. Your
customers are getting frustrated with cumbersome check out experiences on
mobile and you are losing money.

It would be great to discuss the above and NYC Apache Lucene/Solr
Meetup’s event strategy, please let me know if 2:30pm PST on 2/19 works
for you. If so, I’ll follow up with a calendar invite.

Speak Soon,
Cody

Cody Rasmus
Sales Associate / SquadUP
c...@squadup.com / 310-795-8499

NYC Apache Lucene/Solr Meetup, if you don't want to hear from me anymore,
please click here:
http://go.squadup.com/unsubscribe/u/106112/94484ea8b74d1f7abdb8c4952288ea89183b27bc01320a91f2e88e559712d709/20465319
.





Re: Zookeeper upconfig files to upload big config files

2016-02-17 Thread Shawn Heisey
On 2/17/2016 1:04 PM, Aswath Srinivasan (TMS) wrote:
> I’m tyring to upconfig my config files and my synonyms.txt file is
> about 2 MB. Whenever I try to do this, I get the following expection.
> It’s either a “broken pipe” expection or the following expection. Any
> advice for me to fix it?

The default size limit on a zookeeper node is about 1 megabyte.

Adjusting this limit is possible, but involves some invasive changes. 
You will need to add "-Djute.maxbuffer=NNN" to the command-line of
every single java invocation related to Solr and Zookeeper, including
the zkcli script, to allow a larger node size.

http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#Unsafe+Options

The zookeeper project calls this an "unsafe" option because zookeeper is
not really designed to store large data.

I would also expect such a large list of synonyms to slow down analysis.

Thanks,
Shawn



Re: Adding nodes

2016-02-17 Thread Jeff Wartes
SolrCloud does not come with any autoscaling functionality. If you want such a 
thing, you'll need to write it yourself.

https://github.com/whitepages/solrcloud_manager might be a useful head start 
though, particularly the "fill" and "cleancollection" commands. I don't do 
*auto* scaling, but I do use this for all my cluster management, which 
certainly involves moving collections/shards around among nodes, adding 
capacity, and removing capacity.






On 2/14/16, 11:17 AM, "McCallick, Paul"  wrote:

>These are excellent questions and give me a good sense of why you suggest 
>using the collections api.
>
>In our case we have 8 shards of product data with an even distribution of data 
>per shard, no hot spots. We have very different load at different points in 
>the year (cyber monday), and we tend to have very little traffic at night. I'm 
>thinking of two use cases:
>
>1) we are seeing increased latency due to load and want to add 8 more replicas 
>to handle the query volume.  Once the volume subsides, we'd remove the nodes. 
>
>2) we lose a node due to some unexpected failure (ec2 tends to do this). We 
>want auto scaling to detect the failure and add a node to replace the failed 
>one. 
>
>In both cases the core api makes it easy. It adds nodes to the shards evenly. 
>Otherwise we have to write a fairly involved script that is subject to race 
>conditions to determine which shard to add nodes to. 
>
>Let me know if I'm making dangerous or uninformed assumptions, as I'm new to 
>solr. 
>
>Thanks,
>Paul
>
>> On Feb 14, 2016, at 10:35 AM, Susheel Kumar  wrote:
>> 
>> Hi Pual,
>> 
>> 
>> For Auto-scaling, it depends on how you are thinking to design and what/how
>> do you want to scale. Which scenario you think makes coreadmin API easy to
>> use for a sharded SolrCloud environment?
>> 
>> Isn't it that in a sharded environment (assume 3 shards A, B & C), if shard B
>> is having a higher load, then you want to add a replica for shard B to
>> distribute the load, or if a particular shard replica goes down then you
>> want to add another replica back for that shard, in which case ADDREPLICA
>> requires a shard name?
>> 
>> Can you describe your scenario / provide more detail?
>> 
>> Thanks,
>> Susheel
>> 
>> 
>> 
>> On Sun, Feb 14, 2016 at 11:51 AM, McCallick, Paul <
>> paul.e.mccall...@nordstrom.com> wrote:
>> 
>>> Hi all,
>>> 
>>> 
>>> This doesn’t really answer the following question:
>>> 
>>> What is the suggested way to add a new node to a collection via the
>>> apis?  I  am specifically thinking of autoscale scenarios where a node has
>>> gone down or more nodes are needed to handle load.
>>> 
>>> 
>>> The coreadmin api makes this easy.  The collections api (ADDREPLICA),
>>> makes this very difficult.
>>> 
>>> 
 On 2/14/16, 8:19 AM, "Susheel Kumar"  wrote:
 
 Hi Paul,
 
 Shawn is referring to use Collections API
 https://cwiki.apache.org/confluence/display/solr/Collections+API  than
>>> Core
 Admin API https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API
 for SolrCloud.
 
 Hope that clarifies and you mentioned about ADDREPLICA which is the
 collections API, so you are on right track.
 
 Thanks,
 Susheel
 
 
 
 On Sun, Feb 14, 2016 at 10:51 AM, McCallick, Paul <
 paul.e.mccall...@nordstrom.com> wrote:
 
> Then what is the suggested way to add a new node to a collection via the
> apis?  I  am specifically thinking of autoscale scenarios where a node
>>> has
> gone down or more nodes are needed to handle load.
> 
> Note that the ADDREPLICA endpoint requires a shard name, which puts the
> onus of how to scale out on the user. This can be challenging in an
> autoscale scenario.
> 
> Thanks,
> Paul
> 
>> On Feb 14, 2016, at 12:25 AM, Shawn Heisey 
>>> wrote:
>> 
>>> On 2/13/2016 6:01 PM, McCallick, Paul wrote:
>>> - When creating a new collection, SOLRCloud will use all available
> nodes for the collection, adding cores to each.  This assumes that you
>>> do
> not specify a replicationFactor.
>> 
>> The number of nodes that will be used is numShards multipled by
>> replicationFactor.  The default value for replicationFactor is 1.  If
>> you do not specify numShards, there is no default -- the CREATE call
>> will fail.  The value of maxShardsPerNode can also affect the overall
>> result.
>> 
>>> - When adding new nodes to the cluster AFTER the collection is
>>> created,
> one must use the core admin api to add the node to the collection.
>> 
>> Using the CoreAdmin API is strongly discouraged when running
>>> SolrCloud.
>> It works, but it is an expert API when in cloud mode, and can cause
>> serious problems if not used correctly.  Instead, use the Collections
>> API.  It can handle all normal 

Zookeeper upconfig files to upload big config files

2016-02-17 Thread Aswath Srinivasan (TMS)
Hi fellow Solr developers,

I'm trying to upconfig my config files, and my synonyms.txt file is about 2 MB. 
Whenever I try to do this, I get the following exception. It's either a "broken 
pipe" exception or the following exception. Any advice for me to fix it?

If I remove most of the synonym entries and keep my synonyms.txt file simple 
and short, then upconfig works without any problem.

WARN  - 2016-02-17 11:48:55.514; org.apache.zookeeper.ClientCnxn$SendThread; 
Session 0x252e439e59f0017 for server zookeeperhost/zookeeperhost:2181, 
unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[abc@dc1-abc ~]$ ^C
[abc@dc1-abc ~]$ /t3/apps/solr-5.3.2/server/scripts/cloud-scripts/zkcli.sh 
-zkhost 
t3solr.test1.abc.com:2181,t3solr.test2.abc.com:2181,t3solr.test3.abc.com:2181 
-cmd upconfig -confname test_config_2 -confdir 
/t3/apps/solr-5.3.2/example/example-DIH/solr/TestEnvCore/conf
Exception in thread "main" java.io.IOException: Error uploading file 
/t3/apps/solr-5.3.2/example/example-DIH/solr/TestEnvCore/conf/synonyms.txt to 
zookeeper path /configs/test_config_2/synonyms.txt
at 
org.apache.solr.common.cloud.ZkConfigManager$1.visitFile(ZkConfigManager.java:68)
at 
org.apache.solr.common.cloud.ZkConfigManager$1.visitFile(ZkConfigManager.java:58)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:135)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:199)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:69)
at java.nio.file.Files.walkFileTree(Files.java:2602)
at java.nio.file.Files.walkFileTree(Files.java:2635)
at 
org.apache.solr.common.cloud.ZkConfigManager.uploadToZK(ZkConfigManager.java:58)
at 
org.apache.solr.common.cloud.ZkConfigManager.uploadConfigDir(ZkConfigManager.java:120)
at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:220)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /configs/test_config_2/synonyms.txt
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1270)
at 
org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:362)
at 
org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:359)
at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
at 
org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:359)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:529)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:408)
at 
org.apache.solr.common.cloud.ZkConfigManager$1.visitFile(ZkConfigManager.java:66)
... 9 more
[abc@dc1-abc ~]$


Thank you,
Aswath NS



Re: Retrieving 1000 records at a time

2016-02-17 Thread Joel Bernstein
Also are you ranking documents by score

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 17, 2016 at 1:59 PM, Joel Bernstein  wrote:

> A few questions for you: What types of fields and how many fields will you
> be retrieving? What version of Solr are you using?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Feb 17, 2016 at 1:37 PM, Mark Robinson 
> wrote:
>
>> Hi,
>>
>> I have a requirement where I need to retrieve 10000 to 15000 records at a
>> time from SOLR.
>> With 20 or 100 records everything happens in milliseconds.
>> When it goes to 1000, 10000 it is taking more time... like even 30
>> seconds.
>>
>> Will Solr be able to return 10000 records at a time in less than say 200
>> milliseconds?
>>
>> I have read that disk read is a costly affair so we have to batch results
>> and lesser the number of records retrieved in a batch the faster the
>> response when using SOLR.
>>
>> So is Solr a straight away NO candidate in a situation where 10000 records
>> should be retrieved in a time of <=200 mS.
>>
>> A quick response would be very helpful.
>>
>> Thanks!
>> Mark
>>
>
>


Re: Retrieving 1000 records at a time

2016-02-17 Thread Joel Bernstein
A few questions for you: What types of fields and how many fields will you
be retrieving? What version of Solr are you using?

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 17, 2016 at 1:37 PM, Mark Robinson 
wrote:

> Hi,
>
> I have a requirement where I need to retrieve 10000 to 15000 records at a
> time from SOLR.
> With 20 or 100 records everything happens in milliseconds.
> When it goes to 1000, 10000 it is taking more time... like even 30
> seconds.
>
> Will Solr be able to return 10000 records at a time in less than say 200
> milliseconds?
>
> I have read that disk read is a costly affair so we have to batch results
> and lesser the number of records retrieved in a batch the faster the
> response when using SOLR.
>
> So is Solr a straight away NO candidate in a situation where 10000 records
> should be retrieved in a time of <=200 mS.
>
> A quick response would be very helpful.
>
> Thanks!
> Mark
>


SolrCloud sync issues under server failure

2016-02-17 Thread Håkon Hitland
Hi,

We have been testing an installation of SolrCloud under some failure
scenarios, and are seeing some issues we would like to fix before putting
this into production.

Our cluster is 6 servers running Solr 5.4.1, with config stored in our
Zookeeper cluster.
Our cores currently each have a single shard replicated across all servers.

Scenario 1:
We start a full import from our database using the dataimport handler.
During the import we do a clean shutdown of the node running the import.

When the node is started again, it comes up with a partial index. The index
is not resynced from the leader until we start and complete a new full
import.

Are we missing some settings that will make updates atomic? We would rather
roll back the update than run with a partial set of documents.
How can we make replicas stay in sync with the leader?

Scenario 2:
One of our servers had a disk error that made the Solr home directory turn
read-only.
On the cores where this node was a follower the node was correctly marked
as down.
But on one core where this node was a leader, it stayed healthy. All
updates would fail, without the node realizing it should step down as
leader.

In addition, leader elections stalled while this node was in the cluster.
When a second server was shut down, several cores stayed leaderless until
the node with the failed disk was shut down as well.

Is there a way to healthcheck nodes so a disk failure will make the
affected node step down?

Scenario 3:
We changed the faulty disk, wiping the Solr home directory.
Starting Solr again did not resync the missing cores.

I do see some lines in our logs like:
2016-02-16 13:44:02.841 INFO (qtp1395089624-22) [ ]
o.a.s.h.a.CoreAdminHandler It has been requested that we recover:
core=content_shard1_replica1
2016-02-16 13:44:02.842 ERROR (Thread-15) [ ] o.a.s.h.a.CoreAdminHandler
Could not find core to call recovery:content_shard1_replica1

Is there a way to force recovery of the cores a node should have based on
the collection replica settings?


Any tips on how to make this more robust would be appreciated.

Regards,
Håkon Hitland


Re: Running solr as a service vs. Running it as a process

2016-02-17 Thread Binoy Dalal
Hi Dan,
At my project, we aren't that big on directory and user setup, but the fact
that services can be started and stopped automatically on server reboots,
and that only a single copy of the service runs, is significant for us.
Now currently we are running Solr 4.4 but pretty soon we're going to
upgrade to Solr 4.10.4.
If I'm not wrong, the install scripts that set everything up ship with Solr
5.x. In this case, how do I set up my solr instances to behave like
services created by the scripts in 5.x?
I understand that this would entail setting up the init scripts and
environment variables, and I do not have a lot of experience with those, so
it'll be great if you can just walk me through it.

Or maybe it'll be simple enough to just manually follow the script from 5.x
and adapt it to my solr instance as I go?
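
Something like the sketch below is what I am picturing (the paths, user name
and start command are placeholders for whatever our 4.x install actually uses,
not what the 5.x script does):

#!/bin/sh
# /etc/init.d/solr -- minimal init script sketch
SOLR_DIR=/opt/solr        # where Solr is installed
RUNAS=solr                # service account that owns the process

case "$1" in
  start)
    su -c "$SOLR_DIR/bin/solr start" $RUNAS
    ;;
  stop)
    su -c "$SOLR_DIR/bin/solr stop -all" $RUNAS
    ;;
  restart)
    su -c "$SOLR_DIR/bin/solr stop -all" $RUNAS
    su -c "$SOLR_DIR/bin/solr start" $RUNAS
    ;;
  *)
    echo "Usage: $0 {start|stop|restart}"
    exit 1
    ;;
esac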

Thanks.


On Wed, Feb 17, 2016 at 11:40 PM Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> So, running solr as a service also runs it as a process.   In typical
> Linux environments, (based on initscripts), a service is a process
> installed to meet additional considerations:
>
> - Putting logs in predictable places where system operators and
> administrators expect to see logs - /var/log
> - Putting dynamic data that varies again in predictable places where
> system administrators expect to see dynamic data.
> - Putting code for the process in /opt/solr - the /opt filesystem is for
> non-operating system components
> - Putting configuration files for the process again in predictable places.
> - Running the process as a non-root user, but also as a user that is not
> any one user's account - e.g. a "service" account
> - Making sure Solr starts at system startup and stops at system shutdown
> - Making sure only a single copy of the service is running
>
> The options implemented in the install_solr_service.sh command are meant
> to be generic to many Linux environments, e.g. appropriate for RHEL/CentOS,
> Ubuntu, and Amazon Linux.   My organization is large enough (and perhaps
> peculiar enough) to have its own standards for where administrators expect
> to see logs and where dynamic data should go.   However, I still need to
> make sure to run it as a service, and this is part of taking it to
> production.
>
> The command /sbin/service is part of a package called "initscripts" which
> is used on a number of different Linux environments.   Many systems are now
> using both that package and another, "systemd", that starts things somewhat
> differently.
>
> Hope this helps,
>
> Dan Davis, Systems/Applications Architect (Contractor),
> Office of Computer and Communications Systems,
> National Library of Medicine, NIH
>
>
> -Original Message-
> From: Binoy Dalal [mailto:binoydala...@gmail.com]
> Sent: Wednesday, February 17, 2016 2:17 AM
> To: SOLR users group 
> Subject: Running solr as a service vs. Running it as a process
>
> Hello everyone,
> I've read about running solr as a service but I don't understand what it
> really means.
>
> I went through the "Taking solr to production" documentation on the wiki
> which suggests that solr be installed using the script provided and run as
> a service.
> From what I could glean, the script creates a directory structure and sets
> various environment variables and then starts solr using the service
> command.
> How is this different from setting up solr manually and starting solr
> using `./solr start`?
>
> Currently in my project, we start solr as a process using the `./` Is this
> something that should be avoided and if so why?
>
> Additionally, and I know that this is not the right place to ask, yet if
> someone could explain what the service command actually does, that would be
> great. I've read a few articles and they say that it runs the init script
> in as predictable an environment as possible, but what does that mean?
>
> Thanks
> --
> Regards,
> Binoy Dalal
>
-- 
Regards,
Binoy Dalal


Retrieving 1000 records at a time

2016-02-17 Thread Mark Robinson
Hi,

I have a requirement where I need to retrieve 10000 to 15000 records at a
time from SOLR.
With 20 or 100 records everything happens in milliseconds.
When it goes to 1000, 10000 it is taking more time... like even 30 seconds.

Will Solr be able to return 10000 records at a time in less than say 200
milliseconds?

I have read that disk read is a costly affair so we have to batch results
and lesser the number of records retrieved in a batch the faster the
response when using SOLR.
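
As an illustration of the kind of batching I mean (the collection name, fields
and page size below are just placeholders), cursor-based paging would look
roughly like:

curl "http://localhost:8983/solr/mycollection/select?q=*:*&fl=id,title&rows=1000&sort=id+asc&cursorMark=*&wt=json"
# then pass the nextCursorMark value from each response as cursorMark on the
# next request, until it stops changing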

So is Solr a straight away NO candidate in a situation where 10000 records
should be retrieved in a time of <=200 mS.

A quick response would be very helpful.

Thanks!
Mark


Re: Negating multiple array fields

2016-02-17 Thread Shawn Heisey
On 2/17/2016 12:34 AM, Salman Ansari wrote:
> 2) "Behind the scenes, Solr will interpret this as "all possible values for
> field" --which sounds like it would be exactly what you're looking for,
> except that if there are ten million possible values in the field
> you're searching,
> the constructed Lucene query will quite literally include all ten million
> values."
>
> Does that mean that the  [* TO *] syntax does not return all results?

What the [* TO *] range query will do is match all documents where the
named field *has a value*.  It will exclude documents where the field is
missing.  So a query of "*:* -field:[* TO *]" will return all documents
where field is not present.  Related note:  If the field is present but
contains the empty string, it *is* matched by the range query.  The
field must be entirely missing to not match the range.

A full wildcard query (like field:*) does much the same thing, but in a
different way that might not perform as well as the range query.

If you want *all* documents, use *:* for your query.  This is a special
query -- even though it looks like it's a double wildcard, it isn't.

Thanks,
Shawn



Re: Running solr as a service vs. Running it as a process

2016-02-17 Thread Susheel Kumar
In addition, you get several advantages: you can start/stop/restart Solr
with "service solr start|stop|restart" as mentioned above, so you don't need
to launch the solr script directly. The install script also takes care of
installing and setting up Solr properly for a production environment, and
you can even automate the installation and launch of Solr with it.
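
For example (the version number and paths below are only illustrative), the
installer that ships inside the Solr tarball can be extracted and run like
this:

tar xzf solr-5.4.1.tgz solr-5.4.1/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-5.4.1.tgz -i /opt -d /var/solr -u solr -s solr -p 8983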

Thanks,
Susheel


On Wed, Feb 17, 2016 at 1:10 PM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> So, running solr as a service also runs it as a process.   In typical
> Linux environments, (based on initscripts), a service is a process
> installed to meet additional considerations:
>
> - Putting logs in predictable places where system operators and
> administrators expect to see logs - /var/log
> - Putting dynamic data that varies again in predictable places where
> system administrators expect to see dynamic data.
> - Putting code for the process in /opt/solr - the /opt filesystem is for
> non-operating system components
> - Putting configuration files for the process again in predictable places.
> - Running the process as a non-root user, but also as a user that is not
> any one user's account - e.g. a "service" account
> - Making sure Solr starts at system startup and stops at system shutdown
> - Making sure only a single copy of the service is running
>
> The options implemented in the install_solr_service.sh command are meant
> to be generic to many Linux environments, e.g. appropriate for RHEL/CentOS,
> Ubuntu, and Amazon Linux.   My organization is large enough (and perhaps
> peculiar enough) to have its own standards for where administrators expect
> to see logs and where dynamic data should go.   However, I still need to
> make sure to run it as a service, and this is part of taking it to
> production.
>
> The command /sbin/service is part of a package called "initscripts" which
> is used on a number of different Linux environments.   Many systems are now
> using both that package and another, "systemd", that starts things somewhat
> differently.
>
> Hope this helps,
>
> Dan Davis, Systems/Applications Architect (Contractor),
> Office of Computer and Communications Systems,
> National Library of Medicine, NIH
>
>
> -Original Message-
> From: Binoy Dalal [mailto:binoydala...@gmail.com]
> Sent: Wednesday, February 17, 2016 2:17 AM
> To: SOLR users group 
> Subject: Running solr as a service vs. Running it as a process
>
> Hello everyone,
> I've read about running solr as a service but I don't understand what it
> really means.
>
> I went through the "Taking solr to production" documentation on the wiki
> which suggests that solr be installed using the script provided and run as
> a service.
> From what I could glean, the script creates a directory structure and sets
> various environment variables and then starts solr using the service
> command.
> How is this different from setting up solr manually and starting solr
> using `./solr start`?
>
> Currently in my project, we start solr as a process using the `./` Is this
> something that should be avoided and if so why?
>
> Additionally, and I know that this is not the right place to ask, yet if
> someone could explain what the service command actually does, that would be
> great. I've read a few articles and they say that it runs the init script
> in as predictable an environment as possible, but what does that mean?
>
> Thanks
> --
> Regards,
> Binoy Dalal
>


Re: join and NOT together

2016-02-17 Thread Mikhail Khludnev
Sergo,

Please provide more debug output; I want to see how the query was parsed.

On Tue, Feb 16, 2016 at 1:20 PM, Sergio García Maroto 
wrote:

> My debugQuery=true returns related to the NOT:
>
> 0.06755901 = (MATCH) sum of: 0.06755901 = (MATCH) MatchAllDocsQuery,
> product of: 0.06755901 = queryNorm
>
> I tried changing v='(*:* -DocType:pdf)'  to v='(-DocType:pdf)'
> and it worked.
>
> Could anyone explain the difference?
>
> Thanks
> Sergo
>
>
> On 15 February 2016 at 21:12, Mikhail Khludnev  >
> wrote:
>
> > Hello Sergio,
> >
> > What debougQuery=true output does look like?
> >
> > On Mon, Feb 15, 2016 at 7:10 PM, marotosg  wrote:
> >
> > > Hi,
> > >
> > > I am trying to solve an issue when doing a search joining two
> collections
> > > and negating the cross core query.
> > >
> > > Let's say I have one collection person and another collection documents
> > and
> > > I can join them using local param !join because I have PersonIDS in
> > > document
> > > collection.
> > >
> > > if my query is like below. Query executed against Person Core. I want
> to
> > > retrieve people with name Peter and not documents attached of type pdf.
> > >
> > > q=PersonName:peter AND {!type=join from=DocPersonID to=PersonID
> > > fromIndex=document v='(*:* -DocType:pdf)' }
> > >
> > > If I have for person 1 called peter two documents one of type:pdf and
> > other
> > > one of type:word.
> > > Then this person will come back.
> > >
> > > Is there any way of excluding that person if any of the docs fulfill
> the
> > > NOT.
> > >
> > > Thanks
> > > Sergio
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > >
> http://lucene.472066.n3.nabble.com/join-and-NOT-together-tp4257411.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Negating multiple array fields

2016-02-17 Thread Salman Ansari
Thanks Shawn for explaining in details.
Regarding the performance issue you mentioned, there are 2 points

1) "The [* TO *] syntax is an all-inclusive range query, which will usually be
much faster than a wildcard query."

I will take your statement for granted and leave space for others to
comment on the details behind this.

2) "Behind the scenes, Solr will interpret this as "all possible values for
field" --which sounds like it would be exactly what you're looking for,
except that if there are ten million possible values in the field
you're searching,
the constructed Lucene query will quite literally include all ten million
values."

Does that mean that the  [* TO *] syntax does not return all results?

Regards,

Salman
On Feb 17, 2016 6:29 AM, "Binoy Dalal"  wrote:

> Hi Shawn,
> Please correct me if I'm wrong here, but don't the all inclusive range
> query [* TO *] and an only wildcard query like the one above essentially do
> the same thing from a black box perspective?
> In such a case wouldn't it be better to default an only wildcard query to
> an all inclusive range query?
>
> On Wed, 17 Feb 2016, 06:47 Shawn Heisey  wrote:
>
> > On 2/15/2016 9:22 AM, Jack Krupansky wrote:
> > > I should also have noted that your full query:
> > >
> > > (-persons:*)AND(-places:*)AND(-orgs:*)
> > >
> > > can be written as:
> > >
> > > -persons:* -places:* -orgs:*
> > >
> > > Which may work as is, or can also be written as:
> > >
> > > *:* -persons:* -places:* -orgs:*
> >
> > Salman,
> >
> > One fact of Lucene operation is that purely negative queries do not
> > work.  A negative query clause is like a subtraction.  If you make a
> > query that only says "subtract these values", then you aren't going to
> > get anything, because you did not start with anything.
> >
> > Adding the "*:*" clause at the beginning of the query says "start with
> > everything."
> >
> > You might ask why a query of -field:value works, when I just said that
> > it *won't* work.  This is because Solr has detected the problem and
> > fixed it.  When the query is very simple (a single negated clause), Solr
> > is able to detect the unworkable situation and implicitly add the "*:*"
> > starting point, producing the expected results.  With more complex
> > queries, like the one you are trying, this detection fails, and the
> > query is executed as-is.
> >
> > Jack is an awesome member of this community.  I do not want to disparage
> > him at all when I tell you that the rewritten query he provided will
> > work, but is not optimal.  It can be optimized as the following:
> >
> > *:* -persons:[* TO *] -places:[* TO *] -orgs:[* TO *]
> >
> > A query clause of the format "field:*" is a wildcard query.  Behind the
> > scenes, Solr will interpret this as "all possible values for field" --
> > which sounds like it would be exactly what you're looking for, except
> > that if there are ten million possible values in the field you're
> > searching, the constructed Lucene query will quite literally include all
> > ten million values.  Wildcard queries tend to use a lot of memory and
> > run slowly.
> >
> > The [* TO *] syntax is an all-inclusive range query, which will usually
> > be much faster than a wildcard query.
> >
> > Thanks,
> > Shawn
> >
> > --
> Regards,
> Binoy Dalal
>


Re: SOLR ranking

2016-02-17 Thread Nitin.K
Hi Binoy,

We are searching for both phrases and individual words,
but we want the documents that contain the phrase to come first in the
order, followed by the documents that match only the individual words.

termPositions = true is also not working in my case.

I have also removed the string type from the copy fields. Kindly look at the
changed configuration below:

Hi Emir,

I have changed the cofiguration as per your suggestion, added pf2 / pf3.
Yes, i saw the difference but still the ranking is not getting followed
correctly in case of phrases.

Changed configuration;















Copy fields again for the reference :







Added following field type:









Removed the string type from the copy fields.

Changed Query :

http://localhost:8983/solr/tgl/select?q=rheumatoid%20arthritis=xml=1.0=200=AND=true=edismax=true=true=true;
pf=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
pf2=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
pf3=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
qf=topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3

After making these changes, I am able to get my search results correctly for
a single term, but for phrase searches I am still not able to get the
results in the correct order.
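
For reference, the kind of request I am aiming for (the field names are from
my schema, but the boosts and ps=0 are just my current guesses) is roughly:

curl "http://localhost:8983/solr/tgl/select?q=rheumatoid+arthritis&defType=edismax&qf=topic_title^100+subtopic_title^40+content^3&pf=topic_title^200+subtopic_title^80+content^6&ps=0&debugQuery=true"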

Hi Modassar,

I tried using mm=100, but the order is still the same.

Hi Alessandro,

I have not yet tried the slop parameter. By default it shows as 1.0 when I
look at it in debug mode. I will definitely get back to you once I have
tried this option too.

All,

Please share any other suggestions you may have on this. I have to
implement this urgently and I think I am very close to it. Thanks to all of
you; I have reached this level just because of you guys.

Thanks and Regards,
Nitin



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257782.html
Sent from the Solr - User mailing list archive at Nabble.com.


Facet count with expand and collapse

2016-02-17 Thread Anil
HI,

Will there be any change in the facet counts in case of expand and collapse?
Please clarify.
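
For example (the field names below are just placeholders), with a request like:

curl -g "http://localhost:8983/solr/mycollection/select?q=*:*&fq={!collapse field=groupId}&expand=true&facet=true&facet.field=category"
# -g stops curl from treating the {} in the fq as a glob pattern

will the facet.field counts be computed on the collapsed result set or on
all matching documents?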

Regards,
Anil


RE: Running solr as a service vs. Running it as a process

2016-02-17 Thread Davis, Daniel (NIH/NLM) [C]
So, running solr as a service also runs it as a process.   In typical Linux 
environments, (based on initscripts), a service is a process installed to meet 
additional considerations:

- Putting logs in predictable places where system operators and administrators 
expect to see logs - /var/log
- Putting dynamic data that varies again in predictable places where system 
administrators expect to see dynamic data.
- Putting code for the process in /opt/solr - the /opt filesystem is for 
non-operating system components
- Putting configuration files for the process again in predictable places.
- Running the process as a non-root user, but also as a user that is not any 
one user's account - e.g. a "service" account
- Making sure Solr starts at system startup and stops at system shutdown
- Making sure only a single copy of the service is running

The options implemented in the install_solr_service.sh command are meant to be 
generic to many Linux environments, e.g. appropriate for RHEL/CentOS, Ubuntu, 
and Amazon Linux.   My organization is large enough (and perhaps peculiar 
enough) to have its own standards for where administrators expect to see logs 
and where dynamic data should go.   However, I still need to make sure to run 
it as a service, and this is part of taking it to production.

The command /sbin/service is part of a package called "initscripts" which is 
used on a number of different Linux environments.   Many systems are now using 
both that package and another, "systemd", that starts things somewhat 
differently.

Hope this helps,

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH


-Original Message-
From: Binoy Dalal [mailto:binoydala...@gmail.com] 
Sent: Wednesday, February 17, 2016 2:17 AM
To: SOLR users group 
Subject: Running solr as a service vs. Running it as a process

Hello everyone,
I've read about running solr as a service but I don't understand what it really 
means.

I went through the "Taking solr to production" documentation on the wiki which 
suggests that solr be installed using the script provided and run as a service.
From what I could glean, the script creates a directory structure and sets 
various environment variables and then starts solr using the service command.
How is this different from setting up solr manually and starting solr using 
`./solr start`?

Currently in my project, we start solr as a process using the `./` Is this 
something that should be avoided and if so why?

Additionally, and I know that this is not the right place to ask, yet if 
someone could explain what the service command actually does, that would be 
great. I've read a few articles and they say that it runs the init script in as 
predictable an environment as possible, but what does that mean?

Thanks
--
Regards,
Binoy Dalal


Null Pointer Exception on distributed search

2016-02-17 Thread Lokesh Chhaparwal
Hi,

We are facing NPE while using distributed search (Solr version 4.7.2)
(using *shards* parameter in solr query)
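
For reference, the requests look roughly like this (the hosts and core name
are placeholders, not our real topology):

curl "http://host1:8983/solr/mycore/select?q=*:*&shards=host1:8983/solr/mycore,host2:8983/solr/mycore&wt=xml"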

Exception Trace:
ERROR - 2016-02-17 16:44:26.616; org.apache.solr.common.SolrException;
null:java.lang.NullPointerException
at org.apache.solr.response.XMLWriter.writeSolrDocument(XMLWriter.java:190)
at
org.apache.solr.response.TextResponseWriter.writeSolrDocumentList(TextResponseWriter.java:222)
at
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:184)
at org.apache.solr.response.XMLWriter.writeNamedList(XMLWriter.java:227)
at
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:188)
at org.apache.solr.response.XMLWriter.writeArray(XMLWriter.java:273)
at
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:190)
at org.apache.solr.response.XMLWriter.writeNamedList(XMLWriter.java:227)
at
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:188)
at org.apache.solr.response.XMLWriter.writeNamedList(XMLWriter.java:227)
at
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:188)
at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:111)
at
org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:40)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:756)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:428)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:205)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)


Can somebody help us in finding the root cause of this exception?

FYI, we have documents split across 8 shards (num docs ~ 20 million) with
index size ~ 4 GB per node. We are using c3.2xlarge amazon ec2 machines
with solr running in apache tomcat (memory config 8 to 10 gb). Request
count ~ 200/sec.

Thanks,
Lokesh


Running solr as a service vs. Running it as a process

2016-02-17 Thread Binoy Dalal
Hello everyone,
I've read about running solr as a service but I don't understand what it
really means.

I went through the "Taking solr to production" documentation on the wiki
which suggests that solr be installed using the script provided and run as
a service.
From what I could glean, the script creates a directory structure and sets
various environment variables and then starts solr using the service
command.
How is this different from setting up solr manually and starting solr using
`./solr start`?

Currently in my project, we start solr as a process using the `./`
Is this something that should be avoided and if so why?

Additionally, and I know that this is not the right place to ask, yet if
someone could explain what the service command actually does, that would be
great. I've read a few articles and they say that it runs the init script
in as predictable an environment as possible, but what does that mean?

Thanks
-- 
Regards,
Binoy Dalal