Re: Solr commit taking too long
Some questions: What version of Solr? Has the number of documents in your index changed in the meantime? How many before, how many now? How does maxDocs compare to numDocs? Has this system ever been upgraded from an older Solr? Is it the commit itself that is taking that long, or opening a searcher once the commit is done? Maybe answers to these might help unpick your issue. Upayavira

On Thu, Jan 17, 2013, at 06:22 AM, Cool Techi wrote: Hi, We have an index of approximately 400GB in size; indexing 5000 documents was taking 20 seconds. But lately, the indexing is taking very long; committing the same number of documents is taking 5-20 minutes. On checking the logs I can see that there are frequent merges happening, which I am guessing is the reason for this. How can this be improved? My configurations are given below:

    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>30</mergeFactor>
    <ramBufferSizeMB>64</ramBufferSizeMB>

regards, Ayush
Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?
Hi Mark, one entry in my long list of self-made problems is: doing the commit before the ConcurrentUpdateSolrServer was finished. Since the ConcurrentUpdateSolrServer is asynchronous, it's very easy to create a race condition. Make sure that your program waits (with blockUntilFinished()) before it does the commit:

    if (solrserver instanceof ConcurrentUpdateSolrServer) {
        ((ConcurrentUpdateSolrServer) solrserver).blockUntilFinished();
    }

Uwe
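A minimal end-to-end sketch of the ordering Uwe describes, assuming SolrJ 4.x (the URL and the queue/thread sizes are placeholders):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    // Updates are queued and sent to Solr by background threads.
    SolrServer solrserver =
        new ConcurrentUpdateSolrServer("http://localhost:8983/solr", 100, 4);
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    solrserver.add(doc);
    // Drain the async queue first, otherwise the commit can reach Solr
    // before the last documents do.
    if (solrserver instanceof ConcurrentUpdateSolrServer) {
        ((ConcurrentUpdateSolrServer) solrserver).blockUntilFinished();
    }
    solrserver.commit();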
URL encoding problems
Hi, I have some problems related to URL encoding. I'm using Solr 3.6.1 on a Windows (32 bit) system. Apache Tomcat is version 6.0.36. I'm accessing Solr through solrj-3.3.0. When using the Solr admin and specifying my request, the URL looks like this (${SOLR} is there for the sake of brevity):

    ${SOLR}/select?q=rapporteur_name%3A%28John+%2BSmith+%2B%5C%28FOO%5C%29%29

But when my app launches the query, the URL looks like this:

    ${SOLR}/select?q=rapporteur_name%3A%28John%5C+Smith%5C+%5C%28FOO%5C%29%29

My decoded query, as entered in the admin interface, is:

    rapporteur_name:(John +Smith +\(FOO\))

Both requests return results, but only the first returns the correct ones. The code that escapes the query is:

    SolrQuery query = new SolrQuery();
    query.setQuery("rapporteur_name:(" + ClientUtils.escapeQueryChars("John Smith (FOO)") + ")");

I don't know if it's the right way to encode the query. Any ideas or directions? Regards. -- Bruno Dusausoy Software Engineer YP5 Software -- Think of the environment: limit printing of this e-mail. Please don't print this e-mail unless you really need to.
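A hedged sketch of one way to get the admin-style query from SolrJ: escape each term separately, so the spaces and the + operators stay query syntax instead of being escaped (escapeQueryChars also escapes spaces, which is what produced the %5C+ sequences in the second URL above):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.util.ClientUtils;

    StringBuilder q = new StringBuilder("rapporteur_name:(");
    q.append(ClientUtils.escapeQueryChars("John"));
    q.append(" +").append(ClientUtils.escapeQueryChars("Smith"));
    q.append(" +").append(ClientUtils.escapeQueryChars("(FOO)"));
    q.append(")");
    SolrQuery query = new SolrQuery();
    query.setQuery(q.toString());  // rapporteur_name:(John +Smith +\(FOO\))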
Re: Response time in client was much longer than QTime in tomcat
Hello, QTime counts only searching and filtering, but not writing the response, which includes retrieving the stored fields (fl=...). So, it's quite reasonable.

On Thu, Jan 17, 2013 at 7:09 AM, 张浓飞 zhangnong...@vancl.cn wrote: I have a solr website with about 500 docs (30 fields defined in the schema), and a C# client on the same machine which sends HTTP GET requests to that solr website. These logs were recorded by my C# client:

    01-16 23:54:49,301 [107] INFO LogHelper - requst time too long: 1054, solr time: 1003
    01-16 23:54:49,847 [63] INFO LogHelper - requst time too long: 1068, solr time: 1021
    01-16 23:57:17,813 [108] INFO LogHelper - requst time too long: 1051, solr time: 1027
    01-16 23:57:18,313 [111] INFO LogHelper - requst time too long: 1031, solr time: 1007

and so on… You can see, the query times from solr were long and very similar (between 1000ms and 1050ms). At the same time, the corresponding logs in tomcat:

    2013-1-16 23:54:49 org.apache.solr.core.SolrCore execute
    Info: [suit1] webapp=/vanclsearchV2 path=/select/ params={fl=id,typeid,createtime,vprice,sprice,price,totalassesscount,totalsalescount,productcode,productname,stylecode,tag,vpricesku,spricesku,pricesku,userrate,assesscount,lstphotos,mainphotos,salesflag,isduanma,detailsalescount,productplusstyleinfo&sort=createtime+desc&start=0&q=*:*&wt=json&fq=ancestorsid:(28976+OR+28978)&fq=typeid:(1)&rows=30} hits=43 status=0 QTime=0
    2013-1-16 23:54:49 org.apache.solr.core.SolrCore execute
    Info: [suit1] webapp=/vanclsearchV2 path=/select/ params={fl=id,typeid,createtime,vprice,sprice,price,totalassesscount,totalsalescount,productcode,productname,stylecode,tag,vpricesku,spricesku,pricesku,userrate,assesscount,lstphotos,mainphotos,salesflag,isduanma,detailsalescount,productplusstyleinfo&sort=createtime+desc&start=0&q=*:*&wt=json&fq=ancestorsid:(28976+OR+28978)&fq=typeid:(1)&rows=30} hits=43 status=0 QTime=0
    2013-1-16 23:57:17 org.apache.solr.core.SolrCore execute
    Info: [suit1] webapp=/vanclsearchV2 path=/select/ params={fl=id,typeid,createtime,vprice,sprice,price,totalassesscount,totalsalescount,productcode,productname,stylecode,tag,vpricesku,spricesku,pricesku,userrate,assesscount,lstphotos,mainphotos,salesflag,isduanma,detailsalescount,productplusstyleinfo&sort=createtime+desc&start=0&q=*:*&wt=json&fq=ancestorsid:(27547+OR+27614)&rows=30} hits=9 status=0 QTime=0
    2013-1-16 23:57:18 org.apache.solr.core.SolrCore execute
    Info: [suit1] webapp=/vanclsearchV2 path=/select/ params={fl=id,typeid,createtime,vprice,sprice,price,totalassesscount,totalsalescount,productcode,productname,stylecode,tag,vpricesku,spricesku,pricesku,userrate,assesscount,lstphotos,mainphotos,salesflag,isduanma,detailsalescount,productplusstyleinfo&sort=createtime+desc&start=0&q=*:*&wt=json&fq=ancestorsid:(27547+OR+27614)&rows=30} hits=9 status=0 QTime=0

Very strange, all the QTime were zero! Can anyone explain this circumstance, and how to solve the problem? -- Domi.N.Zhang | Dev Center Email: zhangnong...@vancl.cn Tel: 86-028-65528402 I'm the coming days… -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
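To see the same split that Mikhail describes from the client side, SolrJ exposes both numbers; a small sketch (the original poster's client is C#, but the idea is identical, and the URL here is a placeholder):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    HttpSolrServer server = new HttpSolrServer("http://localhost:8080/vanclsearchV2/suit1");
    QueryResponse rsp = server.query(new SolrQuery("*:*"));
    System.out.println("QTime=" + rsp.getQTime()              // search + filter only
        + "ms, elapsed=" + rsp.getElapsedTime() + "ms");      // includes response writing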
how to get abortOnConfigurationError=false working
I will explain the scenario just to avoid all the potential replies asking why. We run ColdFusion servers (Windows) which have SOLR built in (running on Jetty). A customer creates a collection which is stored within their own webspace; they only have read/write access to their own webspace, so cannot put them anywhere else. The default value for abortOnConfigurationError is true. This causes endless problems when customers make changes to their websites or cancel their hosting: the collection gets deleted, and SOLR then crashes because it cannot find the config files for that collection. We then have to find out which collection is causing the problem, and manually remove its entry from solr.xml. Obviously this is a PITA. In the error output it says:

    If you want solr to continue after configuration errors, change:
    <abortOnConfigurationError>false</abortOnConfigurationError>
    in solr.xml

I have tried this, but it has no effect. I have also tried putting it in all the solrconfig.xml files. I tried this

    <abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>

and this

    <abortOnConfigurationError>false</abortOnConfigurationError>

Neither had any effect. How do you get this to work?
Re: Suggestion that preserve original phrase case
You could write a custom Filter (or perhaps Tokenizer), but I usually just do it on the input side before things get sent to Solr. I don't think PatternReplaceCharFilterFactory will help; you could easily turn the input into original:original, but then you'd need to write a custom filter that normalized the left-hand side but not the right-hand side. Best, Erick

On Tue, Jan 15, 2013 at 11:27 AM, Selvam s.selvams...@gmail.com wrote: Thanks Erick, can you tell me how to do the appending (lowercaseversion:LowerCaseVersion) before indexing? I tried pattern factory filters, but I could not get it right.

On Sun, Jan 13, 2013 at 8:49 PM, Erick Erickson erickerick...@gmail.com wrote: One way I've seen this done is to index pairs like lowercaseversion:LowerCaseVersion. You can't push this whole thing through your field as defined since it'll all be lowercased; you have to produce the left-hand side of the above yourself and just use KeywordTokenizer without LowercaseFilter. Then, your application displays the right-hand side of the returned token. Simple solution, not very elegant, but sometimes the easiest... Best, Erick

On Fri, Jan 11, 2013 at 1:30 AM, Selvam s.selvams...@gmail.com wrote: Hi, I have been trying to figure out a way for case-insensitive suggestion which should return the original phrase as the result. I am using solr 3.5. For example: if I index 'Hello world' and search for 'hello', it needs to return 'Hello world', not 'hello world'. My configurations are as follows.

New field type:

    <fieldType class="solr.TextField" name="text_auto">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Field values:

    <field name="label" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
    <field name="label_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false"/>
    <copyField source="label" dest="label_autocomplete"/>

Spellcheck component:

    <searchComponent name="suggest" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">text_auto</str>
      <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <str name="buildOnOptimize">true</str>
        <str name="buildOnCommit">true</str>
        <str name="field">label_autocomplete</str>
      </lst>
    </searchComponent>

Kindly share your suggestions to implement this behavior. -- Regards, Selvam KnackForge http://knackforge.com Acquia Service Partner No. 1, 12th Line, K.K. Road, Venkatapuram, Ambattur, Chennai, Tamil Nadu, India. PIN - 600 053. -- Regards, Selvam KnackForge http://knackforge.com Acquia Service Partner No. 1, 12th Line, K.K. Road, Venkatapuram, Ambattur, Chennai, Tamil Nadu, India. PIN - 600 053.
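A small sketch of the input-side appending Erick describes, with field names taken from the thread (SolrJ assumed; the exact pair format is whatever your display code expects when it splits on the colon):

    import org.apache.solr.common.SolrInputDocument;

    String label = "Hello world";
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("label", label);
    // Left-hand side lowercased for matching, right-hand side verbatim for
    // display; the field uses KeywordTokenizer with no LowerCaseFilter.
    doc.addField("label_autocomplete", label.toLowerCase() + ":" + label);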
Re: SOlr 3.5 and sharding
You're still confusing shards (or at least mixing up the terminology) with simple replication. Shards are when you split up the index into several sub-indexes and configure the sub-indexes to know about each other. Say you have 1M docs in 2 shards. 500K of them would go on one shard and 500K on the other. But logically you have a single index of 1M docs. So the two shards have to know about each other, and when you send a request to one of them, it automatically queries the other (as well as itself), collects the responses and combines them, returning the top N to the requester. This is totally different from replication. In replication (master/slave), each node has all 1M documents. Each node can work totally in isolation. An incoming request is handled by the slave without contacting any other node. If you're copying around indexes AND configuring them as though they were shards, each request will be distributed to all shards and the results collated, giving you the same doc repeatedly in your result set. If you have no access to the indexing code, you really can't go to a sharded setup. Polling is when the slaves periodically ask the master "has anything changed?" If so, then the slave pulls down the changes. The polling interval is configured in solrconfig.xml _on the slave_. So let's say you index docs to the master. For some interval, until the slaves poll the master and get an updated index, the number of searchable docs on the master will be different than for the slaves. Additionally, you may have the issue of the polling intervals for the slaves being offset from one another, so for some brief interval the counts on the slaves may be different as well. Best, Erick

On Tue, Jan 15, 2013 at 10:18 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote: Ok, I see what Erick meant now. Thanks. The original index I'm working on contains about 120k documents. Since I have no access to the code that pushes documents into the index, I made four copies of the same index. The master node contains no data at all; it simply uses the data available in its four shards. Knowing that I have 1000 documents matching the keyword java on each shard, I was expecting to receive 4000 documents out of my sharded setup. There are only a few documents that are not accounted for (the result count is about 3996, which is pretty close but not accurate). Right now the index is static, so there is no need for any replication and the polling interval has no effect. Later this week, I will configure the replication and have the indexation modified to distribute the documents to each shard using a simple ID modulo 4 rule. Were my expectations wrong about the number of documents?

-----Original Message----- From: Upayavira [mailto:u...@odoko.co.uk] Sent: January-15-13 9:21 AM To: solr-user@lucene.apache.org Subject: Re: SOlr 3.5 and sharding He was referring to a master/slave setup, where a slave will poll the master periodically asking for index updates. That frequency is configured in solrconfig.xml on the slave. So, you are saying that you have, say, 1m documents in your master index. You then copy your index to four other boxes. At that point you have 1m documents on each of those four. Eventually you'll delete some docs, so you'd have 250k on each. You're wondering why, before the deletes, you're not seeing 1m docs on each of your instances. Or are you wondering why you're not seeing 1m docs when you do a distributed query across all four of these boxes? Is that correct?
Upayavira On Tue, Jan 15, 2013, at 02:11 PM, Jean-Sebastien Vachon wrote: Hi Erick, Thanks for your comments, but I am migrating an existing index (single instance) to a sharded setup and currently I have no access to the code involved in the indexation process. That's why I made a simple copy of the index on each shard. In the end, the data will be distributed among all shards. I was just curious to know why I did not have the expected number of documents with my four shards. Can you elaborate on this polling interval thing? I am pretty sure I never heard about this... Regards

-----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: January-15-13 8:00 AM To: solr-user@lucene.apache.org Subject: Re: SOlr 3.5 and sharding You're confusing shards and slaves here. Shards are splitting a logical index amongst N machines, where each machine contains a portion of the index. In that setup, you have to configure the slaves to know about the other shards, and the incoming query has to be distributed amongst all the shards to find all the docs. In your case, since you're really replicating (rather than sharding), you only have to query _one_ slave; the query doesn't need to be distributed. So pull all the sharding stuff out of your config files, put a load balancer in front of your slaves and only send the request to one of them would be the
Field Collapsing - Anything in the works for multi-valued fields?
I want to configure Field Collapsing, but my target field is multi-valued (e.g. the field I want to group on has a variable # of entries per document, 1-N entries). I read on the wiki (http://wiki.apache.org/solr/FieldCollapsing) that grouping doesn't support multi-valued fields yet. Anything in the works on that front by chance? Any common work-arounds?
Re: Large data importing getting rollback with solr
Hi, It looks like this is the cause: JBC0016E: Remote call failed (return code=-2,220). SDK9019E: internal error SDK9019X: Interestingly, Google gives just 1 hit for the above as a query - your post. But it seems you should look up what the above codes mean first... Otis -- Solr & ElasticSearch Support http://sematext.com/

On Thu, Jan 17, 2013 at 2:43 AM, ashimbose ashimb...@gmail.com wrote: I am trying to index large data (not rich documents), about 5GB, but it's not getting indexed. In the case of small data it indexes perfectly. For the large data import, the XML response is:

    0 0 data-config.xml full-import busy A command is still running... 0:9:12.738 169 1810790 2013-01-17 12:50:13 Indexing failed. Rolled back all changes. 2013-01-17 12:50:30 This response format is experimental. It is likely to change in the future.

BUT for the small data index, the XML response is perfectly OK, as below:

    0 0 data-config.xml full-import busy A command is still running... 0:0:12.436 11 382090 2013-01-17 12:56:57 Indexing completed. Added/Updated: 38209 documents. Deleted 0 documents. This response format is experimental. It is likely to change in the future.

For the large data import, the error log response is as below... it's getting rolled back:

    INFO: Time taken for getConnection(): 1343
    Jan 17, 2013 12:36:21 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Creating a connection for entity PS_JOBCODE_HAZ_BRA with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
    Jan 17, 2013 12:36:23 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Time taken for getConnection(): 1341
    Jan 17, 2013 12:36:23 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Creating a connection for entity PS_JOBCODE_HAZ_TBL with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
    Jan 17, 2013 12:36:24 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Time taken for getConnection(): 1357
    Jan 17, 2013 12:36:24 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Creating a connection for entity PS_JOBCODE_LANG with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
    Jan 17, 2013 12:36:26 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Time taken for getConnection(): 1392
    Jan 17, 2013 12:36:26 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Creating a connection for entity PS_JOBCODE_TBL with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
    Jan 17, 2013 12:36:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Time taken for getConnection(): 1535
    Jan 17, 2013 12:36:41 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Creating a connection for entity PS_JOBCODE_TBL_ARG with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
    Jan 17, 2013 12:36:43 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Time taken for getConnection(): 1467
    Jan 17, 2013 12:36:43 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Creating a connection for entity PS_JOBCODE_TBL_BRA with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
    Jan 17, 2013 12:36:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Time taken for getConnection(): 1373
    Jan 17, 2013 12:36:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Creating a connection for entity PS_JOBCOMP_TMP_MC with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
    Jan 17, 2013 12:36:45 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Time taken for getConnection(): 1404
    Jan 17, 2013 12:36:45 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Creating a connection for entity PS_JOBFUNCTION_LNG with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
    Jan 17, 2013 12:36:47 PM org.apache.solr.core.SolrCore execute
    INFO: [core1] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=0
    Jan 17, 2013 12:36:47 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Time taken for getConnection(): 1357
    Jan 17, 2013 12:36:47 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Creating a connection for entity PS_JOBFUNCTION_TBL with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
    Jan 17, 2013 12:36:48 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Time taken for getConnection(): 1310
    Jan 17, 2013 12:36:48 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Creating a connection for entity PS_JOB_APPROVALS with URL: jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB
    Jan 17, 2013 12:36:50 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
    INFO: Time taken for getConnection(): 1342
    Jan 17, 2013 12:36:50 PM
Re: Solr commit taking too long
Hi, That's a juicy index. Is this on a single server? Have you considered sharding it and thus spreading the indexing work over multiple servers, disks, etc.? You could increase ramBufferSizeMB, which will help a bit with indexing speed, but not with actual merging. Otis -- Solr & ElasticSearch Support http://sematext.com/

On Thu, Jan 17, 2013 at 1:22 AM, Cool Techi cooltec...@outlook.com wrote: Hi, We have an index of approximately 400GB in size; indexing 5000 documents was taking 20 seconds. But lately, the indexing is taking very long; committing the same number of documents is taking 5-20 minutes. On checking the logs I can see that there are frequent merges happening, which I am guessing is the reason for this. How can this be improved? My configurations are given below:

    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>30</mergeFactor>
    <ramBufferSizeMB>64</ramBufferSizeMB>

regards, Ayush
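A hedged illustration of that change for the indexConfig section of solrconfig.xml; 256 is an arbitrary example value, not a recommendation:

    <ramBufferSizeMB>256</ramBufferSizeMB>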
Re: how to get abortOnConfigurationError=false working
here is what it says in the SOLR info page:

    Solr Specification Version: 1.4.0.2009.11.18.10.19.05
    Solr Implementation Version: 1.4.1-dev exported - kvinu - 2009-11-18 10:19:05
    Lucene Specification Version: 2.9.1
    Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25

On Thu, Jan 17, 2013 at 1:33 PM, Alexandre Rafalovitch [via Lucene] ml-node+s472066n4034156...@n3.nabble.com wrote: Which version of Solr is it for? I had a situation on Solr4 where I basically did not have a directory that solr.xml was pointing at for one of the cores. And Solr continued working, but the Admin interface was showing big red banners about a configuration problem. So maybe it was a bug that was fixed for Solr 4? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Thu, Jan 17, 2013 at 8:03 AM, snake [hidden email] wrote: I will explain the scenario just to avoid all the potential replies asking why. We run ColdFusion servers (Windows) which have SOLR built in (running on Jetty). A customer creates a collection which is stored within their own webspace; they only have read/write access to their own webspace, so cannot put them anywhere else. The default value for abortOnConfigurationError is true. This causes endless problems when customers make changes to their websites or cancel their hosting: the collection gets deleted, and SOLR then crashes because it cannot find the config files for that collection. We then have to find out which collection is causing the problem, and manually remove its entry from solr.xml. Obviously this is a PITA. In the error output it says:

    If you want solr to continue after configuration errors, change:
    <abortOnConfigurationError>false</abortOnConfigurationError>
    in solr.xml

I have tried this, but it has no effect. I have also tried putting it in all the solrconfig.xml files. I tried this

    <abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>

and this

    <abortOnConfigurationError>false</abortOnConfigurationError>

Neither had any effect. How do you get this to work?
-- Russ Michaels www.bluethunderinternet.com : Business hosting services & solutions www.cfmldeveloper.com : ColdFusion developer community www.michaels.me.uk : my blog www.cfsearch.com : ColdFusion search engine skype me: russmichaels
Re: URL encoding problems
Similar thoughts: I used unit tests to explore that issue with SolrJ, originally encoding with ClientUtils. The returned results had | in many places in the text, with no clear way to un-encode. I eventually ran some tests with no encoding at all, including strings like <tag>hello goodbye</tag>; such strings were served and fetched without errors. In queries at the admin console, they show up in the JSON results correctly. What's left? I share the confusion about what is really going on. Jack

On Thu, Jan 17, 2013 at 2:44 AM, Bruno Dusausoy bdusau...@yp5.be wrote: Hi, I have some problems related to URL encoding. I'm using Solr 3.6.1 on a Windows (32 bit) system. Apache Tomcat is version 6.0.36. I'm accessing Solr through solrj-3.3.0. When using the Solr admin and specifying my request, the URL looks like this (${SOLR} is there for the sake of brevity):

    ${SOLR}/select?q=rapporteur_name%3A%28John+%2BSmith+%2B%5C%28FOO%5C%29%29

But when my app launches the query, the URL looks like this:

    ${SOLR}/select?q=rapporteur_name%3A%28John%5C+Smith%5C+%5C%28FOO%5C%29%29

My decoded query, as entered in the admin interface, is:

    rapporteur_name:(John +Smith +\(FOO\))

Both requests return results, but only the first returns the correct ones. The code that escapes the query is:

    SolrQuery query = new SolrQuery();
    query.setQuery("rapporteur_name:(" + ClientUtils.escapeQueryChars("John Smith (FOO)") + ")");

I don't know if it's the right way to encode the query. Any ideas or directions? Regards. -- Bruno Dusausoy Software Engineer YP5 Software -- Think of the environment: limit printing of this e-mail. Please don't print this e-mail unless you really need to.
Re: group.ngroups behavior in response
There's a parameter to enable that. :D In SolrJ:

    solrQuery.setParam("group.ngroups", true);

http://wiki.apache.org/solr/FieldCollapsing
Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?
On 1/17/2013 3:32 AM, Uwe Reh wrote: one entry in my long list of self-made problems is: doing the commit before the ConcurrentUpdateSolrServer was finished. Since the ConcurrentUpdateSolrServer is asynchronous, it's very easy to create a race condition. Make sure that your program waits before it does the commit: if (solrserver instanceof ConcurrentUpdateSolrServer) { ((ConcurrentUpdateSolrServer) solrserver).blockUntilFinished(); }

If you are using the same ConcurrentUpdateSolrServer object for all update interaction with Solr (including commits) and you still have to do the blockUntilFinished() in your own code before you issue an explicit commit, that sounds like a bug, and you should put all the details in a Jira issue. The following code is part of the request method in CUSS:

    // this happens for commit...
    if (req.getDocuments() == null || req.getDocuments().isEmpty()) {
        blockUntilFinished();
        return server.request(request);
    }

This means that if you use the same CUSS object for update interaction with Solr (including commits), the object will do the waiting for you when you make an explicit commit() call. If you issue a commit with a different object (either another instance of CUSS or HttpSolrServer), then this won't work and you'd have to handle it yourself. For error handling, I filed SOLR-3284 and provided a patch. It hasn't been committed, I think mostly because it doesn't give any specific information about what failed. I have an idea for how to improve the patch to address committer concerns, but until I have some time to actually look at it, I won't know if it's viable. When I have a moment, I'll update the issue with details about my idea. Thanks, Shawn
Re: Search strategy - improving search quality for short search terms such as doll
Hi David, I think this is where search analytics can help. If your intuition is right and people who search for doll are not actually searching for doll face... CD, then search analytics will confirm that. The analytics I'm talking about involves search and click tracking and analysis. Once you have this data you can play with boosting queries, altering queries, etc. based on this historical knowledge about what people who searched for X tend to do after the search. Otis -- Solr & ElasticSearch Support http://sematext.com/

On Wed, Jan 16, 2013 at 9:51 PM, David Parks davidpark...@yahoo.com wrote: My issue is more that the search term doll shows up both in documents about CDs and in documents about toys. But I have 10 CD documents for every toy document, so my searches for doll tend to show the CDs most prominently. But that's not the way a user thinks. If they want the CD documents they'll search for doll face, or doll face song, more specific queries (which work fine); but if they want the toy they might just search for doll. If I run the searches doll and doll song on Google image search, you'll clearly see that Google has solved this problem perfectly: doll returns toy dolls, and doll song returns music and anime results. I'm striving for this type of result.

-----Original Message----- From: Amit Jha [mailto:shanuu@gmail.com] Sent: Wednesday, January 16, 2013 11:41 PM To: solr-user@lucene.apache.org Subject: Re: Search strategy - improving search quality for short search terms such as doll It's all about the data set; here I mean the index. If you have documents containing toy and doll, it will return them in the result set. What I understood is that you are talking about the context of the query. For example, if you search books on MK Gandhi and books by MK Gandhi, the two queries have different contexts. Context-based search is at some level achieved by natural language processing; this is one thing you can look at for better search. Look at the Solr wiki; the mailing list would be a great source of learning. Rgds AJ

On 16-Jan-2013, at 15:10, David Parks davidpark...@yahoo.com wrote: I'm a beginner-intermediate solr admin; I've set up the basics for our application and it runs well. Now it's time for me to dig in and start tuning and improving queries. My next target is searches on simple terms such as doll which, in Google, would return documents about, well, toy dolls, because that's the most common usage of the simple term doll. But in my index it predominantly returns documents about CDs with the song Doll Face and My baby doll in them. I'm not directly asking how to solve this as much as I'm asking what direction I should be looking in to learn what I need to know to tackle the general issue myself. Left on my own, I would start looking at categorizing the CDs into a facet called music, reasonably doable in my dataset. Then I need to reduce the boost value of the entire facet/category of music unless certain pre-defined query terms exist, such as [music, cd, song, listen, dvd; analyze actual user queries to come up with a more exhaustive list, etc.]. I don't yet know how to do all of this, but after a couple more good books I should be dangerous. So the question to this list: am I on the right track here? If not, can you point me in a direction to go?
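For illustration, one hedged way to act on such analytics with the (e)dismax parser is a boost query; the category field and weights here are hypothetical, not from the thread:

    bq=(*:* -category:music)^1.5

This lifts every non-music document; the decision about when to apply it (for example, only when the query lacks music-related terms) would live in the application layer. The inverse, a positive boost such as bq=category:toys^2.0, works the same way.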
Re: Large data importing getting rollback with solr
ashimbose, It is possible that this is happening because Solr reaches a point where it is doing so many simultaneous merges that ongoing indexing is stopped until a huge merge finishes. This causes the JDBC driver to time out and disconnect, and there is no viable generic way to recover from that problem. I used to run into this with large MySQL imports. If this is what's happening, the following change/addition in the mergeScheduler section of indexConfig in solrconfig.xml will fix it:

    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxThreadCount">1</int>
      <int name="maxMergeCount">6</int>
    </mergeScheduler>

If that doesn't fix it, then I would look for a problem with either your JDBC driver or your DB server. Thanks, Shawn

On 1/17/2013 7:19 AM, Otis Gospodnetic wrote: Hi, It looks like this is the cause: JBC0016E: Remote call failed (return code=-2,220). SDK9019E: internal error SDK9019X: Interestingly, Google gives just 1 hit for the above as a query - your post. But it seems you should look up what the above codes mean first... Otis -- Solr & ElasticSearch Support http://sematext.com/ On Thu, Jan 17, 2013 at 2:43 AM, ashimbose ashimb...@gmail.com wrote: I am trying to index large data (not rich documents), about 5GB, but it's not getting indexed. In the case of small data it indexes perfectly. For the large data import, the XML response...
Re: group.ngroups behavior in response
But Amit is right: when you use group.main, the number of groups is not displayed, even if you set group.ngroups. I think in this case numFound should display the number of groups instead of the number of docs matching. Another option would be to keep numFound as the number of docs matching and add another attribute to the response that shows the number of groups.

On Thu, Jan 17, 2013 at 11:51 AM, denl0 david.vandendriess...@gmail.com wrote: There's a parameter to enable that. :D In SolrJ: solrQuery.setParam("group.ngroups", true); http://wiki.apache.org/solr/FieldCollapsing
Re: group.ngroups behavior in response
I'd think adding a new response attribute would be more flexible and powerful, thinking about clients, UIs, etc. Otis -- Solr & ElasticSearch Support http://sematext.com/

On Thu, Jan 17, 2013 at 10:15 AM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: But Amit is right: when you use group.main, the number of groups is not displayed, even if you set group.ngroups. I think in this case numFound should display the number of groups instead of the number of docs matching. Another option would be to keep numFound as the number of docs matching and add another attribute to the response that shows the number of groups.

On Thu, Jan 17, 2013 at 11:51 AM, denl0 david.vandendriess...@gmail.com wrote: There's a parameter to enable that. :D In SolrJ: solrQuery.setParam("group.ngroups", true); http://wiki.apache.org/solr/FieldCollapsing
Re: Solr commit taking too long
On 1/16/2013 11:22 PM, Cool Techi wrote: We have an index of approximately 400GB in size; indexing 5000 documents was taking 20 seconds. But lately, the indexing is taking very long; committing the same number of documents is taking 5-20 minutes. On checking the logs I can see that there are frequent merges happening, which I am guessing is the reason for this. How can this be improved? My configurations are given below:

    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>30</mergeFactor>
    <ramBufferSizeMB>64</ramBufferSizeMB>

What version of Solr? Version 4 will finish merges in the background even after indexing and commits are complete, although you do have to have a high enough maxMergeCount so that indexing stays in the foreground. I use a maxMergeCount of 6, which seems to work for all situations. Another thing that makes commits take an extremely long time is high autowarmCount values on Solr caches, especially filterCache. Thanks, Shawn
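For reference, a hedged sketch of the cache side of that advice in solrconfig.xml; the cache class and sizes are placeholders, the relevant knob is autowarmCount:

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>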
Solr multicore aborts with socket timeout exceptions
I'm currently running Solr 4.0 final on Tomcat v7.0.34, with ManifoldCF v1.2 dev running on Jetty. I have Solr multicore set up with 10 cores (is this too much?), so I also have at least 10 connectors set up in ManifoldCF (1 per core, 10 JVMs per connection). From the look of it, Solr couldn't handle all the data that ManifoldCF was sending it and the connection would abort with socket timeout exceptions. I tried increasing maxThreads to 200 on Tomcat and it didn't work. In the ManifoldCF throttling section, I decreased the number of JVMs per connection from 10 down to 1, and not only did the crawl speed up significantly, the socket exceptions went away (for the most part). Here's the ticket for this issue: https://issues.apache.org/jira/browse/CONNECTORS-608 My question is this: how do I increase the number of connections on the Solr side so I can run multiple ManifoldCF jobs concurrently without aborts or timeouts? The ManifoldCF team did mention that there was a committer who had socket timeout exceptions in a newer version of Solr and he fixed it by increasing the timeout window. I'm looking for that patch if available. Thanks,
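For reference, the knobs involved on the Tomcat side live on the HTTP connector in server.xml; a hedged sketch with illustrative values only (the right numbers depend on the crawl load):

    <Connector port="8080" protocol="HTTP/1.1"
               maxThreads="400"
               acceptCount="100"
               connectionTimeout="60000"/>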
RE: SOlr 3.5 and sharding
Hi Erick, It looks like we are saying the exact same thing but with different terms ;) I looked at the Solr glossary and you might be right... maybe I should talk about partitions instead of shards. Since my last message, I've configured the replication between the master and slave and everything is working fine, except for my original question about the number of documents not matching my expectations. I'll try to clarify a few things and come back to this question... Machine A (which I called the master node) is where the indexation takes place. It consists of four Solr instances that will (eventually) each contain 1/4 of the entire collection. It's just that, at this moment, since I have no control over which partition a given document is sent to, I made copies of the same index for all partitions. Each Solr instance has a replication handler configured. I will eventually get to the point of changing the indexation code to distribute documents evenly over all partitions, but the person who can give me access to this portion is not available right now, so I can do nothing about it. Machine B has the same four shards set up to be replicas of the corresponding shards on machine A. Machine B also contains another Solr instance with the default handler configured to use the four local partitions. This instance receives clients' requests, collects the results from each partition and then selects the best matches to form the final response. We intend to add new slaves that are exact copies of Machine B and load-balance clients' requests over all slaves. My original question was that if each partition has 1000 documents matching a certain keyword, and I know all partitions have the same content, then I was expecting to receive 4*1000 documents for the same keyword. But that is not the case. The replication is not an issue here, since the same request on the master node will give me the same result. Each shard, when called individually, will give 1000 documents. But when I call them using the shards=xxx parameter (see the example after this message), I am getting a little less than 4000 documents. I was just curious to know why this was happening... Is this a bug? Or something I am misunderstanding... Thanks for your time and contribution to Solr!

-----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: January-17-13 8:46 AM To: solr-user@lucene.apache.org Subject: Re: SOlr 3.5 and sharding You're still confusing shards (or at least mixing up the terminology) with simple replication. Shards are when you split up the index into several sub-indexes and configure the sub-indexes to know about each other. Say you have 1M docs in 2 shards. 500K of them would go on one shard and 500K on the other. But logically you have a single index of 1M docs. So the two shards have to know about each other, and when you send a request to one of them, it automatically queries the other (as well as itself), collects the responses and combines them, returning the top N to the requester. This is totally different from replication. In replication (master/slave), each node has all 1M documents. Each node can work totally in isolation. An incoming request is handled by the slave without contacting any other node. If you're copying around indexes AND configuring them as though they were shards, each request will be distributed to all shards and the results collated, giving you the same doc repeatedly in your result set. If you have no access to the indexing code, you really can't go to a sharded setup.
Polling is when the slaves periodically ask the master "has anything changed?" If so, then the slave pulls down the changes. The polling interval is configured in solrconfig.xml _on the slave_. So let's say you index docs to the master. For some interval, until the slaves poll the master and get an updated index, the number of searchable docs on the master will be different than for the slaves. Additionally, you may have the issue of the polling intervals for the slaves being offset from one another, so for some brief interval the counts on the slaves may be different as well. Best, Erick

On Tue, Jan 15, 2013 at 10:18 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote: Ok, I see what Erick meant now. Thanks. The original index I'm working on contains about 120k documents. Since I have no access to the code that pushes documents into the index, I made four copies of the same index. The master node contains no data at all; it simply uses the data available in its four shards. Knowing that I have 1000 documents matching the keyword java on each shard, I was expecting to receive 4000 documents out of my sharded setup. There are only a few documents that are not accounted for (the result count is about 3996, which is pretty close but not accurate). Right now the index is static, so there is no need for any replication and the polling interval has no effect. Later this week, I
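For reference, a distributed request of the kind described above looks like this; hosts and ports are placeholders, and each entry in shards points at one partition:

    http://machineB:8983/solr/select?q=java&shards=machineB:8985/solr,machineB:8986/solr,machineB:8987/solr,machineB:8988/solr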
Re: Solr 4 slower than Solr 3.x?
Hello, Here is another one from the other day: http://search-lucene.com/m/tqmNjXO51B/SolrCloud+Performance+for+High+Query+Volume Am I the only one seeing people reporting this? :) Otis -- Solr ElasticSearch Support http://sematext.com/ On Mon, Jan 14, 2013 at 10:55 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I've seen this mentioned on the ML a few times now with the most recent one being: http://search-lucene.com/m/mbT4g1fQPr91/?subj=Solr+4+0+upgrade+reduced+performance Are there any known, good Solr 3.x vs. Solr 4.x benchmarks? Thanks, Otis -- Solr ElasticSearch Support http://sematext.com/
Function Query vs. Analyzing results
Hi, Is there any performance boost when using FunctionQuery over getting all the documents and analyzing their result fields? As far as I understand, a Function Query does exactly that: for each matched document it fetches the fields you're interested in, and then it calculates whatever score mechanism you need. Are there some special configurations that I can use to make FunctionQueries faster? Cheers, John
Re: Function Query vs. Analyzing results
Hello John, getting all the documents and analyzing their result fields is almost never feasible; Lucene stored fields are usually really slow. When a FunctionQuery is backed by field values, it uses the Lucene FieldCache, which is an array of field values; that's damn faster. You are welcome.

On Thu, Jan 17, 2013 at 8:20 PM, John fatmanc...@gmail.com wrote: Hi, Is there any performance boost when using FunctionQuery over getting all the documents and analyzing their result fields? As far as I understand, a Function Query does exactly that: for each matched document it fetches the fields you're interested in, and then it calculates whatever score mechanism you need. Are there some special configurations that I can use to make FunctionQueries faster? Cheers, John -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Function Query vs. Analyzing results
Hi Mikhail, Thanks for the info. If my FunctionQuery accesses stored fields like that:

    public float floatVal(int docNum) {
        Document doc = null;
        try {
            doc = reader.document(docNum);
        } catch (Exception e) {}
        return getSimilarityScore(doc);
    }

Is it still the same case? Is there a faster way to access document info? Cheers, John

On Thu, Jan 17, 2013 at 6:40 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello John, getting all the documents and analyzing their result fields is almost never feasible; Lucene stored fields are usually really slow. When a FunctionQuery is backed by field values, it uses the Lucene FieldCache, which is an array of field values; that's damn faster. You are welcome. On Thu, Jan 17, 2013 at 8:20 PM, John fatmanc...@gmail.com wrote: Hi, Is there any performance boost when using FunctionQuery over getting all the documents and analyzing their result fields? As far as I understand, a Function Query does exactly that: for each matched document it fetches the fields you're interested in, and then it calculates whatever score mechanism you need. Are there some special configurations that I can use to make FunctionQueries faster? Cheers, John -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
MultiValue
my json file looks like:

    [ { "last_name": "jain", "training_skill": ["c", "c++", "php,java,.net"] } ]

can u please suggest how i should declare the field in the schema for the training_skill field. please reply, urgent
searching for q terms that start with a dash/hyphen being interpreted as prohibited clauses
hello. environment: solr 3.5. problem statement: i have a requirement to search for part numbers that start with a dash/hyphen. example q= term: -0004A-0436. example query:

    http://some_url:some_port/some_core/select?facet=false&sort=score+desc%2C+rankNo+asc%2C+partCnt+desc&start=0&q=-0004A-0436+itemType%3A1&wt=xml&qt=itemModelNoProductTypeBrandSearch&rows=4

what is happening: the query is returning a huge result set. in reality there is one (1) and only one record in the database with this part number. i believe this is happening because the dash is being interpreted by the query parser as a prohibited clause, and the effective result is "give me everything that does NOT have this part number". how is this handled so that the search is conducted for the actual part: -0004A-0436. thx mark

more information: request handler in solrconfig.xml:

    <requestHandler name="itemModelNoProductTypeBrandSearch" class="solr.SearchHandler" default="false">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="echoParams">all</str>
        <int name="rows">10</int>
        <str name="qf">itemModelNoExactMatchStr^30 itemModelNo^.9 divProductTypeDesc^.8 plsBrandDesc^.5</str>
        <str name="q.alt">*:*</str>
        <str name="sort">score desc, rankNo desc, partCnt desc</str>
        <str name="facet">true</str>
        <str name="facet.field">itemModelDescFacet</str>
        <str name="facet.field">plsBrandDescFacet</str>
        <str name="facet.field">divProductTypeIdFacet</str>
      </lst>
      <lst name="appends"/>
      <lst name="invariants"/>
    </requestHandler>

field information from schema.xml (if helpful):

    <field name="itemModelNoExactMatchStr" type="text_general_trim" indexed="true" stored="true"/>
    <field name="itemModelNo" type="text_en_splitting" indexed="true" stored="true" omitNorms="true"/>
    <field name="divProductTypeDesc" type="text_general_edge_ngram" indexed="true" stored="true" multiValued="true"/>
    <field name="plsBrandDesc" type="text_general_edge_ngram" indexed="true" stored="true" multiValued="true"/>

    <fieldType name="text_general_trim" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="\." replacement="" replace="all"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms_SHC.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
Using Solr Spatial in conjunction with HBASE/Hadoop
Hello, I have point data (lat/lon) stored in HBase/Hadoop and would like to query the data spatially with polygons (if I pass in a few polygons, find me all the records that exist within those polygons; I need it to support polygons, not just box queries). Hadoop doesn't really have much support that I could find for these types of queries. I was wondering if I could leverage Solr spatial 4 and create spatial indexes on the HBase data that could be used to query this data? I need near real-time answers (within a couple of seconds). If anyone has any thoughts on this I would greatly appreciate them. Thank you
Re: MultiValue
you just need to make the field multivalued:

    <field name="last_name" type="string" indexed="true" stored="true"/>
    <field name="training_skill" type="string" indexed="true" stored="true" multiValued="true"/>

type should be set based on your search requirements.

On Thu, Jan 17, 2013 at 11:27 PM, anurag.jain anurag.k...@gmail.com wrote: my json file looks like: [ { "last_name": "jain", "training_skill": ["c", "c++", "php,java,.net"] } ] can u please suggest how i should declare the field in the schema for the training_skill field. please reply, urgent
Re: MultiValue
[ { "last_name": "jain", "training_skill": ["c", "c++", "php,java,.net"] } ]

actually i want to tokenize that into c, c++, php, java, .net, so that i can make them facets. but the problem is in the list: training_skill: ["c", "c++", "php,java,.net"] (the last element is a single string containing commas).
Re: MultiValue
You mean to say that the problem is with the JSON which is being ingested. What you are trying to achieve is to split the values on the basis of commas and index them as multiple values. What problem are you facing in indexing JSON in the format Solr expects? If you don't have control over it, you can probably try playing with custom processors.

On Fri, Jan 18, 2013 at 12:31 AM, anurag.jain anurag.k...@gmail.com wrote: [ { "last_name": "jain", "training_skill": ["c", "c++", "php,java,.net"] } ] actually i want to tokenize that into c, c++, php, java, .net, so that i can make them facets. but the problem is in the list: training_skill: ["c", "c++", "php,java,.net"]
Re: Function Query vs. Analyzing results
no-no-no. your implementation is as slow as result processing, due to using stored fields. The fast way is something like org.apache.solr.schema.IntField.getValueSource(SchemaField, QParser). It's worth checking how the standard functions are built; check the static {} block in org.apache.solr.search.ValueSourceParser. I just googled this tutorial and found it rather useful for you. Feel free to check: http://www.solrtutorial.com/custom-solr-functionquery.html

On Thu, Jan 17, 2013 at 8:53 PM, John fatmanc...@gmail.com wrote: Hi Mikhail, Thanks for the info. If my FunctionQuery accesses stored fields like that: public float floatVal(int docNum) { Document doc = null; try { doc = reader.document(docNum); } catch (Exception e) {} return getSimilarityScore(doc); } Is it still the same case? Is there a faster way to access document info? Cheers, John

On Thu, Jan 17, 2013 at 6:40 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello John, getting all the documents and analyzing their result fields is almost never feasible; Lucene stored fields are usually really slow. When a FunctionQuery is backed by field values, it uses the Lucene FieldCache, which is an array of field values; that's damn faster. You are welcome. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
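A hedged sketch of what that change looks like in John's snippet (Lucene 3.x-era API, as used by Solr 3.5): pull the values from the FieldCache once per reader instead of loading a stored document on every floatVal() call. "price" and getSimilarityScore() are placeholders standing in for the thread's example:

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;

    class CachedValues {
        private final float[] prices;  // one entry per docNum, cached once

        CachedValues(IndexReader reader) throws IOException {
            prices = FieldCache.DEFAULT.getFloats(reader, "price");
        }

        public float floatVal(int docNum) {
            // an array lookup instead of a stored-field read
            return getSimilarityScore(prices[docNum]);
        }

        private float getSimilarityScore(float v) { return v; }  // placeholder
    }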
Re: MultiValue
actually

    [ { "last_name": "jain", "training_skill": ["c", "c++", "php,java,.net"] } ]

training_skill is a list, and if i want to store it in a string field type then it will include the [ and , as well. so how do i avoid that? or will it not? or do you have any other field type definition through which my work will be easy.
Re: how to get abortOnConfigurationError=false working
Snake, It was killed in 4.0/trunk more than two years ago: https://issues.apache.org/jira/browse/SOLR-1846 ("Setting abortOnConfigurationError=false has not worked for some time, and based on a poll of existing users, no one seems to need/want it.") You might be in that rare case where it didn't work even before.

On Thu, Jan 17, 2013 at 6:21 PM, snake r...@michaels.me.uk wrote: here is what it says in the SOLR info page:

    Solr Specification Version: 1.4.0.2009.11.18.10.19.05
    Solr Implementation Version: 1.4.1-dev exported - kvinu - 2009-11-18 10:19:05
    Lucene Specification Version: 2.9.1
    Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25

On Thu, Jan 17, 2013 at 1:33 PM, Alexandre Rafalovitch [via Lucene] ml-node+s472066n4034156...@n3.nabble.com wrote: Which version of Solr is it for? I had a situation on Solr4 where I basically did not have a directory that solr.xml was pointing at for one of the cores. And Solr continued working, but the Admin interface was showing big red banners about a configuration problem. So maybe it was a bug that was fixed for Solr 4? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Thu, Jan 17, 2013 at 8:03 AM, snake [hidden email] wrote: I will explain the scenario just to avoid all the potential replies asking why. We run ColdFusion servers (Windows) which have SOLR built in (running on Jetty). A customer creates a collection which is stored within their own webspace; they only have read/write access to their own webspace, so cannot put them anywhere else. The default value for abortOnConfigurationError is true. This causes endless problems when customers make changes to their websites or cancel their hosting: the collection gets deleted, and SOLR then crashes because it cannot find the config files for that collection. We then have to find out which collection is causing the problem, and manually remove its entry from solr.xml. Obviously this is a PITA. In the error output it says:

    If you want solr to continue after configuration errors, change:
    <abortOnConfigurationError>false</abortOnConfigurationError>
    in solr.xml

I have tried this, but it has no effect. I have also tried putting it in all the solrconfig.xml files. I tried this

    <abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>

and this

    <abortOnConfigurationError>false</abortOnConfigurationError>

Neither had any effect. How do you get this to work?
-- Russ Michaels www.bluethunderinternet.com : Business hosting services & solutions www.cfmldeveloper.com : ColdFusion developer community www.michaels.me.uk : my blog www.cfsearch.com : ColdFusion search engine *skype me*: russmichaels

-- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Field Collapsing - Anything in the works for multi-valued fields?
David, What are the documents and the field? That would help in suggesting a workaround.

On Thu, Jan 17, 2013 at 5:51 PM, David Parks davidpark...@yahoo.com wrote: I want to configure Field Collapsing, but my target field is multi-valued (e.g. the field I want to group on has a variable # of entries per document, 1-N entries). I read on the wiki (http://wiki.apache.org/solr/FieldCollapsing) that grouping doesn't support multi-valued fields yet. Anything in the works on that front by chance? Any common work-arounds?

-- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: MultiValue
On 18 January 2013 00:31, anurag.jain anurag.k...@gmail.com wrote: [ { "last_name": "jain", "training_skill": ["c", "c++", "php,java,.net"] } ] actually i want to tokenize in c c++ php java .net

What do you mean by "tokenize" in this case? It has been a while since I had occasion to use JSON input, and I also do not remember which Solr version introduced this, but with a JSON array mapped to a multi-valued Solr field, you should get one value per entry in the array. http://wiki.apache.org/solr/UpdateJSON#Update_Commands seems to be in agreement.

so through this i can make them as facet. but problem is in list training_skill: ["c", "c++", "php,java,.net"]

Faceting should be straightforward. Are you not seeing the behaviour described above? Could you describe the issues that you are facing in more detail? Regards, Gora
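(For reference, a multi-valued field fed through the JSON update handler looks something like the example below, assuming training_skill is declared multiValued="true" in schema.xml; each array entry becomes one value of the field, and hence one facet bucket:)

  curl 'http://localhost:8983/solr/update/json?commit=true' -H 'Content-type:application/json' -d '
  [ { "id": "1",
      "last_name": "jain",
      "training_skill": ["c", "c++", "php", "java", ".net"] } ]'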
Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?
: You're not only giving up the ability to monitor things, you're also giving up : the ability to detect errors. All exceptions that get thrown by the internals : of ConcurrentUpdateSolrServer are swallowed, your code will never know they : happened. The client log (slf4j with whatever binding config you chose) may : have such errors logged, but they are completely undetectable by the code.

This isn't the first time i've seen someone make this claim, but i really don't understand it -- ConcurrentUpdateSolrServer has a handleError() method that gets called when an error happens during the async processing. By default it just logs the exception; if you want to do something more interesting with it in your code, just subclass ConcurrentUpdateSolrServer and override that method -- that's the entire point of that method. The bigger issue is whether your client code could reasonably do anything if/when that method is called -- because it's all async, you probably can't do much more than log/report it in your own custom way instead of just using org.slf4j.Logger. -Hoss
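For example, a minimal subclass along these lines (a sketch against the 4.x SolrJ API; the class name and the error-tracking field are inventions here) lets the calling code check for failures after blockUntilFinished():

  import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
  import java.util.concurrent.atomic.AtomicReference;

  // Invented name: remembers the first async failure so callers can inspect it later.
  public class ErrorTrackingSolrServer extends ConcurrentUpdateSolrServer {
    private final AtomicReference<Throwable> firstError = new AtomicReference<Throwable>();

    public ErrorTrackingSolrServer(String url, int queueSize, int threadCount) {
      super(url, queueSize, threadCount);
    }

    @Override
    public void handleError(Throwable ex) {
      firstError.compareAndSet(null, ex); // keep only the first failure
      super.handleError(ex);              // preserve the default slf4j logging
    }

    // Call after blockUntilFinished(): null means no async error was seen.
    public Throwable getFirstError() {
      return firstError.get();
    }
  }

The caller would blockUntilFinished(), then consult getFirstError() before deciding whether to commit or to mark the batch for replay.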
Re: MultiValue
I think the problem here is that the list has 3 values, but the last one is actually a set of several values itself. Anurag seems to want them split into separate values whether they came as individual array items or as part of a joined list. So, we have a mix of multiValued submission and a desire to split values out. The correct solution, I suspect, would be to normalize everything to just be training_skill: ["c", "c++", "php", "java", ".net"] before this hits Solr. However, since he wants this for facets and as a training exercise, one could remember that facet values come from the tokens, not the stored value. So, it might be possible to do this:

<field name="test" type="comaSplit" indexed="true" stored="true" multiValued="true"/>
<fieldType name="comaSplit" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern=", "/>
  </analyzer>
</fieldType>

I think the faceting code will probably just aggregate all tokens despite the fact that they are spread over multiple values. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Thu, Jan 17, 2013 at 2:33 PM, Gora Mohanty g...@mimirtech.com wrote: What do you mean by "tokenize" in this case? With a JSON array mapped to a multi-valued Solr field, you should get one value per entry in the array. [rest of the quoted message snipped -- see the previous message]
Re: MultiValue
@Alexandre Rafalovitch Thanks. Yeah, you got my point. training_skill: ["c", "c++", "php", "java", ".net"] -- but it is not possible for me to split "php,java,.net", because the data can vary and the data is very large. I mean I have to perform this on 5 line data. It might come as [c++, php,java,.net, c#,ruby, python java] or the like, so I have to handle whatever is in the list. I just want to ignore the [ and , characters.
Re: MultiValue
Try my suggested field definition and see if it helps with faceting. It should. Try it on a small example or a fake schema. But I would still recommend escalating the problem up the chain to an architect or similar, because I bet that data is stored in multiple places (e.g. in the database), and you will hit a real problem later when you try to match a particular data/configuration set back to the original sources. Otherwise, as suggested elsewhere in the thread, you can also look at update.chain and request processors (a sketch of one follows below), but you will have to write one yourself for this situation. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Thu, Jan 17, 2013 at 2:50 PM, anurag.jain anurag.k...@gmail.com wrote: @Alexandre Rafalovitch Thanks. Yeah, you got my point. training_skill: ["c", "c++", "php", "java", ".net"] -- but it is not possible for me to split "php,java,.net", because the data can vary and the data is very large. It might come as [c++, php,java,.net, c#,ruby, python java] or the like, so I have to handle whatever is in the list. I just want to ignore the [ and , characters.
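If you do go the update processor route, the shape of it would be something like this -- an untested sketch against the 4.x API with an invented class name; you would also need the matching UpdateRequestProcessorFactory and an update.chain entry in solrconfig.xml, which are left out here:

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.common.SolrInputField;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;

  // Splits any comma-joined entries of training_skill into separate values.
  public class SplitSkillsProcessor extends UpdateRequestProcessor {
    public SplitSkillsProcessor(UpdateRequestProcessor next) {
      super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      SolrInputDocument doc = cmd.getSolrInputDocument();
      SolrInputField field = doc.getField("training_skill");
      if (field != null) {
        List<String> split = new ArrayList<String>();
        for (Object value : field.getValues()) {
          for (String part : value.toString().split(",")) {
            split.add(part.trim()); // "php,java,.net" becomes three clean values
          }
        }
        doc.setField("training_skill", split);
      }
      super.processAdd(cmd); // hand off to the rest of the chain
    }
  }

The nice part of doing it in the chain is that both the indexed tokens and the stored values come out normalized, so facet values and displayed values agree.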
Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?
Hi Shawn, don't panic. For 'historical' reasons, like comparing the different subclasses of SolrServer, I have an HttpSolrServer for queries and commits. I've never tried to use the CUSS for anything else than adding documents. As I wrote, it was a home-made problem and not a bug. Sometimes I hope not to be the only dumbass, and that others may be caught in the same trap. Uwe

On 17.01.2013 15:52, Shawn Heisey wrote: If you are using the same ConcurrentUpdateSolrServer object for all update interaction with Solr (including commits) and you still have to do the blockUntilFinished() in your own code before you issue an explicit commit, that sounds like a bug, and you should put all the details in a Jira issue.
Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?
On 1/17/2013 12:38 PM, Chris Hostetter wrote: This isn't the first time i've seen someone make this claim, but i really don't understand it -- ConcurrentUpdateSolrServer has a handleError() method that gets called when an error happens during the async processing. By default it just logs the exception; if you want to do something more interesting with it in your code, just subclass ConcurrentUpdateSolrServer and override that method -- that's the entire point of that method. The bigger issue is whether your client code could reasonably do anything if/when that method is called -- because it's all async, you probably can't do much more than log/report it in your own custom way instead of just using org.slf4j.Logger.

I have my update process (using HttpSolrServer) encapsulated in a method that has several parts -- deletes, reinserts, a specific kind of partial reindex, and inserting new content. It ends with a commit(). Any exceptions that happen down inside this method are either rethrown or allowed to propagate. When the method is called, update position information is only updated if it returns without throwing an exception. For my use case, it is enough to know that an error happened; exactly where it happened is not critical unless the problem turns out to be in the data -- a scenario that has not happened so far. All failures so far have been due to the server or Solr being down. I understand that many people would want to know which update failed. I hope to come up with a way to make this possible with CUSS out of the box. Do you have an example of how to override handleError that would make error detection easy? IMHO, either that information should be easily accessible to someone who's looking at the javadoc for CUSS, or the class should provide an out-of-the-box way to detect errors. I will work on this problem, not just complain about the current state. Thanks, Shawn
Re: how to get abortOnConfigurationError=false working
Ok, so is there any other way to stop this problem I am having, where any site can break Solr by deleting their collection? Seems odd everyone would vote to remove a feature that would make Solr more stable.
Re: how to get abortOnConfigurationError=false working
Or a different design. You can mark collections for deletion, then delete them in an organized, safe manner later. wunder

On Jan 17, 2013, at 12:40 PM, snake wrote: Ok, so is there any other way to stop this problem I am having, where any site can break Solr by deleting their collection? Seems odd everyone would vote to remove a feature that would make Solr more stable.
Why do I keep seeing org.apache.solr.core.SolrCore execute in the tomcat logs
I keep seeing these in the tomcat logs: Jan 17, 2013 3:57:33 PM org.apache.solr.core.SolrCore execute INFO: [Lisa] webapp=/solr path=/admin/logging params={since=1358453312320&wt=json} status=0 QTime=0 I'm just curious: what is getting executed here? I'm not running any queries against this core or using it in any way currently.
Re: how to get abortOnConfigurationError=false working
I think you're not understanding the issue. Imagine www.acme.com has created a collection. This resides in d:\acme.com\wwwroot\collections. Then they decide to redo their website, or they get a new developer who decides not to use collections, or they simply move hosts, so they delete the old one. The collection is now gone. Solr now cannot find the config files for that collection since they are gone, so Solr crashes and breaks every other website on the entire server that is using Solr. The customers have no idea this will happen and no knowledge about having to get collections removed properly etc, so saying they should do this and that simply won't happen, so it is not a solution. I need a way to avoid the above scenarios. Is it possible?

On Jan 17, 2013 8:43 PM, Walter Underwood wrote: Or a different design. You can mark collections for deletion, then delete them in an organized, safe manner later. wunder On Jan 17, 2013, at 12:40 PM, snake wrote: Ok, so is there any other way to stop this problem I am having, where any site can break Solr by deleting their collection? Seems odd everyone would vote to remove a feature that would make Solr more stable.
Re: how to get abortOnConfigurationError=false working
On Thu, Jan 17, 2013 at 3:40 PM, snake r...@michaels.me.uk wrote: Ok, so is there any other way to stop this problem I am having, where any site can break Solr by deleting their collection? Seems odd everyone would vote to remove a feature that would make Solr more stable.

I agree. abortOnConfigurationError was more about a single core -- whether the core would still be loaded if there were config errors. There *should* be a way to still load other cores if one core has an error and is not loaded. If there's not currently, then we should implement it. -Yonik http://lucidworks.com
Questions about boosting
I've been trying to figure this out on my own, but I've come up empty so far. I need to boost documents from a certain provider. The idea is that if any documents in a result match a separate query (like provider:bigbucks), I need to multiply the score by X. It's important that the result set of the actual query is not changed, just the order. I've tried a few things from the relevancy page on the wiki but so far I can't seem to get anything to work. What syntax should I be using? Is it possible to do this at query time? Thanks, Shawn
Re: Why do I keep seeing org.apache.solr.core.SolrCore execute in the tomcat logs
You must have an Admin UI open and pointing at the Logging section. So, it sends a ping to see if any new log entries were added. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Thu, Jan 17, 2013 at 4:00 PM, eShard zim...@yahoo.com wrote: I keep seeing these in the tomcat logs: Jan 17, 2013 3:57:33 PM org.apache.solr.core.SolrCore execute INFO: [Lisa] webapp=/solr path=/admin/logging params={since=1358453312320&wt=json} status=0 QTime=0 I'm just curious: what is getting executed here? I'm not running any queries against this core or using it in any way currently.
Re: how to get abortOnConfigurationError=false working
My knowledge of Solr is pretty limited; I have only been investigating this in the last couple of days due to this issue. The way SOLR is implemented in ColdFusion is with a single core, so all sites run under the same core. I presume a core is like multiple instances?

On Thu, Jan 17, 2013 at 9:03 PM, Yonik Seeley wrote: I agree. abortOnConfigurationError was more about a single core -- whether the core would still be loaded if there were config errors. There *should* be a way to still load other cores if one core has an error and is not loaded. If there's not currently, then we should implement it. -Yonik http://lucidworks.com

-- Russ Michaels www.bluethunderinternet.com : Business hosting services & solutions www.cfmldeveloper.com : ColdFusion developer community www.michaels.me.uk : my blog www.cfsearch.com : ColdFusion search engine *skype me*: russmichaels
Re: how to get abortOnConfigurationError=false working
Solr 4 most definitely ignores missing cores (I just ran into that accidentally again myself). So, if you start Solr and a directory is missing, it will survive (but complain). The other problem is what happens when a customer deletes the account and the core directory disappears in the middle of an open searcher. I would suggest some sort of pre-delete trigger that hits the Solr admin interface and unloads that core first. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Thu, Jan 17, 2013 at 4:03 PM, Yonik Seeley yo...@lucidworks.com wrote: I agree. abortOnConfigurationError was more about a single core -- whether the core would still be loaded if there were config errors. There *should* be a way to still load other cores if one core has an error and is not loaded. If there's not currently, then we should implement it. -Yonik http://lucidworks.com
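(That pre-delete trigger can be a single CoreAdmin call made before the files are removed; host, port and core name below are placeholders:)

  http://localhost:8983/solr/admin/cores?action=UNLOAD&core=acmecollection

Once the core is unloaded, deleting its directory no longer affects the rest of the server.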
Re: how to get abortOnConfigurationError=false working
On 1/17/2013 2:01 PM, snake wrote: I think you're not understanding the issue. Imagine www.acme.com has created a collection. This resides in d:\acme.com\wwwroot\collections. Then they decide to redo their website, or they get a new developer who decides not to use collections, or they simply move hosts, so they delete the old one. The collection is now gone. Solr now cannot find the config files for that collection since they are gone, so Solr crashes and breaks every other website on the entire server that is using Solr. The customers have no idea this will happen and no knowledge about having to get collections removed properly etc, so saying they should do this and that simply won't happen, so it is not a solution.

Solr has no security measures. If you are giving customers direct access to one or more directories on your Solr server, there are a LOT of ways that they can cause you problems, intentionally or not. By adding a jar to their data directory and referencing it in their config, they can do just about anything. Custom Solr components could be written that do one or more of the following: - Tie up all of Solr's memory and cause it to crash. - Grant general access to the server as the user that runs Solr. - Utilize a security vulnerability and gain admin access. Changes need to be checked before implementation. If a customer wants to use custom components, that would require extra scrutiny. I can't think of any way to fully protect your server without requiring human intervention for all changes. Thanks, Shawn
Re: Using Solr Spatial in conjunction with HBASE/Hadoop
Hi, You certainly can do that, but you'll need to suck all data out of HBase and index it in Solr first. And then presumably you'll want to keep the 2 more or less in sync via incremental indexing. Maybe Lily project can help? If not, you'll have to write something that scans HBase and indexes, say via SolrJ. Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Jan 17, 2013 at 1:26 PM, oakstream mike.oa...@oakstreamsystems.comwrote: Hello, I have point data (lat/lon) stored in hbase/hadoop and would like to query the data spatially with polygons. (If I pass in a few polygons find me all the records that exist within these polygons. I need it to support polygons not just box queries). Hadoop doesn't really have much support that I could find for these types of queries. I was wondering if I could leverage SOLR spatial 4 and create spatial indexes on the hbase data that could be used to query this data?? I need near real-time answers (within a couple seconds). If anyone has any thoughts on this I would greatly appreciate them. Thank you -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-Spatial-in-conjunction-with-HBASE-Hadoop-tp4034307.html Sent from the Solr - User mailing list archive at Nabble.com.
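A bare-bones version of the scan-and-index pass Otis describes might look like this (a sketch only: the table name, column family/qualifiers, core name and field names are all assumptions, and the SolrJ calls are 4.x):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class HBaseToSolrIndexer {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "points");   // table name assumed
      HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/geo");
      ResultScanner scanner = table.getScanner(new Scan());
      try {
        for (Result row : scanner) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", Bytes.toString(row.getRow()));
          // "d" family with "lat"/"lon" qualifiers is assumed; adjust to your schema
          String lat = Bytes.toString(row.getValue(Bytes.toBytes("d"), Bytes.toBytes("lat")));
          String lon = Bytes.toString(row.getValue(Bytes.toBytes("d"), Bytes.toBytes("lon")));
          doc.addField("location", lat + "," + lon); // "lat,lon" form for a Solr spatial field
          solr.add(doc);
        }
      } finally {
        scanner.close();
        table.close();
      }
      solr.commit();
    }
  }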
Re: SOlr 3.5 and sharding
Hmmm, maybe I'm finally getting it. Right, that does seem odd. I would expect you to get 4x the number of docs on any particular shard/replica in this situation. What happens when you look at the Solr logs for each partition? You should be able to glean the num results from the logs. I guess there are a couple of possibilities: 1) each machine actually returns N documents, but the aggregator does something weird and gives you 4X, indicating something's peculiar with the Solr aggregation; 2) you find that, for some reason, you aren't getting the same count _at the server level_, indicating your assertion that all the indexes are identical isn't valid. All of which means I'm pretty much out of ideas; it's hunt-and-seek time. Erick

On Thu, Jan 17, 2013 at 10:53 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote: Hi Erick, It looks like we are saying the exact same thing but with different terms ;) I looked at the Solr glossary and you might be right: maybe I should talk about partitions instead of shards. Since my last message, I've configured the replication between the master and slave, and everything is working fine except for my original question about the number of documents not matching my expectations. I'll try to clarify a few things and come back to this question... Machine A (which I called the master node) is where the indexation takes place. It consists of four Solr instances that will (eventually) contain 1/4 of the entire collection. It's just that, at this moment, since I have no control over which partition a given document is sent to, I made copies of the same index for all partitions. Each Solr instance has a replication handler configured. I will eventually get to the point of changing the indexation code to distribute documents evenly on all partitions, but the person who can give me access to this portion is not available right now, so I can do nothing about it. Machine B has the same four shards set up to be replicas of the corresponding shard on machine A. Machine B also contains another Solr instance with the default handler configured to use the four local partitions. This instance receives clients' requests, collects the results from each partition and then selects the best matches to form the final response. We intend to add new slaves, being exact copies of Machine B, and load balance clients' requests on all slaves. My original question was that if each partition has 1000 documents matching a certain keyword, and I know all partitions have the same content, then I was expecting to receive 4*1000 documents for the same keyword. But that is not the case. The replication is not an issue here, since the same request on the master node gives me the same result. Each shard, when called individually, will give 1000 documents. But when I call them using the shards=xxx parameter, I am getting a little less than 4000 documents. I was just curious to know why this is happening... Is this a bug? Or something I am misunderstanding... Thanks for your time and contribution to Solr!

-----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: January-17-13 8:46 AM To: solr-user@lucene.apache.org Subject: Re: SOlr 3.5 and sharding You're still confusing shards (or at least mixing up the terminology) with simple replication. Shards are when you split up the index into several sub-indexes and configure the sub-indexes to know about each other. Say you have 1M docs in 2 shards: 500K of them would go on one shard and 500K on the other.
But logically you have a single index of 1M docs, so the two shards have to know about each other, and when you send a request to one of them, it automatically queries the other (as well as itself), collects the responses and combines them, returning the top N to the requester. This is totally different from replication. In replication (master/slave), each node has all 1M documents. Each node can work totally in isolation; an incoming request is handled by the slave without contacting any other node. If you're copying around indexes AND configuring them as though they were shards, each request will be distributed to all shards and the results collated, giving you the same doc repeatedly in your result set. If you have no access to the indexing code, you really can't go to a sharded setup. Polling is when the slaves periodically ask the master "has anything changed?" If so, then the slave pulls down the changes. The polling interval is configured in solrconfig.xml _on the slave_. So let's say you index docs to the master. For some interval, until the slaves poll the master and get an updated index, the number of searchable docs on the master will be different than for the slaves. Additionally, you may have the issue of the polling intervals for the slaves being offset from one another, so for some brief
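(For reference, the kind of aggregator request being described looks like this in 3.5; hosts and core names are placeholders:)

  http://hostB:8983/solr/aggregator/select?q=foo&shards=hostB:8983/solr/part1,hostB:8983/solr/part2,hostB:8983/solr/part3,hostB:8983/solr/part4

Each partition also logs its own sub-request, so the per-partition hit counts can be compared against the combined count the aggregator returns.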
Re: Solr cache considerations
filterCache: this is bounded by (maxDoc/8) bytes * (num filters in cache). Notice the /8: this reflects the fact that the filters are represented by a bitset on the _internal_ Lucene ID. UniqueId has no bearing here whatsoever. This is, in a nutshell, why warming is required: the internal Lucene IDs may change. Note also that it's maxDoc; the internal arrays have holes for deleted documents. Note this is an _upper_ bound; if there are only a few docs that match, the size will be (num of matching docs) * sizeof(int).

fieldValueCache: I don't think so, although I'm a bit fuzzy on this. It depends on whether these are per-segment caches or not. Any per-segment cache is still valid.

Think of documentCache as intended to hold the stored fields while various components operate on them, thus avoiding repeatedly fetching the data from disk. It's _usually_ not too big a worry.

About hard commits once a day: that's _extremely_ long. Think instead of committing more frequently with openSearcher=false. If nothing else, your transaction log will grow lots and lots and lots. I'm thinking on the order of 15 minutes, or possibly even much less, with softCommits happening more often, maybe every 15 seconds. In fact, I'd start out with soft commits every 15 seconds and hard commits (openSearcher=false) every 5 minutes. The problem with hard commits being once a day is that, if for any reason the server is interrupted, on startup Solr will try to replay the entire transaction log to assure index integrity. Not to mention that your tlog will be huge. Not to mention that there is some memory usage for each document in the tlog. Hard commits roll over the tlog, flush the in-memory tlog pointers, close index segments, etc. Best Erick

On Thu, Jan 17, 2013 at 1:29 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi, I am going to build a big Solr (4.0?) index, which holds some dozens of millions of documents. Each document has some dozens of fields, and one big textual field. The queries on the index are non-trivial and a little bit long (might be hundreds of terms). No query is identical to another. Now, I want to analyze the cache performance (before setting up the whole environment), in order to estimate how much RAM I will need. filterCache: in my scenario, every query has some filters. Let's say that each filter matches 1M documents out of 10M. Should the estimated memory usage be 1M * sizeof(uniqueId) * num-of-filters-in-cache? fieldValueCache: due to the difference between queries, I guess that fieldValueCache is the most important factor in query performance. Here comes a generic question: I'm indexing new documents to the index constantly. Soft commits will be performed every 10 mins. Does that mean the cache is meaningless after every 10 minutes? documentCache: enableLazyFieldLoading will be enabled, and fl contains a very small set of fields. BUT, I need to return highlighting on about (possibly) 20 fields. Does the highlighting component use the documentCache? I guess that highlighting requires the whole field to be loaded into the documentCache. Will it happen only for fields that matched a term from the query? And one more question: I'm planning to hard-commit once a day. Should I prepare for significant RAM usage growth between hard commits? (consider a lot of new documents in this period...) Does this RAM come from the same pool as the caches? Can an OutOfMemory exception happen in this scenario? Thanks a lot.
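(In solrconfig.xml terms, Erick's suggestion comes out roughly as the 4.x update handler settings below; tune the times to your needs:)

  <autoCommit>
    <maxTime>300000</maxTime>           <!-- hard commit every 5 minutes -->
    <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commit -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>15000</maxTime>            <!-- soft commit (visibility) every 15 seconds -->
  </autoSoftCommit>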
Re: searching for q terms that start with a dash/hyphen being interpreted as prohibited clauses
I think all you need to do is escape the hyphen, or have you tried that already? Best Erick

On Thu, Jan 17, 2013 at 1:38 PM, geeky2 gee...@hotmail.com wrote: hello environment: solr 3.5 problem statement: i have a requirement to search for part numbers that start with a dash / hyphen. example q term: -0004A-0436 example query: http://some_url:some_port/some_core/select?facet=false&sort=score+desc%2C+rankNo+asc%2C+partCnt+desc&start=0&q=-0004A-0436+itemType%3A1&wt=xml&qt=itemModelNoProductTypeBrandSearch&rows=4 what is happening: the query is returning a huge result set. in reality there is one (1) and only one record in the database with this part number. i believe this is happening because the dash is being interpreted by the query parser as a prohibited clause, and the effective result is "give me everything that does NOT have this part number". how is this handled so that the search is conducted for the actual part: -0004A-0436 thx mark

more information: request handler in solrconfig.xml:

<requestHandler name="itemModelNoProductTypeBrandSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">10</int>
    <str name="qf">itemModelNoExactMatchStr^30 itemModelNo^.9 divProductTypeDesc^.8 plsBrandDesc^.5</str>
    <str name="q.alt">*:*</str>
    <str name="sort">score desc, rankNo desc, partCnt desc</str>
    <str name="facet">true</str>
    <str name="facet.field">itemModelDescFacet</str>
    <str name="facet.field">plsBrandDescFacet</str>
    <str name="facet.field">divProductTypeIdFacet</str>
  </lst>
  <lst name="appends"/>
  <lst name="invariants"/>
</requestHandler>

field information from schema.xml (if helpful):

<field name="itemModelNoExactMatchStr" type="text_general_trim" indexed="true" stored="true"/>
<field name="itemModelNo" type="text_en_splitting" indexed="true" stored="true" omitNorms="true"/>
<field name="divProductTypeDesc" type="text_general_edge_ngram" indexed="true" stored="true" multiValued="true"/>
<field name="plsBrandDesc" type="text_general_edge_ngram" indexed="true" stored="true" multiValued="true"/>

<fieldType name="text_general_trim" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\." replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_SHC.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
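(Concretely, either of these forms should stop the leading hyphen from being parsed as a prohibited clause -- shown decoded, with the URL-encoded equivalent:)

  q=\-0004A\-0436        (encoded: q=%5C-0004A%5C-0436)
  q="-0004A-0436"        (encoded: q=%22-0004A-0436%22)

Note the field's analysis chain still applies, so it is worth checking the term in the analysis page as well; the tokenizers above may strip or split on the hyphen at index time.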
Re: Solr cache considerations
I think fieldValueCache is not per segment, only fieldCache is. However, unless I'm missing something, this cache is only used for faceting on multivalued fields.

On Thu, Jan 17, 2013 at 8:58 PM, Erick Erickson erickerick...@gmail.com wrote: filterCache: this is bounded by (maxDoc/8) bytes * (num filters in cache). Notice the /8: this reflects the fact that the filters are represented by a bitset on the _internal_ Lucene ID. UniqueId has no bearing here whatsoever. This is, in a nutshell, why warming is required: the internal Lucene IDs may change. Note also that it's maxDoc; the internal arrays have holes for deleted documents. Note this is an _upper_ bound; if there are only a few docs that match, the size will be (num of matching docs) * sizeof(int). fieldValueCache: I don't think so, although I'm a bit fuzzy on this. It depends on whether these are per-segment caches or not. Any per-segment cache is still valid. Think of documentCache as intended to hold the stored fields while various components operate on them, thus avoiding repeatedly fetching the data from disk. It's _usually_ not too big a worry. About hard commits once a day: that's _extremely_ long. Think instead of committing more frequently with openSearcher=false. If nothing else, your transaction log will grow lots and lots and lots. I'm thinking on the order of 15 minutes, or possibly even much less, with softCommits happening more often, maybe every 15 seconds. In fact, I'd start out with soft commits every 15 seconds and hard commits (openSearcher=false) every 5 minutes. The problem with hard commits being once a day is that, if for any reason the server is interrupted, on startup Solr will try to replay the entire transaction log to assure index integrity. Not to mention that your tlog will be huge. Not to mention that there is some memory usage for each document in the tlog. Hard commits roll over the tlog, flush the in-memory tlog pointers, close index segments, etc. Best Erick [Isaac's original question, quoted in full in the earlier message, snipped]
Re: Using Solr Spatial in conjunction with HBASE/Hadoop
Thanks for your response! I appreciate it. There will be cases where I want to AND or OR the query between HBASE and Lucene. Would it make sense to custom code querying both repositories at the same time, or sequentially? Or are there any tools out there to do this? Basically I'm thinking that HBASE will keep the majority of my data columns, and Lucene will keep the index and a unique pointer to the HBASE record. Like: HBASE: UID = 12345, COL1, COL2, COL3, COL4, COL5, COL6 / LUCENE: ID = 999, UID = 12345, INDEX columns (LAT/LON). My query would be something like: where lat/lon in (Polygon) AND COL3 = 'ABC'. Would this kind of setup make sense? Is there a better way? I'll be working with terabytes of data. Thanks
RE: Field Collapsing - Anything in the works for multi-valued fields?
The documents are individual products which come from 1 or more vendors. Example: a 'toy spiderman doll' is sold by 2 vendors; that is 1 document. Most fields are multi-valued (short_description from each of the 2 vendors, long_description, product_name, vendor, etc. the same). I'd like to collapse on the vendor in an attempt to ensure that vast collections of books, music, and movies, by just a few vendors, don't overwhelm the results simply because they contain every search term imaginable due to the sheer volume of books, CDs, and DVDs, relative to other product items. But in this case there are clearly 1..N vendors per document, solidly a multi-valued field. And it's hard to put a maximum on the number of possible vendors. Thanks, Dave

-----Original Message----- From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] Sent: Friday, January 18, 2013 2:32 AM To: solr-user Subject: Re: Field Collapsing - Anything in the works for multi-valued fields? David, What are the documents and the field? That would help in suggesting a workaround. On Thu, Jan 17, 2013 at 5:51 PM, David Parks davidpark...@yahoo.com wrote: I want to configure Field Collapsing, but my target field is multi-valued (e.g. the field I want to group on has a variable # of entries per document, 1-N entries). I read on the wiki (http://wiki.apache.org/solr/FieldCollapsing) that grouping doesn't support multi-valued fields yet. Anything in the works on that front by chance? Any common work-arounds? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Need 'stupid beginner' help with SolrCloud
I'm trying to get a 2-node SolrCloud install off the ground with the 4.1 branch. This is a new project for a different system than my existing Solr 3.5.0 setup. It will have one shard and two replicas. I have part of the example in /opt/mbsolr4 -- jetty, the war file, logs, etc. This is the CWD. I want all my config and data to live in /index/mbsolr4, so I am using -Dsolr.solr.home=/index/mbsolr4. This setup mirrors what I am doing for upgrading the other system from 3.5.0 to 4.1, which is not using SolrCloud. There is also a separate 3-node zookeeper ensemble, with two of those nodes living on the two Solr servers. What do I need in the solr home (/index/mbsolr4) before I start Solr? If I was not using SolrCloud, I would put solr.xml in there, pointing at directories relative to that location. I'm going to have multiple collections. Some of those collections will use the same config/schema, others will use slightly different versions. I have worked out the zkHost value that I will need: -DzkHost=mbzoo1:2181,mbzoo2:2181,mbzoo3:2181/mbsolr1 I have both Solr servers started and talking to zookeeper, but there are no collections so the UI doesn't work. Are the following options enough for me to get my first config collection into zookeeper/solrcloud -- assuming the config is right? Do I need numShards and the replica count at this phase? -Dbootstrap_confdir=/index/mbsolr4/bootstrapconf -Dcollection.configName=mbbasecfg Thanks, Shawn
Re: Field Collapsing - Anything in the works for multi-valued fields?
Hi, Instead of the multi-valued fields, would a parent-child setup work for you here? See http://search-lucene.com/?q=solr+join&fc_type=wiki Otis -- Solr & ElasticSearch Support http://sematext.com/

On Thu, Jan 17, 2013 at 8:04 PM, David Parks davidpark...@yahoo.com wrote: The documents are individual products which come from 1 or more vendors. Example: a 'toy spiderman doll' is sold by 2 vendors; that is 1 document. Most fields are multi-valued (short_description from each of the 2 vendors, long_description, product_name, vendor, etc. the same). I'd like to collapse on the vendor in an attempt to ensure that vast collections of books, music, and movies, by just a few vendors, don't overwhelm the results simply because they contain every search term imaginable due to the sheer volume of books, CDs, and DVDs, relative to other product items. But in this case there are clearly 1..N vendors per document, solidly a multi-valued field. And it's hard to put a maximum on the number of possible vendors. Thanks, Dave
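(If the index were restructured that way -- one document per product plus one document per vendor offer -- the query side could use the 4.0 join parser, along these lines; the from/to field names are invented:)

  q={!join from=product_id to=id}vendor:somevendor

That returns each product document once, no matter how many vendor-offer documents matched, which sidesteps the multi-valued grouping limitation.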
Re: Using Solr Spatial in conjunction with HBASE/Hadoop
You'd want to do your Solr spatial query, get IDs from the index, and then *after* that do a multi-get against your HBase table with the top N IDs from Solr's response, and thus get the data back to the caller. I don't know how fast multi-gets are, what the limitations are, etc. Maybe somebody else can address that. Alternatively, I suppose you could implement a custom collector that does gets as matching documents are being collected by Solr. I don't recall the class/interface you'd need to implement off the top of my head. Otis -- Solr & ElasticSearch Support http://sematext.com/

On Thu, Jan 17, 2013 at 8:01 PM, oakstream mike.oa...@oakstreamsystems.com wrote: Thanks for your response! I appreciate it. There will be cases where I want to AND or OR the query between HBASE and Lucene. Would it make sense to custom code querying both repositories at the same time, or sequentially? Or are there any tools out there to do this? Basically I'm thinking that HBASE will keep the majority of my data columns, and Lucene will keep the index and a unique pointer to the HBASE record. Like: HBASE: UID = 12345, COL1, COL2, COL3, COL4, COL5, COL6 / LUCENE: ID = 999, UID = 12345, INDEX columns (LAT/LON). My query would be something like: where lat/lon in (Polygon) AND COL3 = 'ABC'. Would this kind of setup make sense? Is there a better way? I'll be working with terabytes of data. Thanks
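Sketching that flow (spatial query first, then one HBase multi-get for the top N row keys; the table, core and field names are assumptions, and the polygon is elided):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrDocument;

  public class SpatialThenMultiGet {
    public static Result[] topNRows() throws Exception {
      HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/geo");
      SolrQuery q = new SolrQuery("location:\"IsWithin(POLYGON((...)))\""); // polygon elided
      q.setFields("uid"); // uid = pointer back to the HBase row key
      q.setRows(100);     // top N
      List<Get> gets = new ArrayList<Get>();
      for (SolrDocument d : solr.query(q).getResults()) {
        gets.add(new Get(Bytes.toBytes((String) d.getFieldValue("uid"))));
      }
      HTable table = new HTable(HBaseConfiguration.create(), "records"); // table name assumed
      try {
        return table.get(gets); // one multi-get for all matching row keys
      } finally {
        table.close();
      }
    }
  }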
Re: Need 'stupid beginner' help with SolrCloud
There are a couple of ways you can proceed. You can preconfigure some SolrCores in solr.xml. Even if you don't, you want a solr.xml, because that is where a lot of cloud properties are defined. Or you can use the collections API or the core admin API. I guess I'd recommend the collections API. You have a couple of options for getting your config in. I'd recommend using the ZkCli tool to upload each of your config sets: http://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper After that, use the collections API to create the necessary cores on each node. Another option is to set up solr.xml like you would locally, then start with -Dbootstrap_conf=true and it will duplicate your local config and collection setup in ZooKeeper. - Mark

On Jan 17, 2013, at 9:10 PM, Shawn Heisey s...@elyograg.org wrote: I'm trying to get a 2-node SolrCloud install off the ground with the 4.1 branch. [rest of the message, quoted verbatim above, snipped]
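(With the values from Shawn's message, the upload step would look roughly like this; zkcli.sh ships in the 4.x cloud-scripts directory:)

  zkcli.sh -zkhost mbzoo1:2181,mbzoo2:2181,mbzoo3:2181/mbsolr1 -cmd upconfig -confdir /index/mbsolr4/bootstrapconf -confname mbbasecfg

(followed by a collection create through the collections API, with the shard/replica counts from the one-shard, two-replica plan; parameter names per the 4.1-era wiki, and host/collection name are placeholders:)

  http://mbsolr1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=1&replicationFactor=2&collection.configName=mbbasecfg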
build CMIS compatible Solr
hi, I am new to Solr and I would like to use Solr as my document server, plus search engine. But Solr is not CMIS compatible (and it need not be, as it is not built as a pure document management server). Given that, I would build another layer on top of Solr so that the exposed interface is CMIS compatible. I did some investigation, and it looks like OpenCMIS is one of the choices. My next step would be to build this CMIS bridge layer, which can accept a CMIS request, translate it into a Solr-compatible request and send it to Solr, and finally translate the Solr response into a CMIS-compatible response. Is my logic right? And is there any other library besides OpenCMIS to do this job? cheers. Nick
Re: Questions about boosting
Start with Query Elevation and see if that helps: http://wiki.apache.org/solr/QueryElevationComponent Index-time document boost is a possibility. Maybe an ExternalFileField where every document could have a dynamic boost value that you add with a boost function. -- Jack Krupansky -Original Message- From: Shawn Heisey Sent: Thursday, January 17, 2013 4:11 PM To: solr-user@lucene.apache.org Subject: Questions about boosting I've been trying to figure this out on my own, but I've come up empty so far. I need to boost documents from a certain provider. The idea is that if any documents in a result match a separate query (like provider:bigbucks), I need to multiply the score by X. It's important that the result set of the actual query is not changed, just the order. I've tried a few things from the relevancy page on the wiki but so far I can't seem to get anything to work. What syntax should I be using? Is it possible to do this at query time? Thanks, Shawn
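(For the purely multiplicative, query-time variant, the edismax boost parameter takes a function query -- a sketch against the 4.x function set, with X=2 and the provider clause from the original question:)

  q=whatever+the+user+typed&defType=edismax&boost=if(exists(query({!v='provider:bigbucks'})),2,1)

Because boost multiplies the score rather than adding a clause, the matching set is unchanged and only the ordering moves.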
Re: build CMIS compatible Solr
On 18 January 2013 10:36, Nicholas Li nicholas...@yarris.com wrote: hi I am new to solr and I would like to use Solr as my document server, plus search engine. But solr is not CMIS compatible( While it shoud not be, as it is not build as a pure document management server). In that sense, I would build another layer beyond Solr so that the exposed interface would be CMIS compatible. [...] May I ask why? Solr is designed to be a search engine, which is a very different beast from a document repository. In the open-source world, Alfresco ( http://www.alfresco.com/ ) already exists, can index into Solr, and supports CMIS-based access. Regards, Gora
Re: searching for q terms that start with a dash/hyphen being interpreted as prohibited clauses
Or put the term in quotes. -- Jack Krupansky

-----Original Message----- From: Erick Erickson Sent: Thursday, January 17, 2013 6:59 PM To: solr-user@lucene.apache.org Subject: Re: searching for q terms that start with a dash/hyphen being interpreted as prohibited clauses I think all you need to do is escape the hyphen, or have you tried that already? Best Erick

On Thu, Jan 17, 2013 at 1:38 PM, geeky2 gee...@hotmail.com wrote: hello environment: solr 3.5 problem statement: i have a requirement to search for part numbers that start with a dash / hyphen. [rest of the original message, including the request handler and schema config, snipped -- see the earlier message in this thread]
Re: What is the difference in defining multiValued on field and or fieldtype?
Specifying an attribute on the field type makes it the default for any field of that type. Setting multiValued=true on 'ignored' simply allows it to be used for any field, whether it is single- or multi-valued, and any source data, whether it has one or multiple values for that ignored field. Otherwise, you would get an error if multiple values were given for an ignored field which had no multiValued attribute, while the stated goal is simply to ignore the field and its incoming values. -- Jack Krupansky -Original Message- From: Alexandre Rafalovitch Sent: Thursday, January 17, 2013 6:20 PM To: solr-user@lucene.apache.org Subject: What is the difference in defining multiValued on field and or fieldtype? Hello, I was looking at the 'ignored' field in the example's schema.xml and suddenly noticed that its field type has multiValued=true in the definition. The Wiki confirms that it is possible, but does not explain. What's the difference between defining it on the type and on the field itself? Because the example has it defined on both. I am suddenly confused, because we now have a permutation of 9 different values (true/false/missing ^ 2) and I am not sure what the exact semantics are. I am mostly interested in the impact of fieldType/@multiValued=true, but curious about the other permutations. Thanks, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
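For illustration, the relevant entries in the stock example schema.xml look roughly like this (quoted from memory, so exact attributes may vary by version):

<fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField"/>
<dynamicField name="*" type="ignored" multiValued="true"/>

With multiValued="true" on the type, every field of that type defaults to accepting multiple values, so repeating the attribute on the dynamicField is redundant but harmless.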
Re: Solr cache considerations
Unfortunately, it seems (http://lucene.472066.n3.nabble.com/Nrt-and-caching-td3993612.html) that these caches are not per-segment. In this case, I want to (soft) commit less frequently. Am I right? Tomás, as the fieldValueCache is very similar to Lucene's FieldCache, I guess it has a big contribution to standard (not only faceted) query time. The SolrWiki claims that it is primarily used by faceting. What does that say about complex textual queries? documentCache: Erick, after query processing is finished, don't some documents stay in the documentCache? Can't I use it to accelerate queries that retrieve stored fields of documents? In this case, a big documentCache can hold more documents. About commit frequency: HardCommit: openSearcher=false seems like a nice solution. Where can I read about this? (I found nothing but one unexplained sentence in the SolrWiki.) SoftCommit: In my case, the required index freshness is 10 minutes. The plan to soft commit every 10 minutes is similar to storing all of the documents in a queue (outside of Solr) and indexing a bulk every 10 minutes. Thanks. On Fri, Jan 18, 2013 at 2:15 AM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: I think fieldValueCache is not per segment, only fieldCache is. However, unless I'm missing something, this cache is only used for faceting on multivalued fields. On Thu, Jan 17, 2013 at 8:58 PM, Erick Erickson erickerick...@gmail.com wrote: filterCache: This is bounded by (maxDoc / 8) * (num filters in cache) bytes, not 1M * sizeof(uniqueId) * num-of-filters-in-cache. Notice the /8. This reflects the fact that the filters are represented by a bitset on the _internal_ Lucene ID. UniqueId has no bearing here whatsoever. This is, in a nutshell, why warming is required: the internal Lucene IDs may change. Note also that it's maxDoc; the internal arrays have holes for deleted documents. Note this is an _upper_ bound; if there are only a few docs that match, the size will be (num of matching docs) * sizeof(int). fieldValueCache: I don't think so, although I'm a bit fuzzy on this. It depends on whether these are per-segment caches or not. Any per-segment cache is still valid. Think of documentCache as intended to hold the stored fields while various components operate on them, thus avoiding repeatedly fetching the data from disk. It's _usually_ not too big a worry. About hard commits once a day: that's _extremely_ long. Think instead of committing more frequently with openSearcher=false. If nothing else, your transaction log will grow lots and lots and lots. I'm thinking on the order of 15 minutes, or possibly even much less. With soft commits happening more often, maybe every 15 seconds. In fact, I'd start out with soft commits every 15 seconds and hard commits (openSearcher=false) every 5 minutes. The problem with hard commits being once a day is that, if for any reason the server is interrupted, on startup Solr will try to replay the entire transaction log to assure index integrity. Not to mention that your tlog will be huge. Not to mention that there is some memory usage for each document in the tlog. Hard commits roll over the tlog, flush the in-memory tlog pointers, close index segments, etc. Best Erick On Thu, Jan 17, 2013 at 1:29 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi, I am going to build a big Solr (4.0?) index, which holds some dozens of millions of documents. Each document has some dozens of fields, and one big textual field. The queries on the index are non-trivial, and a little bit long (might be hundreds of terms). No query is identical to another.
Now, I want to analyze the cache performance (before setting up the whole environment), in order to estimate how much RAM I will need. filterCache: In my scenario, every query has some filters. Let's say that each filter matches 1M documents, out of 10M. Should the estimated memory usage be 1M * sizeof(uniqueId) * num-of-filters-in-cache? fieldValueCache: Due to the difference between queries, I guess that fieldValueCache is the most important factor in query performance. Here comes a generic question: I'm indexing new documents to the index constantly. Soft commits will be performed every 10 mins. Does that mean the cache is meaningless after every 10 minutes? documentCache: enableLazyFieldLoading will be enabled, and fl contains a very small set of fields. BUT, I need to return highlighting on about (possibly) 20 fields. Does the highlighting component use the documentCache? I guess that highlighting requires the whole field to be loaded into the documentCache. Will it happen only for fields that matched a term from the query? And one more question: I'm planning to hard-commit once a day. Should I prepare for significant RAM usage growth between hard commits? (consider a lot of new documents in this period...) Does this RAM
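On the openSearcher=false question: it is configured under updateHandler in solrconfig.xml. A minimal sketch of the intervals Erick suggests (assuming the Solr 4.x autoCommit/autoSoftCommit syntax; tune the times to your own freshness needs):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>300000</maxTime>            <!-- hard commit every 5 minutes -->
    <openSearcher>false</openSearcher>   <!-- roll the tlog without opening a new searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>15000</maxTime>             <!-- soft commit every 15 seconds for visibility -->
  </autoSoftCommit>
</updateHandler>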
Re: What is the difference in defining multiValued on field and or fieldtype?
Thank you Jack, I just realized that perhaps 'ignored' was a bad example. But if I understood correctly, then I can specify multiValued on the type and not do so on the field itself and I still get multiValued entries. That's good to know. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: What is the difference in defining multiValued on field and or fieldtype?
Yes. -- Jack Krupansky -Original Message- From: Alexandre Rafalovitch Sent: Friday, January 18, 2013 12:26 AM To: solr-user@lucene.apache.org Subject: Re: What is the difference in defining multiValued on field and or fieldtype?
Re: build CMIS compatible Solr
I want to make something like Alfresco, but with far fewer features, and I'd like to use the searching ability of Solr. On Fri, Jan 18, 2013 at 4:11 PM, Gora Mohanty g...@mimirtech.com wrote: On 18 January 2013 10:36, Nicholas Li nicholas...@yarris.com wrote: hi I am new to Solr and I would like to use Solr as my document server, plus search engine. But Solr is not CMIS compatible (though it need not be, as it is not built as a pure document management server). In that sense, I would build another layer on top of Solr so that the exposed interface would be CMIS compatible. [...] May I ask why? Solr is designed to be a search engine, which is a very different beast from a document repository. In the open-source world, Alfresco ( http://www.alfresco.com/ ) already exists, can index into Solr, and supports CMIS-based access. Regards, Gora
Re: Is required=true useless in dynamicField?
Solr will ignore required for dynamic fields. It will be parsed and preserved, but will not affect the check for required fields in an input document. Ditto for default value for a dynamic field. -- Jack Krupansky -Original Message- From: Alexandre Rafalovitch Sent: Friday, January 18, 2013 12:08 AM To: solr-user@lucene.apache.org Subject: Is required=true useless in dynamicField? Hello, Given the definition:

<dynamicField name="addr_*" type="email" multiValued="true" indexed="true" stored="true" required="true"/>

Does it actually matter whether I specify required? I guess there is no way to have it enforced, right? Looking at the Wiki, dynamicField does not actually say what parameters it cares about, so it probably does not even read it from the definition. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: Questions about boosting
Have you tried boost query? bq=provider:fred wunder On Jan 17, 2013, at 9:08 PM, Jack Krupansky wrote: Start with Query Elevation and see if that helps: http://wiki.apache.org/solr/QueryElevationComponent Index-time document boost is a possibility. Maybe an ExternalFileField where every document could have a dynamic boost value that you add with a boost function. -- Jack Krupansky -Original Message- From: Shawn Heisey Sent: Thursday, January 17, 2013 4:11 PM To: solr-user@lucene.apache.org Subject: Questions about boosting I've been trying to figure this out on my own, but I've come up empty so far. I need to boost documents from a certain provider. The idea is that if any documents in a result match a separate query (like provider:bigbucks), I need to multiply the score by X. It's important that the result set of the actual query is not changed, just the order. I've tried a few things from the relevancy page on the wiki but so far I can't seem to get anything to work. What syntax should I be using? Is it possible to do this at query time? Thanks, Shawn
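A rough sketch of the ExternalFileField idea Jack mentions (the names here are hypothetical; keyField must be the uniqueKey, and the values come from a file named external_<fieldname> in the index data directory, reloaded when a new searcher opens):

<fieldType name="extBoost" class="solr.ExternalFileField" keyField="id" defVal="1" valType="pfloat"/>
<field name="providerBoost" type="extBoost"/>

The field is then usable from a function query, for example q={!boost b=providerBoost}your query, which multiplies each document's score without changing the result set, and the boost values can be updated without reindexing.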
Re: Suggestion that preserve original phrase case
Thanks again Erick. This time I got it working :). In fact your first response itself had a clear explanation; somehow I did not understand it completely! On Thu, Jan 17, 2013 at 6:59 PM, Erick Erickson erickerick...@gmail.com wrote: You could write a custom Filter (or perhaps Tokenizer), but I usually just do it on the input side before things get sent to Solr. I don't think PatternReplaceCharFilterFactory will help; you could easily turn the input into original:original, but then you'd need to write a custom filter that normalized the left-hand side but not the right-hand side. Best Erick On Tue, Jan 15, 2013 at 11:27 AM, Selvam s.selvams...@gmail.com wrote: Thanks Erick, can you tell me how to do the appending (lowercaseversion:LowerCaseVersion) before indexing? I tried pattern factory filters, but I could not get it right. On Sun, Jan 13, 2013 at 8:49 PM, Erick Erickson erickerick...@gmail.com wrote: One way I've seen this done is to index pairs like lowercaseversion:LowerCaseVersion. You can't push this whole thing through your field as defined, since it'll all be lowercased; you have to produce the left-hand side of the above yourself and just use KeywordTokenizer without LowercaseFilter. Then, your application displays the right-hand side of the returned token. Simple solution, not very elegant, but sometimes the easiest... Best Erick On Fri, Jan 11, 2013 at 1:30 AM, Selvam s.selvams...@gmail.com wrote: Hi, I have been trying to figure out a way to do case-insensitive suggestion which returns the original phrase as the result. I am using Solr 3.5. For example, if I index 'Hello world' and search for 'hello', it needs to return 'Hello world', not 'hello world'. My configurations are as follows.

New field type:

<fieldType class="solr.TextField" name="text_auto">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Field values:

<field name="label" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
<field name="label_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false"/>
<copyField source="label" dest="label_autocomplete"/>

Spellcheck component:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_auto</str>
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="buildOnOptimize">true</str>
    <str name="buildOnCommit">true</str>
    <str name="field">label_autocomplete</str>
  </lst>
</searchComponent>

Kindly share your suggestions to implement this behavior. -- Regards, Selvam KnackForge http://knackforge.com Acquia Service Partner No. 1, 12th Line, K.K. Road, Venkatapuram, Ambattur, Chennai, Tamil Nadu, India. PIN - 600 053.
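Since the question of how to do the appending came up, here is a minimal SolrJ sketch of Erick's pairing idea (field names from the schema above, an id field is assumed; the colon-pair convention is Erick's suggestion, not a Solr feature, and populating label_autocomplete directly like this replaces the copyField, which would otherwise add a second value to a single-valued field):

import java.util.Locale;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SuggestPairIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        String label = "Hello world";
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("label", label);
        // left-hand side lowercased for matching, right-hand side untouched for display
        doc.addField("label_autocomplete", label.toLowerCase(Locale.ROOT) + ":" + label);
        server.add(doc);
        server.commit();
    }
}

At display time, the application splits each returned suggestion on the first ':' and shows only the right-hand side.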
Re: group.ngroups behavior in response
A new response attribute would be better, but it also complicates the patch in that it would require a new way to serialize DocSlices, I think (especially when group.main=true). I was looking to set group.main=true so that my existing clients don't have to change to parse the grouped result set format. Secondly, while a new response attribute makes sense, the question is whether numFound should be numGroups or numTotal. To me it should be the number of groups, because logically that is what the result set shows, and the new attribute should report the total. Thanks Amit
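For readers without the thread's context, the parameter combination under discussion looks like this (the field name is hypothetical):

    q=*:*&group=true&group.field=provider&group.ngroups=true&group.main=true

With group.main=true the grouped results are flattened back into an ordinary result list, so the open question is whether its numFound should report the group count or the total match count.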
Re: Using Solr Spatial in conjunction with HBASE/Hadoop
Hi Oakstream, Coincidentally I've been thinking of porting the geohash prefix tree intersection algorithm in Lucene 4 spatial to Accumulo (another big-table system like HBase). There's a decent chance it'll happen this year, I think. That doesn't help your need right now, of course, so go with Otis's advice. ~ David Smiley oakstream wrote: Hello, I have point data (lat/lon) stored in HBase/Hadoop and would like to query the data spatially with polygons (if I pass in a few polygons, find me all the records that exist within those polygons; I need it to support polygons, not just box queries). Hadoop doesn't really have much support that I could find for these types of queries. I was wondering if I could leverage Solr 4 spatial and create spatial indexes on the HBase data that could be used to query this data? I need near-real-time answers (within a couple of seconds). If anyone has any thoughts on this I would greatly appreciate them. Thank you - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-Spatial-in-conjunction-with-HBASE-Hadoop-tp4034307p403.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Questions about boosting
I did try the bq parameter. Either I'm not using it correctly, or it's not making a noticeable difference. I was not able to find any good docs, either. Can you give me complete instructions on its use? Can I control the boost factor? Is the boost additive or multiplicative? For query elevation, don't you have to know in advance the query that a user will send? There's no way for me to know this - we want to be able to apply the boost to arbitrary queries. The source data comes from MySQL, and this is a seven-shard distributed index with 74075200 documents as of a few minutes ago. Although ExternalFileField probably wouldn't be impossible, it is rather impractical. Thanks, Shawn
Re: Questions about boosting
As I understand it, the bq parameter is a full Lucene query, but only used for ranking, not for selection. This is the complement of fq. You can use weighting: provider:fred^8 This will be affected by idf, so providers with fewer matches will have higher weight than those with more matches. This is a bother, but the idf-free approach requires Solr 4.0. wunder
Re: Questions about boosting
On 1/17/2013 11:41 PM, Walter Underwood wrote: This is a bother, but the idf-free approach requires Solr 4.0. I am doing my testing on Solr 4.1, so if you can give me the syntax for that, I would appreciate it. My production indexes are 3.5, but once we are confident with the 4.1 dev system, we'll upgrade. The provider field has omitTermFreqAndPositions=true defined, but the fields that typically get searched don't omit anything, so IDF probably still applies in the aggregate. On a related note, I have rather extreme length variation in my fields, so I see quite a lot of weird results due to very short metadata. Is there any way to lessen the impact of lengthNorm without eliminating it entirely? If not, is there any way to eliminate lengthNorm without also disabling index-time boosts? At this moment I am not doing index-time boosting, but business requirements may change that in the future. Thanks, Shawn
Re: Questions about boosting
On 1/17/2013 11:41 PM, Walter Underwood wrote: You can use weighting: provider:fred^8 I tried bq=ip:sc^1000 and it doesn't seem to be making any difference. Even if I add fq=ip:sc, I don't see any mention of bq, ip, sc, or 1000 in the debugQuery output. This is the case on both 3.5 and 4.1. In case it was caused by omitting termfreq and positions on the field I'm using in the bq, I tried a couple of other fields that don't omit anything and bq seems to be having no effect at all. Thanks, Shawn
Re: Large data importing getting rollback with solr
On 18 January 2013 12:49, ashimbose ashimb...@gmail.com wrote: Hi Otis, Thank you for your reply. But I am unable to get any search result related to the error code. It does not respond for more than 168 data sources; I have tested it. If you have any other solution please let me know. Not sure about the limit of 168 data sources in DIH, but I am curious as to why you need that many? Do you have that many different MySQL databases that you are indexing from? Regards, Gora
Re: Questions about boosting
Colleagues, FWIW, bq is a DisMax parser feature. Shawn, to approach the boosting syntax with the standard parser you need something like q=foo:bar ip:sc^1000. Specifying ^1000 in bq makes no sense ever. If you show your query params and debugQuery output, it would be much easier for us to help you. PS: omitting termfreqs and positions doesn't impact query-time boosting, ever. The closest caveat is that disabling norms indexing kills _index_-time boosting. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
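To tie the thread together, a sketch of both boosting styles for this case (parameter values are illustrative; the multiplicative form assumes Solr 4.x edismax and function queries):

    q=some user query&defType=edismax&bq=ip:sc^1000
        (additive: matching documents get extra score, the result set is unchanged)
    q=some user query&defType=edismax&boost=if(termfreq(ip,'sc'),10,1)
        (multiplicative: scores of documents where ip contains 'sc' are multiplied by 10)

Both leave selection to q and only affect ranking, and since bq is honored only by the dismax/edismax parsers, defType matters here, which matches Mikhail's point.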
Re: Large data importing getting rollback with solr
Hi Gora, Thank you for your quick reply. I have only one data source, but more than 300 tables. Each table I have put in an individual entity in data-config.xml. But when I try a full import, it shows that many entries, as <str name="Total Requests made to DataSource">169</str>. This 169 means I took 169 tables from my data source, and each of the 169 tables has an individual entity in my data-config.xml file. I am not sure if I did something wrong. Please let me know. My sample data-config.xml is below:

<?xml version="1.0" encoding="utf-8"?>
<dataConfig>
  <dataSource type="JdbcDataSource" name="sampleDB" driver="com.ibm.optim.connect.jdbc.NvDriver" url="jdbc:attconnect://192.168.1.29:2551/NAVIGATOR;DefTdpName=sampleDB" user="" password=""/>
  <document name="headwords">
    <entity name="CUSTOMER" dataSource="sampleDB" query="SELECT * FROM CUSTOMER" transformer="RegexTransformer">
      <field column="ID" name="ID"/>
      <field column="ADDRESS" name="ADDRESS"/>
      <field column="SIGNON_TYPE" name="SIGNON_TYPE"/>
      <field column="NAME" name="NAME"/>
    </entity>
    .
    .
    .
    .
  </document>
</dataConfig>

Thank you Regards, Ashim -- View this message in context: http://lucene.472066.n3.nabble.com/Large-data-importing-getting-rollback-with-solr-tp4034075p4034466.html Sent from the Solr - User mailing list archive at Nabble.com.