Solr ranking query

2014-02-03 Thread Chris
Hi,

I have a document structure that looks like the below. I would like to
implement something like -

"(urlKeywords:" + keyword + " AND domainRank:[3 TO 1] AND adultFlag:N)^60 " +
"OR (title:" + keyword + " AND domainRank:[3 TO 1] AND adultFlag:N)^20 " +
"OR (title:" + keyword + " AND domainRank:[10001 TO *] AND adultFlag:N)^2 " +
"OR (fulltxt:" + keyword + ")"


If the keyword contains multiple words, e.g. "A B C D", then documents that
match all of the words should rank highest (Group 1), then documents matching
3 words (Group 2), then 2 words (Group 3), and so on.
AND - within each group (Group 1, 2, 3) I would want the ones with the
lowest domainRank value to rank higher (but only within the group).

How can I do this in a single query? Please also advise on the fastest way to
do it (I am open to using fq and other techniques to speed it up).

Please advise.
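
A sketch (not from the thread) of one way to do this in a single request with edismax: the
field names come from the question, the weights, URL, and recip() constants are illustrative,
and note that a range written as domainRank:[3 TO 1] has its bounds reversed and will match
nothing.

// Hedged sketch; weights, URL and function constants are placeholders.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
String keyword = "A B C D";                            // the user's search words
SolrQuery q = new SolrQuery(keyword);
q.set("defType", "edismax");
q.set("q.op", "OR");                                   // docs matching more words tend to score higher
q.set("qf", "urlKeywords^60 title^20 fulltxt^1");      // per-field weights
q.addFilterQuery("adultFlag:N");                       // cheap, cacheable restriction
q.set("bf", "recip(domainRank,1,1000,1000)");          // small additive boost for low domainRank
q.setRows(100);
QueryResponse rsp = server.query(q);

Plain OR scoring only tends to rank documents matching more of the words first; it does not
strictly guarantee the Group 1 / Group 2 / Group 3 ordering, so if that ordering must be exact
it would need explicit clauses or larger, tiered boosts.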


Document Structure in XML -

 
(The XML element tags were stripped by the list archive; the visible field values are:)

  www
  ncoah.com
  /links.html
  http://www.ncoah.com/links.html
  North Carolina Office of Administrative Hearings - Links
  North Carolina Office of Administrative Hearings - Links

(The rest of the document is a long full-text body made up of the page's link
anchors: Hearings, Rules, Civil Rights, Welcome, General Information,
Directions to OAH, and links to various North Carolina state departments and
offices. It is truncated in the archive.)

Re: Solr and SDL Tridion Integration

2014-02-03 Thread Prasi S
Thanks a lot for the options. Our site has dynamic content as well. I would
look into what best suits.

Thanks,
Prasi


On Mon, Feb 3, 2014 at 10:34 PM, Chris Warner wrote:

> There are many ways to do this, Prasi. You have a lot of thinking to do on
> the subject.
>
> You could decide to publish your content to database, and then index that
> database in Solr.
>
> You could publish XML or CSV files of your content for Solr to read and
> index.
>
> You could use nutch or some other tool to crawl your web server.
>
> There are many more methods, probably. These being some of the more common.
>
> Does your site have dynamic content presentation? If so, you may want to
> consider having Solr examine your broker database.
>
> Static pages on your site? You may want to go with either a crawler or
> publishing a special file for Solr.
>
> Please check out https://tridion.stackexchange.com/ for more on this
> topic.
>
> --
> chris_war...@yahoo.com
>
>
>
> On Monday, February 3, 2014 3:54 AM, Jack Krupansky <
> j...@basetechnology.com> wrote:
> If SDL Tridion can export to CSV format, Solr can then import from CSV
> format.
>
> Otherwise, you may have to write a custom script or even maybe Java code to
> read from SDL Tridion and output a supported Solr format, such as Solr XML,
> Solr JSON, or CSV.
>
> -- Jack Krupansky
>
>
> -Original Message-
> From: Prasi S
> Sent: Monday, February 3, 2014 4:16 AM
> To: solr-user@lucene.apache.org
> Subject: Solr and SDL Tridion Integration
>
> Hi,
> I want to index sdl tridion content to solr. Can you suggest how this can
> be achieved? Is there any document/tutorial for this? Thanks
>
> Thanks,
> Prasi
>


Re: Duplicate Facet.Fields cause same results, should dedupe?

2014-02-03 Thread William Bell
This is in 4.6.1.


On Mon, Feb 3, 2014 at 9:11 PM, Otis Gospodnetic  wrote:

> Hi,
>
> Don't know if this is old or new problem, but it does feel like a bug to
> me.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Mon, Feb 3, 2014 at 10:48 AM, William Bell  wrote:
>
> > If we add :
> >
> > facet.field=prac_spec_heir&facet.field=prac_spec_heir
> >
> > we get it twice in the results. This breaks deserialization on wt=json
> > since you cannot have the same name twice
> >
> > Thoughts? Seems like a new bug in 4.6 ?
> >
> >
> > "facet.field":
> > ["prac_spec_heir","all_proc_name_code","all_cond_name_code","
> > prac_spec_heir","{!ex=exgender}gender","{!ex=expayor}payor_code_name"],
> >
> > --
> > Bill Bell
> > billnb...@gmail.com
> > cell 720-256-8076
> >
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076
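
Until the duplication is fixed server-side, a client-side workaround (a hedged sketch, not a
fix for the bug) is to de-duplicate the facet fields before building the request, e.g. with
SolrJ:

import java.util.Arrays;
import java.util.LinkedHashSet;
import org.apache.solr.client.solrj.SolrQuery;

String[] requested = { "prac_spec_heir", "all_proc_name_code", "all_cond_name_code",
        "prac_spec_heir", "{!ex=exgender}gender", "{!ex=expayor}payor_code_name" };

SolrQuery q = new SolrQuery("*:*");
q.setFacet(true);
for (String field : new LinkedHashSet<String>(Arrays.asList(requested))) {
    q.addFacetField(field);   // each facet.field is sent at most once
}

The LinkedHashSet keeps the original field order while dropping the repeated prac_spec_heir
entry.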


Re: Duplicate Facet.Fields cause same results, should dedupe?

2014-02-03 Thread Otis Gospodnetic
Hi,

Don't know if this is old or new problem, but it does feel like a bug to me.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Feb 3, 2014 at 10:48 AM, William Bell  wrote:

> If we add :
>
> facet.field=prac_spec_heir&facet.field=prac_spec_heir
>
> we get it twice in the results. This breaks deserialization on wt=json
> since you cannot have the same name twice
>
> Thoughts? Seems like a new bug in 4.6 ?
>
>
> "facet.field":
> ["prac_spec_heir","all_proc_name_code","all_cond_name_code","
> prac_spec_heir","{!ex=exgender}gender","{!ex=expayor}payor_code_name"],
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>


Re: need help in understanding solr cloud stats data

2014-02-03 Thread Otis Gospodnetic
Hi,

Oh, I just saw Greg's email on dev@ about this.
IMHO aggregating in the search engine is not the way to go.  Leave that to
external tools, which are likely to be more flexible when it comes to this.
 For example, our SPM for Solr can do all kinds of aggregations and
filtering by a number of Solr and SolrCloud-specific dimensions already,
without Solr having to do any sort of aggregation that it thinks Ops people
will really want.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Feb 3, 2014 at 11:08 AM, Mark Miller  wrote:

> You should contribute that and spread the dev load with others :)
>
> We need something like that at some point, it's just no one has done it.
> We currently expect you to aggregate in the monitoring layer and it's a lot
> to ask IMO.
>
> - Mark
>
> http://about.me/markrmiller
>
> On Feb 3, 2014, at 10:49 AM, Greg Walters 
> wrote:
>
> > I've had some issues monitoring Solr with the per-core mbeans and ended
> up writing a custom "request handler" that gets loaded then registers
> itself as an mbean. When called it polls all the per-core mbeans then adds
> or averages them where appropriate before returning the requested value.
> I'm not sure if there's a better way to get jvm-wide stats via jmx but it
> is *a* way to get it done.
> >
> > Thanks,
> > Greg
> >
> > On Feb 3, 2014, at 1:33 AM, adfel70  wrote:
> >
> >> I'm sending all solr stats data to graphite.
> >> I have some questions:
> >> 1. query_handler/select requestTime -
> >> if i'm looking at some metric, lets say 75thPcRequestTime - I see that
> each
> >> core in a single collection has different values.
> >> Is each value of each core is the time that specific core spent on a
> >> request?
> >> so to get an idea of total request time, I should summarize all the
> values
> >> of all the cores?
> >>
> >>
> >> 2.update_handler/commits - does this include auto_commits? because I'm
> >> pretty sure I'm not doing any manual commits and yet I see a number
> there.
> >>
> >> 3. update_handler/docs pending - what does this mean? pending for what?
> for
> >> flush to disk?
> >>
> >> thanks.
> >>
> >>
> >>
> >> --
> >> View this message in context:
> http://lucene.472066.n3.nabble.com/need-help-in-understating-solr-cloud-stats-data-tp4114992.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
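
For anyone who wants to build the same kind of aggregation Greg describes, here is a rough
sketch using the standard JMX API from inside the Solr JVM; the ObjectName pattern and the
"requests" attribute follow Solr's usual per-core bean naming when <jmx/> is enabled in
solrconfig.xml, but treat them as assumptions for your setup:

import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
// per-core beans live under a "solr/<corename>" domain
Set<ObjectName> handlers =
        mbs.queryNames(new ObjectName("solr/*:type=/select,*"), null);
long totalRequests = 0;
for (ObjectName name : handlers) {
    totalRequests += ((Number) mbs.getAttribute(name, "requests")).longValue();
}
System.out.println("JVM-wide /select requests across all cores: " + totalRequests);

Summing works for counters like requests and errors; latency percentiles would need averaging
or weighting instead.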


Re: Adding DocValues in an existing field

2014-02-03 Thread Otis Gospodnetic
Hi,

You can change the field definition and then reindex.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 30, 2014 1:12 PM, "yriveiro"  wrote:

> Hi,
>
> Can I add to an existing field the docvalue feature without wipe the
> actual?
>
> The modification on the schema will be something like this:
> <field ... multiValued="false" />
> <field ... multiValued="false" docValues="true"/>
>
> I want use the actual data to reindex it again in the same collection but
> in
> the process create the docvalues too, it's possible?
>
> I'm using solr 4.6.1
>
>
>
> -
> Best regards
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Adding-DocValues-in-an-existing-field-tp4114462.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
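
A sketch of the "reindex into the same collection" part, purely as an illustration: it assumes
every field you need is stored, a uniqueKey named id, no copyField surprises, and it uses plain
start/rows paging since cursorMark is not available in 4.6. The URL and page size are
placeholders.

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/mycollection");
int rows = 1000;
for (int start = 0; ; start += rows) {
    SolrQuery q = new SolrQuery("*:*");
    q.setStart(start);
    q.setRows(rows);
    q.set("sort", "id asc");                       // stable paging order
    SolrDocumentList page = server.query(q).getResults();
    if (page.isEmpty()) break;
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (SolrDocument d : page) {
        SolrInputDocument in = new SolrInputDocument();
        for (String f : d.getFieldNames()) {
            if (!"_version_".equals(f)) {          // let Solr assign a fresh version
                in.addField(f, d.getFieldValue(f));
            }
        }
        batch.add(in);
    }
    server.add(batch);                             // overwrites by uniqueKey, building docValues
}
server.commit();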


Re: how to write an efficient query with a subquery to restrict the search space?

2014-02-03 Thread Otis Gospodnetic
Hi,

Sounds like a possible document and query routing use case.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 31, 2014 7:11 AM, "svante karlsson"  wrote:

> It seems to be faster to first restrict the search space and then do the
> scoring compared to just use the full query and let solr handle everything.
>
> For example in my application one of the scoring fields effectivly hits
> 1/12 of the database (a month field) and if we have 100'' items in the
> database the this matters.
>
> /svante
>
>
> 2014-01-30 Jack Krupansky :
>
> > Lucene's default scoring should give you much of what you want - ranking
> > hits of low-frequency terms higher - without any special query syntax -
> > just list out your terms and use "OR" as your default operator.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: svante karlsson
> > Sent: Thursday, January 23, 2014 6:42 AM
> > To: solr-user@lucene.apache.org
> > Subject: how to write an efficient query with a subquery to restrict the
> > search space?
> >
> >
> > I have a solr db containing 1 billion records that I'm trying to use in a
> > NoSQL fashion.
> >
> > What I want to do is find the best matches using all search terms but
> > restrict the search space to the most unique terms
> >
> > In this example I know that val2 and val4 is rare terms and val1 and val3
> > are more common. In my real scenario I'll have 20 fields that I want to
> > include or exclude in the inner query depending on the uniqueness of the
> > requested value.
> >
> >
> > my first approach was:
> > q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4 AND
> (field2:val2
> > OR field4:val4)&rows=100&fl=*
> >
> > but what I think I get is
> > .  field4:val4 AND (field2:val2 OR field4:val4)   this result is then
> > OR'ed with the rest
> >
> > if I write
> > q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4) AND
> > (field2:val2 OR field4:val4)&rows=100&fl=*
> >
> > then what I think I get is two sub-queries that is evaluated separately
> and
> > then joined - performance wise this is bad.
> >
> > Whats the best way to write these types of queries?
> >
> >
> > Are there any performance issues when running it on several solrcloud
> nodes
> > vs a single instance or should it scale?
> >
> >
> >
> > /svante
> >
>
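
One common way to express "restrict first, then score" in a single request is to move the
restrictive clause into a filter query: fq does not contribute to scoring and is cached, while
q does the ranking. A hedged sketch with the field names from the example (whether it is
actually faster depends on how selective the filter is and how often it is reused from the
cache):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("field1:val1 OR field2:val2 OR field3:val3 OR field4:val4");
q.addFilterQuery("field2:val2 OR field4:val4");   // non-scoring, cacheable restriction
q.setRows(100);
q.setFields("*");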


Re: Special NGRAMish requirement

2014-02-03 Thread Otis Gospodnetic
Hi,

Can you provide an example, Alexander?

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Feb 3, 2014 5:28 AM, "Lochschmied, Alexander" <
alexander.lochschm...@vishay.com> wrote:

> Hi,
>
> we need to use something very similar to EdgeNGram (minGramSize="1"
> maxGramSize="50" side="front").
> The only thing missing is that we would like to reduce the number of
> matches. The request we need to implement is returning only those matches
> with the longest tokens (or terms if that is the right word).
>
> Is there a way to do this in Solr (not necessarily with EdgeNGram)?
>
> Thanks,
> Alexander
>


Re: Adding HTTP Request Header in SolrJ

2014-02-03 Thread Shawn Heisey

On 2/3/2014 3:40 PM, Andrew Doyle wrote:

Our web services are using PKI authentication so we have a user DN, however
we're querying an external Solr which is managed via a proxy which is
expecting our server DN proxying the user DN. My question is, how do we add
an HTTP header to the request being made by SolrJ?

I looked through the source code and I see that we can specify an
HttpClient when we create a new instance of an HttpSolrServer. I can set
the header there, but that seems slightly hackey to me. I'd prefer to use a
servlet filter if possible.

Do you have any other suggestions?


I don't think there's any servlet information (like the filters you 
mentioned) available in SolrJ.  There is in Solr itself, which uses 
SolrJ, but unless you're writing a servlet or custom server side code 
for Solr, you won't have access to any of that.  If you are writing a 
servlet or custom server-side code, then they'll be available -- but not 
from SolrJ.


I could be wrong about what I just said, but just now when I looked 
through the code for HttpSolrServer and SolrServer, I did not see 
anything about servlets or filters.


In my own SolrJ application, I create an HttpClient instance that is 
used across dozens of HttpSolrServer instances. The following is part of 
the constructor code for my custom "Core" class.


/*
 * If this is the first time a Core has been created, create the shared
 * httpClient with some increased connection properties. Synchronized to
 * ensure thread safety.
 */
synchronized (firstInstance)
{
if (firstInstance)
{
ModifiableSolrParams params = new ModifiableSolrParams();
params.add(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, "200");
params.add(HttpClientUtil.PROP_MAX_CONNECTIONS, "5000");
httpClient = HttpClientUtil.createClient(params);
firstInstance = false;
}
}

These are the static class members used in the above code:

/**
 * A static boolean value indicating whether this is the first instance of
 * this object. Also used for thread synchronization.
 */
private static Boolean firstInstance = true;

/**
 * A static http client to use on all Solr server objects.
 */
private static HttpClient httpClient = null;

Just so you know, the deprecations introduced by the recent upgrade to 
HttpClient 4.3 might complicate things further when it comes to user 
code.  See SOLR-5604.  I have some ideas about how to proceed on that 
issue, but haven't had a lot of time to look into it, and before I do 
anything, I need to discuss it with people who are smarter than me.


https://issues.apache.org/jira/browse/SOLR-5604

Thanks,
Shawn
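
For reference, a sketch of the HttpClient route Shawn mentions, written against the HttpClient
4.3 builder API; the header name and the userDn value are made up for illustration, use
whatever your proxy expects:

import java.util.Collections;
import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.message.BasicHeader;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

String userDn = "CN=Example User,O=Example";       // placeholder
HttpClient client = HttpClientBuilder.create()
        .setDefaultHeaders(Collections.singletonList(
                new BasicHeader("X-Proxy-User-Dn", userDn)))
        .build();
HttpSolrServer server = new HttpSolrServer("http://solrhost:8983/solr/collection1", client);

Every request made through that HttpSolrServer instance will then carry the header.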



Adding HTTP Request Header in SolrJ

2014-02-03 Thread Andrew Doyle
Our web services are using PKI authentication so we have a user DN, however
we're querying an external Solr which is managed via a proxy which is
expecting our server DN proxying the user DN. My question is, how do we add
an HTTP header to the request being made by SolrJ?

I looked through the source code and I see that we can specify an
HttpClient when we create a new instance of an HttpSolrServer. I can set
the header there, but that seems slightly hackey to me. I'd prefer to use a
servlet filter if possible.

Do you have any other suggestions?

Thanks!


*-- Andrew Doyle*
Software Engineer II

 

10620 Guilford Road, Suite 200
Jessup, MD 20794
direct: 410 854 5560
cell: 410 440 8478

*ado...@clearedgeit.com *
* www.ClearEdgeIT.com *


Re: SolrCloud multiple data center support

2014-02-03 Thread Daniel Collins
Option a) doesn't really work out of the box, *if you need NRT support*.
 The main reason (for us at least) is the ZK ensemble and maintaining
quorum. If you have a single ensemble, say 3 ZKs in 1 DC and 2 in another,
then if you lose DC 2, you lose 2 ZKs and the rest are fine.  But if you
lose the main DC that has 3 ZKs, you lose quorum.  Searches will be ok, but
if you are an NRT-setup, your updates will all stall until you get another
ZK started (and reload the whole Solr Cloud to give them the ID of that new
ZK).

For us, availability is more important than consistency, so we currently
have 2 independent setups, 1 ZK ensemble and Solr Cloud per DC.  We already
had an indexing system that serviced DCs so we didn't need something like
Flume.  We also have external systems that handle routing to some extent,
so we can route "locally" to each Cloud, and not have to worry about
cross-DC traffic.

One solution to that is have a 3rd DC with few instances in, say another 2
ZKs. That would take your total ensemble to 7, and you can lose 3 whilst
still maintaining quorum.  Since ZK is relatively light-weight, that 3rd
"Data Centre" doesn't have to be as robust, or contain Solr replicas, its
just a place to house 1 or 2 machines for holding ZKs.  We will probably
migrate to this kind of setup soon as it ticks more of our boxes.

One other option is in ZK trunk (but not yet in a release) is the ability
to dynamically reconfigure ZK ensembles (
https://issues.apache.org/jira/browse/ZOOKEEPER-107).  That would give the
ability to create new ZK instances in the event of a DC failure, and
reconfigure the Solr Cloud without having to reload everything. That would
help to some extent.

If you don't need NRT, then the solution is somewhat easier, as you don't
have to worry as much about ZK quorum, a single ZK ensemble across DCs
might be sufficient for you in that case.


On 3 February 2014 17:44, Mark Miller  wrote:

> SolrCloud has not tackled multi data center yet.
>
> I don't think a or b are very good options yet.
>
> Honestly, I think the best current bet is to use something like Apache
> Flume to send data to both data centers - it will handle retries and
> keeping things in sync and splitting the stream. Doesn't satisfy all use
> cases though.
>
> At some point, multi data center support will happen.
>
> I can't remember where ZooKeeper's support for it is at, but with that and
> some logic to favor nodes in your data center, that might be a viable route.
>
> - Mark
>
> http://about.me/markrmiller
>
> On Feb 3, 2014, at 11:48 AM, Darrell Burgan 
> wrote:
>
> > Hello, we are using Solr in a SolrCloud configuration, with two Solr
> instances running with three Zookeepers in a single data center. We
> presently have a single search index with about 35 million entries in it,
> about 60GB disk space on each of the two Solr servers (120GB total). I
> would expect our usage of Solr to grow to include other search indexes, and
> likely larger data volumes.
> >
> > I'm writing because we're needing to grow beyond a single data center,
> with two (potentially incompatible) goals:
> >
> > 1.   We need to be able to have a hot disaster recovery site, in a
> completely separate data center, that has a near-realtime replica of the
> search index.
> >
> > 2.   We'd like to have the option to have multiple active/active
> data centers that each see and update the same search index, distributed
> across data centers.
> >
> > The options I'm aware of from reading archives:
> >
> > a.   Simply set up the remote Solr instances as active parts of the
> same SolrCloud cluster. This will  essentially involve us standing up
> multiple Zookeepers in the second data center, and multiple Solr instances,
> and they will all keep each other in sync magically. This will also solve
> both of our goals. However, I'm concerned about performance and whether
> SolrCloud is smart enough to route local search queries only to local Solr
> servers ... ? Also, how does such a cluster tolerate and recover from network
> partitions?
> >
> > b.  The remote Solr instances form their own completely unrelated
> SolrCloud cluster. I have to invent some kind of replication logic of my
> own to sync data between them. This replication would have to be
> bidirectional to satisfy both of our goals. I strongly dislike this option
> since the application really should not concern itself with data
> distribution. But I'll do it if I must.
> >
> > So my questions are:
> >
> > -  Can anyone give me any guidance as to option a? Anyone using
> this in a real production setting? Words of wisdom? Does it work?
> >
> > -  Are there any other options that I'm not considering?
> >
> > -  What is Solr's answer to such configurations (we can't be
> alone in needing one)? Any big enhancements coming on the Solr road map to
> deal with this?
> >
> > Thanks!
> > Darrell Burgan
> >
> >
> >
> > Darrell Burgan | Chief Architect, PeopleAnswers

Re: Solr and Polygon/Radius based spatial searches

2014-02-03 Thread Smiley, David W.
Hi Lee,

On 2/3/14, 1:59 PM, "leevduhl"  wrote:

>We have a public property search site that we are looking to replace the
>back
>end index server on and we are looking at Solr as a possible replacement
>(ElasticSearch is another possibility).

Both should work equally well.

>
>One of the key search components of our site is to search on a bounding
>box
>(rectangle), custom multi-point polygon, and/or a radius from a point.
>
>It appears that Solr3 and Solr4 both supported spatial searching, but
>using
>different methods.  Also, per this link,
>http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4, it appears that
>Solr only supports point, rectangle and circle shapes and needs JTS and/or
>WKT to support multi-point, non-rectangular polygon shapes.

Yup.  I'm not sure what you mean by a "multi-point" polygon though... is
that somehow different than a polygon that isn't multi-point?  All
polygons are comprised of at least 3 distinct points (a triangle).

>
>Our indexed data will included the long/lat values for all property
>records.
>
>If someone can provide sample queries for the following situations, it
>would
>be appreciated:
>- All properties/points that fall within a multi-point polygon (ie:
>Polygon
>points: Lo1 La1, Lo2 La2, Lo3 La3, Lo4 La4, Lo5 La5, Lo1, La1)

mygeorptfieldname:"Intersects(POLYGON((x1 y1, x2 y2, x3 y3, ..., x1 y1)))"

Inside of the immediate parenthesis of Intersects is a standard WKT
formatted polygon.  Note "x y" order (longitude space latitude).

>
>- All properties that fall within 1.5 miles (radius) of point: Lo1 La1

Just use Solr's standard "geofilt" query parser:
fq={!geofilt}&pt=lat,lon&d=2.414016

I got the distance value by converting miles to kilometers, which is what
geofilt expects (1.5 * 1.60934400061469).

>
>Other spatial search type functionality that may be targeted included:
>- Ability to search within multiple polygons (both intersecting, non
>intersecting and combinations

No problem for union: Use standard WKT: MULTIPOLYGON or
GEOMETRYCOLLECTION.  If you want to combine them in interesting ways then
you're going to have to compute that client-side and send the resulting
polygon(s) to Solr (or ElasticSearch).  You could use JTS to do that,
which has a trove of spatial functionality for such things.  I'm thinking
of some day adding some basic operator extensions to the WKT so you don't
have to do this on the client end.  Leveraging JTS server-side it would be
particularly easy, but it would also be pretty easy to do it as a custom
shape aggregate, similar to Spatial4j 0.4's ShapeCollection.

>- Ability to search for properties that fall outside of a polygon

You could use "IsDisjointTo" (instead of "Intersects") but you'll
generally get faster results by negating intersects.  For an example,
simply precede the first polygonal example with a "NOT ".

>
>Thanks 
>Lee

~ David
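
Putting the two examples above together as SolrJ filter queries; the field name "geo", the
coordinates and the distance are placeholders, and the polygon case needs the JTS jar on the
classpath:

import org.apache.solr.client.solrj.SolrQuery;

// everything inside a custom polygon (WKT, "x y" = lon lat, first point repeated to close the ring)
SolrQuery polygonQuery = new SolrQuery("*:*");
polygonQuery.addFilterQuery(
        "geo:\"Intersects(POLYGON((-83.2 42.5, -83.0 42.5, -83.0 42.7, -83.2 42.7, -83.2 42.5)))\"");

// everything within 1.5 miles (about 2.414 km) of a point
SolrQuery radiusQuery = new SolrQuery("*:*");
radiusQuery.addFilterQuery("{!geofilt sfield=geo pt=42.6,-83.1 d=2.414016}");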



Re: Not finding part of fulltext field when word ends in dot

2014-02-03 Thread Thomas Michael Engelke
That was a complicated answer, but ultimately the right one. Thank you very
much.


2014-01-30 Jack Krupansky :

> The word delimiter filter will turn 26KA into two tokens, as if you had
> written "26 KA" without the quotes. The autoGeneratePhraseQueries option
> will cause the multiple terms to be treated as if they actually were
> enclosed within quotes, otherwise they will be treated as separate and
> unquoted terms. If you do enclose "26KA" in quotes in your query then
> autoGeneratePhraseQueries is not relevant.
>
> Ah... maybe the problem is that you have preserveOriginal="true" in your
> query analyzer. Do you have your default query operator set to "AND"? If
> so, it would treat "26KA" as "26" AND "KA" AND "26KA", which requires that
> "26KA" (without the trailing dot) to be in the index.
>
> It seems counter-intuitive, but the attributes of the index and query word
> delimiter filters need to be slightly asymmetric.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Thomas Michael Engelke
> Sent: Thursday, January 30, 2014 2:16 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Not finding part of fulltext field when word ends in dot
>
> I'm not sure I got my problem across. If I understand the snippet of
> documentation right, autoGeneratePhraseQueries only affects queries that
> result in multiple tokens, which mine does not. The version also is
> 3.6.0.1, and we're not planning on upgrading to any 4.x version.
>
>
> 2014-01-29 Jack Krupansky 
>
>  You might want to add autoGeneratePhraseQueries="true" to your field
>> type, but I don't think that would cause a break when going from 3.6 to
>> 4.x. The default for that attribute changed in Solr 3.5. What release was
>> your data indexed using? There may have been some subtle word delimiter
>> filter changes between 3.x and 4.x.
>>
>> Read:
>> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201202.mbox/%
>> 3CC0551C512C863540BC59694A118452AA0764A434@ITS-EMBX-03.
>> adsroot.itcs.umich.edu%3E
>>
>>
>>
>> -Original Message- From: Thomas Michael Engelke
>> Sent: Wednesday, January 29, 2014 11:16 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Not finding part of fulltext field when word ends in dot
>>
>>
>> The fieldType definition is a tad on the longer side:
>>
>> <fieldType ... positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             catenateWords="1"
>>             catenateNumbers="1"
>>             generateNumberParts="1"
>>             splitOnCaseChange="1"
>>             generateWordParts="1"
>>             catenateAll="0"
>>             preserveOriginal="1"
>>             splitOnNumerics="0"
>>     />
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="german/synonyms.txt" ignoreCase="true" expand="true"/>
>>     <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
>>             dictionary="german/german-common-nouns.txt"
>>             minWordSize="5"
>>             minSubwordSize="4"
>>             maxSubwordSize="15"
>>             onlyLongestMatch="true"
>>     />
>>     <filter class="solr.StopFilterFactory" words="german/stopwords.txt" ignoreCase="true"
>>             enablePositionIncrements="true"/>
>>     <filter class="solr.SnowballPorterFilterFactory" language="German2"
>>             protected="german/protwords.txt"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             catenateWords="0"
>>             catenateNumbers="0"
>>             generateWordParts="1"
>>             splitOnCaseChange="1"
>>             generateNumberParts="1"
>>             catenateAll="0"
>>             preserveOriginal="1"
>>             splitOnNumerics="0"
>>     />
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.StopFilterFactory" words="german/stopwords.txt" ignoreCase="true"
>>             enablePositionIncrements="true"/>
>>     <filter class="solr.Sn
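
As a side note, the field analysis API (the same thing the admin UI Analysis screen uses) is a
convenient way to check how "26KA." is tokenized at index versus query time with analyzers like
the above. A rough SolrJ sketch; the core URL and field name are placeholders:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.FieldAnalysisRequest;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");
FieldAnalysisRequest req = new FieldAnalysisRequest();
req.addFieldName("description");
req.setFieldValue("26KA.");                 // what gets indexed
req.setQuery("26KA");                       // what the user types
System.out.println(server.request(req));    // dumps the token stream after each filter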

Re: need help in understanding solr cloud stats data

2014-02-03 Thread Erick Erickson
See:

http://wiki.apache.org/solr/HowToContribute

It outlines how to get the code, how to work with patches, how to set
up IntelliJ and Eclipse IDEs (links near the bottom?). There are
formatting files for both IntelliJ and Eclipse that'll do the right
thing in terms of indents and such.

Legal issues aside, you don't need to be very compulsive about cleaning up
the code before posting the first patch! Just let people know you
don't consider it ready to commit. You'll want to open a JIRA to
attach it to. People often put in //nocommit in places they especially
don't like, and the "precommit" ant target takes care of keeping these
from getting into the code.

People are quite happy to see hack, first-cut patches. You'll often
get suggestions on approaches that may be easier and nobody will
complain about "bad code" when they know that _you_ don't consider it
submittable. Google for "Yonik's law of half-baked patches".

One thing that escapes people often... When attaching a patch to a
JIRA, just call it SOLR-NNNN.patch, where NNNN is the JIRA number.
Successive versions of the patch should have the _same_ name, they'll
all be listed and the newest one will be "live". It's easier to know
what is the right patch that way. No big deal either way.

Best,
Erick

On Mon, Feb 3, 2014 at 8:25 AM, Greg Walters  wrote:
> The code I wrote is currently a bit of an ugly hack so I'm a bit reluctant to 
> share it and there's some legal concerns with open-sourcing code within my 
> company. That being said, I wouldn't mind rewriting it on my own time. Where 
> can I find a starter kit for contributors with coding guidelines and the 
> like? Spruced up some I'd be OK with submitting a patch.
>
> Thanks,
> Greg
>
> On Feb 3, 2014, at 10:08 AM, Mark Miller  wrote:
>
>> You should contribute that and spread the dev load with others :)
>>
>> We need something like that at some point, it's just no one has done it. We 
>> currently expect you to aggregate in the monitoring layer and it's a lot to 
>> ask IMO.
>>
>> - Mark
>>
>> http://about.me/markrmiller
>>
>> On Feb 3, 2014, at 10:49 AM, Greg Walters  wrote:
>>
>>> I've had some issues monitoring Solr with the per-core mbeans and ended up 
>>> writing a custom "request handler" that gets loaded then registers itself 
>>> as an mbean. When called it polls all the per-core mbeans then adds or 
>>> averages them where appropriate before returning the requested value. I'm 
>>> not sure if there's a better way to get jvm-wide stats via jmx but it is 
>>> *a* way to get it done.
>>>
>>> Thanks,
>>> Greg
>>>
>>> On Feb 3, 2014, at 1:33 AM, adfel70  wrote:
>>>
 I'm sending all solr stats data to graphite.
 I have some questions:
 1. query_handler/select requestTime -
 if i'm looking at some metric, lets say 75thPcRequestTime - I see that each
 core in a single collection has different values.
 Is each value of each core is the time that specific core spent on a
 request?
 so to get an idea of total request time, I should summarize all the values
 of all the cores?


 2.update_handler/commits - does this include auto_commits? because I'm
 pretty sure I'm not doing any manual commits and yet I see a number there.

 3. update_handler/docs pending - what does this mean? pending for what? for
 flush to disk?

 thanks.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/need-help-in-understating-solr-cloud-stats-data-tp4114992.html
 Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>


Re: SolrCloud query results order master vs replica

2014-02-03 Thread Erick Erickson
This should only be happening if the scores are _exactly_ the same,
which is actually
quite rare. In that case, the tied scores are broken by the internal
Lucene document
ID, and the relative order of the docs on the two machines isn't
guaranteed to be the
same, the internal ID can change during segment merging, which is NOT the same
on both machines.

But this should be relatively rare. If you're doing *:* queries or
other such, then they
aren't scored (see ConstantScoreQuery). So in practical terms, I suspect you're
seeing some kind of test artifact. Try adding &debug=all to the query
and you'll see
how documents are scored.

Best,
Erick
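
If identical ordering across the two instances matters, a hedged sketch of the explicit
tie-breaker approach (assumes the uniqueKey field is named id; the query string is a
placeholder):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("name:product");
q.set("sort", "score desc,id asc");   // ranking first, then a deterministic tie-break

This keeps score-based ranking but makes ties resolve the same way on every replica.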

On Mon, Feb 3, 2014 at 6:57 AM, M. Flatterie  wrote:
> Greetings,
>
> My setup is:
> - SolrCloud V4.3
> - One collection
> - one shard
> - 1 master, 1 replica
>
> so each instance contains the entire index.  The index is rather small and 
> the replica is used for robustness.  There is no need (IMHO) to split shard 
> the index (yet, until the index gets bigger).
>
> My question:
> - if I do a query on a product name (that is what the index is about) on the 
> master I get a certain number of results and the documents.
> - if I do the same query on the replica, I get the same number of results but 
> the docs are in a different order.
> - I do not specify a sort parameter in my query, simply a q=.
> - obviously if I force a sort order, everything is ok, same results, same 
> order from both instances.
> - am I wrong in expecting the same results, in the SAME order?
>
> Follow up question if the order is not guaranteed:
> - should I force the dev. to use an explicit sort order?
> - if we force the sort, we then bypass the ranking / score order do we not?
> - should I force all queries to go to the master and fall back on the replica 
> only in the context of a total loss of the master?
>
> Other useful information:
>   - the admin page shows same number of documents in both instances.
>   - logs are clean, load and replication and queries worked ok.
>   - the web application that queries SOLR round robins between the two 
> instances, so getting results in a different order is bad for consistency.
>
> Thank you for your help!
>
> Nic
>


Re: Score of Search Term for every character remove

2014-02-03 Thread Jack Krupansky
I think he wants to do a bunch of separate queries and return separate result 
sets for each.


Hmmm... maybe it would be nice to allow multiple "q" parameters in one query 
request, each returning a separate set of results.


-- Jack Krupansky
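
A minimal SolrJ sketch of that separate-queries reading: one small request per prefix with
rows=0, collecting only numFound. The field name "content" and the use of wildcard prefix
queries are assumptions about the schema and intent:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
String term = "ABCDEF";
for (int len = term.length(); len >= 1; len--) {
    String prefix = term.substring(0, len);
    SolrQuery q = new SolrQuery("content:" + prefix + "*");   // prefix match
    q.setRows(0);                                             // only the count is needed
    long hits = server.query(q).getResults().getNumFound();
    System.out.println(prefix + " > " + hits + " hits");
}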

-Original Message- 
From: Erick Erickson

Sent: Monday, February 3, 2014 2:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Score of Search Term for every character remove

Maybe edgeNgram tokenizer? You haven't told us what the fields in the docs
you care about are

Best,
Erick


On Mon, Feb 3, 2014 at 4:48 AM, Lusung, Abner 
wrote:



 Hi,



I'm new with using SOLR and I'm curious if this is capable of doing the
following or similar.



Sample:

Query: "ABCDEF"

Returns:

ABCDEF > 0 hits

ABCDE > 2 hits

ABCD > 3 hits

ABC > 10 hits

AB > 20 hits

A > 100 hits



In one request only.



Thanks.



*Abner G. Lusung Jr.*| Java Web Development, Internet and Commerce,
Global Web Services  | Vishay Philippines Inc.

10th Floor Pacific Star Building, Makati Avenue corner Buendia Avenue,
Makati City, Philippines 1200

Phone: +63 2 8387421 loc. 7995 | Mobile: +63 9169674514

Website : www.vishay.com












Re: Score of Search Term for every character remove

2014-02-03 Thread Erick Erickson
Maybe edgeNgram tokenizer? You haven't told us what the fields in the docs
you care about are

Best,
Erick


On Mon, Feb 3, 2014 at 4:48 AM, Lusung, Abner wrote:

>  Hi,
>
>
>
> I'm new with using SOLR and I'm curious if this is capable of doing the
> following or similar.
>
>
>
> Sample:
>
> Query: "ABCDEF"
>
> Returns:
>
> ABCDEF > 0 hits
>
> ABCDE > 2 hits
>
> ABCD > 3 hits
>
> ABC > 10 hits
>
> AB > 20 hits
>
> A > 100 hits
>
>
>
> In one request only.
>
>
>
> Thanks.
>
>
>
> *Abner G. Lusung Jr.*| Java Web Development, Internet and Commerce,
> Global Web Services  | Vishay Philippines Inc.
>
> 10th Floor Pacific Star Building, Makati Avenue corner Buendia Avenue,
> Makati City, Philippines 1200
>
> Phone: +63 2 8387421 loc. 7995 | Mobile: +63 9169674514
>
> Website : www.vishay.com
>
>
>
>
>
>
>
>


Solr and Polygon/Radius based spatial searches

2014-02-03 Thread leevduhl
We have a public property search site that we are looking to replace the back
end index server on and we are looking at Solr as a possible replacement
(ElasticSearch is another possibility).

One of the key search components of our site is to search on a bounding box
(rectangle), custom multi-point polygon, and/or a radius from a point.

It appears that Solr3 and Solr4 both supported spatial searching, but using
different methods.  Also, per this link,
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4, it appears that
Solr only supports point, rectangle and circle shapes and needs JTS and/or
WKT to support multi-point, non-rectangular polygon shapes.

Our indexed data will included the long/lat values for all property records.

If someone can provide sample queries for the following situations, it would
be appreciated:
- All properties/points that fall within a multi-point polygon (ie: Polygon
points: Lo1 La1, Lo2 La2, Lo3 La3, Lo4 La4, Lo5 La5, Lo1, La1)

- All properties that fall within 1.5 miles (radius) of point: Lo1 La1

Other spatial search type functionality that may be targeted included:
- Ability to search within multiple polygons (both intersecting, non
intersecting and combinations
- Ability to search for properties that fall outside of a polygon

Thanks 
Lee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-and-Polygon-Radius-based-spatial-searches-tp4115121.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Announce list

2014-02-03 Thread Chris Hostetter

: Is there a mailing list for getting just announcements about new versions?

This is the primary usecase for the "general" list, although it does 
occasionally get other traffic from people with questions/discussion about 
the project as a whole...

https://lucene.apache.org/solr/discussion.html#general-discussion-generallucene
https://mail-archives.apache.org/mod_mbox/lucene-general/

If you are looking for a really low volume list where release 
announcements are made, that's the place to start.


-Hoss
http://www.lucidworks.com/


Re: need help in understanding solr cloud stats data

2014-02-03 Thread David Santamauro


Zabbix 2.2 has a jmx client built in as well as a few JVM templates. I 
wrote my own templates for my solr instance and monitoring and graphing 
is wonderful.


David


On 02/03/2014 12:55 PM, Joel Cohen wrote:

I had to come up with some Solr stats monitoring for my Zabbix instance. I
found that using JMX was the easiest way for us.

There is a command line jmx client that works quite well for me.
http://crawler.archive.org/cmdline-jmxclient/

I wrote a shell script to wrap around that and shove the data back to
Zabbix for ingestion and monitoring. I've listed the stats that I am
gathering, and the mbean that is called. My shell script is rather
simplistic.

#!/bin/bash

cmdLineJMXJar=/usr/local/lib/cmdline-jmxclient.jar
jmxHost=$1
port=$2
query=$3
value=$4

java -jar ${cmdLineJMXJar} user:pass ${jmxHost}:${port} ${query} ${value}
2>&1 | awk '{print $NF}'

The script is called like so: jmxstats.sh <jmxHost> <port> <query> <value>

My collection name is productCatalog, so swap that with yours.

*select requests*:
solr/productCatalog:id=org.apache.solr.handler.component.SearchHandler,type=/select
requests
*select errors:
*solr/productCatalog:id=org.apache.solr.handler.component.SearchHandler,type=/select
errors
*95th percentile request time*:
solr/productCatalog:id=org.apache.solr.handler.component.SearchHandler,type=/select
95thPcRequestTime
*update requests*:
solr/productCatalog:id=org.apache.solr.handler.UpdateRequestHandler,type=/update
requests
*update errors:*
solr/productCatalog:id=org.apache.solr.handler.UpdateRequestHandler,type=/update
errors
*95th percentile update time:*
solr/productCatalog:id=org.apache.solr.handler.UpdateRequestHandler,type=/update
95thPcRequestTime

*query result cache lookups*:
solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache
cumulative_lookups
*query result cache inserts*:
solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache
cumulative_inserts
*query result cache evictions*:
solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache
cumulative_evictions
*query result cache hit ratio:
*solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache
cumulative_hitratio

*document cache lookups:
*solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache
cumulative_lookups
*document cache inserts:
*solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache
cumulative_inserts
*document cache evictions:
*solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache
cumulative_evictions
*document cache hit ratio:
*solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache
cumulative_hitratio

*filter cache lookups:
*solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache
cumulative_lookups
*filter cache inserts:
*solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache
cumulative_inserts
*filter cache evictions:
*solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache
cumulative_evictions
*filter cache hit ratio:
*solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache
cumulative_hitratio

*field value cache lookups:
*solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache
cumulative_lookups
*field value cache inserts:
*solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache
cumulative_inserts
*field value cache evictions:
*solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache
cumulative_evictions
*field value cache hit ratio:
*solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache
cumulative_evictions

This set of stats gets me a pretty good idea of what's going on with my
SolrCloud at any time. Anyone have any thoughts or suggestions?

Joel Cohen
Senior System Engineer
Bluefly, Inc.


On Mon, Feb 3, 2014 at 11:25 AM, Greg Walters wrote:


The code I wrote is currently a bit of an ugly hack so I'm a bit reluctant
to share it and there's some legal concerns with open-sourcing code within
my company. That being said, I wouldn't mind rewriting it on my own time.
Where can I find a starter kit for contributors with coding guidelines and
the like? Spruced up some I'd be OK with submitting a patch.

Thanks,
Greg

On Feb 3, 2014, at 10:08 AM, Mark Miller  wrote:


You should contribute that and spread the dev load with others :)

We need something like that at some point, it's just no one has done it.

We currently expect you to aggregate in the monitoring layer and it's a lot
to ask IMO.


- Mark

http://about.me/markrmiller

On Feb 3, 2014, at 10:49 AM, Greg Walters 

wrote:



I've had some issues monitoring Solr with the per-core mbeans and ended

up writing a custom "request handler" that gets loaded then registers
itself as an mbean. When called it polls all the per-core mbeans then adds
or averages them where appropriate before returning the requested value.
I'm not sure if there's a better way to

Re: need help in understanding solr cloud stats data

2014-02-03 Thread Joel Cohen
I had to come up with some Solr stats monitoring for my Zabbix instance. I
found that using JMX was the easiest way for us.

There is a command line jmx client that works quite well for me.
http://crawler.archive.org/cmdline-jmxclient/

I wrote a shell script to wrap around that and shove the data back to
Zabbix for ingestion and monitoring. I've listed the stats that I am
gathering, and the mbean that is called. My shell script is rather
simplistic.

#!/bin/bash

cmdLineJMXJar=/usr/local/lib/cmdline-jmxclient.jar
jmxHost=$1
port=$2
query=$3
value=$4

java -jar ${cmdLineJMXJar} user:pass ${jmxHost}:${port} ${query} ${value}
2>&1 | awk '{print $NF}'

The script is called like so: jmxstats.sh <jmxHost> <port> <query> <value>

My collection name is productCatalog, so swap that with yours.

*select requests*:
solr/productCatalog:id=org.apache.solr.handler.component.SearchHandler,type=/select
requests
*select errors:
*solr/productCatalog:id=org.apache.solr.handler.component.SearchHandler,type=/select
errors
*95th percentile request time*:
solr/productCatalog:id=org.apache.solr.handler.component.SearchHandler,type=/select
95thPcRequestTime
*update requests*:
solr/productCatalog:id=org.apache.solr.handler.UpdateRequestHandler,type=/update
requests
*update errors:*
solr/productCatalog:id=org.apache.solr.handler.UpdateRequestHandler,type=/update
errors
*95th percentile update time:*
solr/productCatalog:id=org.apache.solr.handler.UpdateRequestHandler,type=/update
95thPcRequestTime

*query result cache lookups*:
solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache
cumulative_lookups
*query result cache inserts*:
solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache
cumulative_inserts
*query result cache evictions*:
solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache
cumulative_evictions
*query result cache hit ratio:
*solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache
cumulative_hitratio

*document cache lookups:
*solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache
cumulative_lookups
*document cache inserts:
*solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache
cumulative_inserts
*document cache evictions:
*solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache
cumulative_evictions
*document cache hit ratio:
*solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache
cumulative_hitratio

*filter cache lookups:
*solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache
cumulative_lookups
*filter cache inserts:
*solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache
cumulative_inserts
*filter cache evictions:
*solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache
cumulative_evictions
*filter cache hit ratio:
*solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache
cumulative_hitratio

*field value cache lookups:
*solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache
cumulative_lookups
*field value cache inserts:
*solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache
cumulative_inserts
*field value cache evictions:
*solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache
cumulative_evictions
*field value cache hit ratio:
*solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache
cumulative_evictions

This set of stats gets me a pretty good idea of what's going on with my
SolrCloud at any time. Anyone have any thoughts or suggestions?

Joel Cohen
Senior System Engineer
Bluefly, Inc.


On Mon, Feb 3, 2014 at 11:25 AM, Greg Walters wrote:

> The code I wrote is currently a bit of an ugly hack so I'm a bit reluctant
> to share it and there's some legal concerns with open-sourcing code within
> my company. That being said, I wouldn't mind rewriting it on my own time.
> Where can I find a starter kit for contributors with coding guidelines and
> the like? Spruced up some I'd be OK with submitting a patch.
>
> Thanks,
> Greg
>
> On Feb 3, 2014, at 10:08 AM, Mark Miller  wrote:
>
> > You should contribute that and spread the dev load with others :)
> >
> > We need something like that at some point, it's just no one has done it.
> We currently expect you to aggregate in the monitoring layer and it's a lot
> to ask IMO.
> >
> > - Mark
> >
> > http://about.me/markrmiller
> >
> > On Feb 3, 2014, at 10:49 AM, Greg Walters 
> wrote:
> >
> >> I've had some issues monitoring Solr with the per-core mbeans and ended
> up writing a custom "request handler" that gets loaded then registers
> itself as an mbean. When called it polls all the per-core mbeans then adds
> or averages them where appropriate before returning the requested value.
> I'm not sure if there's a better way to get jvm-wide stats via jmx but it
> is *a* way to get it done.
> >>
> >> Thanks,
> >> Greg
> >>
> >> On Feb 3, 2014, at 1:33 AM, adfel70  wrote:
> >

Re: SolrCloud multiple data center support

2014-02-03 Thread Mark Miller
SolrCloud has not tackled multi data center yet.

I don’t think a or b are very good options yet.

Honestly, I think the best current bet is to use something like Apache Flume to 
send data to both data centers - it will handle retries and keeping things in 
sync and splitting the stream. Doesn’t satisfy all use cases though.

At some point, multi data center support will happen.

I can’t remember where ZooKeeper’s support for it is at, but with that and some 
logic to favor nodes in your data center, that might be a viable route.

- Mark

http://about.me/markrmiller

On Feb 3, 2014, at 11:48 AM, Darrell Burgan  wrote:

> Hello, we are using Solr in a SolrCloud configuration, with two Solr 
> instances running with three Zookeepers in a single data center. We presently 
> have a single search index with about 35 million entries in it, about 60GB 
> disk space on each of the two Solr servers (120GB total). I would expect our 
> usage of Solr to grow to include other search indexes, and likely larger data 
> volumes.
>  
> I’m writing because we’re needing to grow beyond a single data center, with 
> two (potentially incompatible) goals:
>  
> 1.   We need to be able to have a hot disaster recovery site, in a 
> completely separate data center, that has a near-realtime replica of the 
> search index.
> 
> 2.   We’d like to have the option to have multiple active/active data 
> centers that each see and update the same search index, distributed across 
> data centers.
>  
> The options I’m aware of from reading archives:
>  
> a.   Simply set up the remote Solr instances as active parts of the same 
> SolrCloud cluster. This will  essentially involve us standing up multiple 
> Zookeepers in the second data center, and multiple Solr instances, and they 
> will all keep each other in sync magically. This will also solve both of our 
> goals. However, I’m concerned about performance and whether SolrCloud is 
> smart enough to route local search queries only to local Solr servers … ? 
> Also, how does such a cluster tolerate and recover from network partitions?
> 
> b.  The remote Solr instances form their own completely unrelated 
> SolrCloud cluster. I have to invent some kind of replication logic of my own 
> to sync data between them. This replication would have to be bidirectional to 
> satisfy both of our goals. I strongly dislike this option since the 
> application really should not concern itself with data distribution. But I’ll 
> do it if I must.
>  
> So my questions are:
>  
> -  Can anyone give me any guidance as to option a? Anyone using this 
> in a real production setting? Words of wisdom? Does it work?
> 
> -  Are there any other options that I’m not considering?
> 
> -  What is Solr’s answer to such configurations (we can’t be alone in 
> needing one)? Any big enhancements coming on the Solr road map to deal with 
> this?
>  
> Thanks!
> Darrell Burgan
>  
>  
> 
> Darrell Burgan | Chief Architect, PeopleAnswers
> office: 214 445 2172 | mobile: 214 564 4450 | fax: 972 692 5386 | 
> darrell.bur...@infor.com | http://www.infor.com
> CONFIDENTIALITY NOTE: This email (including any attachments) is confidential 
> and may be protected by legal privilege. If you are not the intended 
> recipient, be aware that any disclosure, copying, distribution, or use of the 
> information contained herein is prohibited.  If you have received this 
> message in error, please notify the sender by replying to this message and 
> then delete this message in its entirety. Thank you for your cooperation.
> 



Getting index schema in SolrCloud mode

2014-02-03 Thread Peter Keegan
I'm indexing data with a SolrJ client via SolrServer. Currently, I parse
the schema returned from a HttpGet on:
localhost:8983/solr/collection/schema/fields

What is the recommended way to read the schema with CloudSolrServer? Can it
be done with a single HttpGet to a ZK server?

Thanks,
Peter
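
As far as I know there is no dedicated schema request in SolrJ 4.x, so two options are to keep
issuing the /schema/fields HTTP call against any node of the cluster, or to read the config set
straight from ZooKeeper. A rough sketch of the latter; the ZK connect string, the config name
"myconf" and the schema file name are assumptions about your setup:

import org.apache.solr.common.cloud.SolrZkClient;

SolrZkClient zk = new SolrZkClient("zk1:2181,zk2:2181,zk3:2181", 10000);
try {
    byte[] raw = zk.getData("/configs/myconf/schema.xml", null, null, true);
    String schemaXml = new String(raw, "UTF-8");
    // parse schemaXml as needed
} finally {
    zk.close();
}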


RE: JVM heap constraints and garbage collection

2014-02-03 Thread Michael Della Bitta
> i2.xlarge looks vastly better than m2.2xlarge at about the same price, so
I must be missing something: Is it the 120 IPs that explains why anyone
would choose m2.2xlarge?

i2.xlarge is a relatively new instance type (December 2013). In our case,
we're partway through a yearlong reservation of m2.2xlarges and won't be up
for reconsidering that for a few months. I don't think that Amazon has ever
dropped a legacy instance type, so there's bound to be some overlap as they
roll out new ones. And I imagine someone setting up a huge memcached pool
might rather have the extra RAM over the SSD, so it still makes sense for
the m2.2xlarge to be around.

It can be kind of hard to understand how the various parameters that make
up an instance type get decided on, though. I have to consult that
ec2instances.info link all the time to make sure I'm not missing something
regarding what types we should be using.


On Feb 1, 2014 1:51 PM, "Toke Eskildsen"  wrote:

> Michael Della Bitta [michael.della.bi...@appinions.com] wrote:
> > Here at Appinions, we use mostly m2.2xlarges, but the new i2.xlarges look
> > pretty tasty primarily because of the SSD, and I'll probably push for a
> > switch to those when our reservations run out.
>
> > http://www.ec2instances.info/
>
> i2.xlarge looks vastly better than m2.2xlarge at about the same price, so
> I must be missing something: Is it the 120 IPs that explains why anyone
> would choose m2.2xlarge?
>
> Anyhow, it is good to see that Amazon now has 11 different setups with
> SSD. The IOPS looks solid at around 40K/s (estimated) for the i2.xlarge and
> they even have TRIM (
> http://aws.amazon.com/about-aws/whats-new/2013/12/19/announcing-the-next-generation-of-amazon-ec2-high-i/o-instance/).
>
> - Toke Eskildsen


Re: Apache Solr.

2014-02-03 Thread solr2020
You can have this kind of configuration in the DataImportHandler xml file to
index different types of files.

[data-config.xml example not preserved by the list archive]

Hope this helps.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Apache-Solr-tp4114996p4115102.html
Sent from the Solr - User mailing list archive at Nabble.com.


[ANN] Heliosearch 0.03 with off-heap field cache

2014-02-03 Thread Yonik Seeley
A new Heliosearch pre-release has been cut for people to try out:
https://github.com/Heliosearch/heliosearch/releases

Release Notes:
-
This is Heliosearch v0.03
Heliosearch is forked from Apache Solr and includes the following
additional features:

- Off-Heap Filters to reduce garbage collection pauses and overhead.
http://www.heliosearch.org/off-heap-filters

- Removed the 1024 limit on the number of clauses in a boolean query.
For example: q=id:(doc1 doc2 doc3 doc4 doc5 ... doc2000) will now work
correctly without throwing an exception.

- Deep Paging with cursorMark.  This is not yet in a current release
of Apache Solr, but should be in Solr 4.7 (a client-side usage sketch
follows after this list).
http://heliosearch.org/solr/paging-and-deep-paging/

- nCache - the new Off-Heap FieldCache to reduce garbage collection
overhead and accelerate sorting, faceting, and function queries.
http://heliosearch.org/solr-off-heap-fieldcache
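
For the deep-paging item above, a minimal client-side sketch of a cursorMark
loop (SolrJ 4.7-style API; the collection URL and the uniqueKey field "id"
are assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorWalk {
  public static void main(String[] args) throws Exception {
    // Assumed URL; point this at your own core/collection.
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrQuery q = new SolrQuery("*:*");
    q.setRows(500);
    q.setSort("id", SolrQuery.ORDER.asc);   // cursors require a sort on the uniqueKey field

    String cursor = CursorMarkParams.CURSOR_MARK_START;   // "*"
    while (true) {
      q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
      QueryResponse rsp = solr.query(q);
      // ... process rsp.getResults() ...
      String next = rsp.getNextCursorMark();
      if (cursor.equals(next)) {
        break;   // cursor stopped moving: no more pages
      }
      cursor = next;
    }
  }
}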



-Yonik
http://heliosearch.com -- making solr shine


Re: Solr and SDL Tridion Integration

2014-02-03 Thread Chris Warner
There are many ways to do this, Prasi. You have a lot of thinking to do on the 
subject.

You could decide to publish your content to database, and then index that 
database in Solr.

You could publish XML or CSV files of your content for Solr to read and index.

You could use nutch or some other tool to crawl your web server.

There are many more methods, probably. These being some of the more common.

Does your site have dynamic content presentation? If so, you may want to 
consider having Solr examine your broker database.

Static pages on your site? You may want to go with either a crawler or 
publishing a special file for Solr.

Please check out https://tridion.stackexchange.com/ for more on this topic.
 
--
chris_war...@yahoo.com



On Monday, February 3, 2014 3:54 AM, Jack Krupansky  
wrote:
If SDL Tridion can export to CSV format, Solr can then import from CSV 
format.

Otherwise, you may have to write a custom script or even maybe Java code to 
read from SDL Tridion and output a supported Solr format, such as Solr XML, 
Solr JSON, or CSV.

-- Jack Krupansky


-Original Message- 
From: Prasi S
Sent: Monday, February 3, 2014 4:16 AM
To: solr-user@lucene.apache.org
Subject: Solr and SDL Tridion Integration

Hi,
I want to index sdl tridion content to solr. Can you suggest how this can
be achieved. Is there any document/tutorial for this? Thanks

Thanks,
Prasi


Re: Announce list

2014-02-03 Thread Daniel Collins
I have seen other projects that have a releases mailing list; the only use
cases I can think of are:

1) users who want notifications about new releases, but don't want the
"flood" of the full user list.
2) historical searching to see how often releases were made.  Given there
isn't an official timetable, it's not really going to be useful as a forward
planner, but it might have some value for looking at how often patch releases come
out.  One could attempt to infer some degree of stability (or more
accurately, lack of stability) if lots of patches for a given release came
out quickly.

I wasn't aware of the RSS feed; that's useful as an indicator for use case 1
at least.  Use case 2 is probably too vague and rests on enough
assumptions/inferences that it's a bad idea anyway :)



On 3 February 2014 14:37, Lajos  wrote:

> There's always http://projects.apache.org/feeds/rss.xml.
>
> L
>
>
>
> On 03/02/2014 14:59, Arie Zilberstein wrote:
>
>> Hi,
>>
>> Is there a mailing list for getting just announcements about new versions?
>>
>> Thanks,
>> Arie
>>
>>


SolrCloud multiple data center support

2014-02-03 Thread Darrell Burgan
Hello, we are using Solr in a SolrCloud configuration, with two Solr instances 
running with three Zookeepers in a single data center. We presently have a 
single search index with about 35 million entries in it, about 60GB disk space 
on each of the two Solr servers (120GB total). I would expect our usage of Solr 
to grow to include other search indexes, and likely larger data volumes.

I'm writing because we're needing to grow beyond a single data center, with two 
(potentially incompatible) goals:


1.   We need to be able to have a hot disaster recovery site, in a 
completely separate data center, that has a near-realtime replica of the search 
index.


2.   We'd like to have the option to have multiple active/active data 
centers that each see and update the same search index, distributed across data 
centers.

The options I'm aware of from reading archives:


a.   Simply set up the remote Solr instances as active parts of the same 
SolrCloud cluster. This will  essentially involve us standing up multiple 
Zookeepers in the second data center, and multiple Solr instances, and they 
will all keep each other in sync magically. This will also solve both of our 
goals. However, I'm concerned about performance and whether SolrCloud is smart 
enough to route local search queries only to local Solr servers ... ? Also, how 
does such a cluster tolerate and recover from network partitions?


b.  The remote Solr instances form their own completely unrelated SolrCloud 
cluster. I have to invent some kind of replication logic of my own to sync data 
between them. This replication would have to be bidirectional to satisfy both 
of our goals. I strongly dislike this option since the application really 
should not concern itself with data distribution. But I'll do it if I must.

So my questions are:


-  Can anyone give me any guidance as to option a? Anyone using this in 
a real production setting? Words of wisdom? Does it work?


-  Are there any other options that I'm not considering?


-  What is Solr's answer to such configurations (we can't be alone in 
needing one)? Any big enhancements coming on the Solr road map to deal with 
this?

Thanks!
Darrell Burgan



Darrell Burgan | Chief Architect, PeopleAnswers
office: 214 445 2172 | mobile: 214 564 4450 | fax: 972 692 5386 | 
darrell.bur...@infor.com | http://www.infor.com

CONFIDENTIALITY NOTE: This email (including any attachments) is confidential 
and may be protected by legal privilege. If you are not the intended recipient, 
be aware that any disclosure, copying, distribution, or use of the information 
contained herein is prohibited.  If you have received this message in error, 
please notify the sender by replying to this message and then delete this 
message in its entirety. Thank you for your cooperation.



Re: need help in understating solr cloud stats data

2014-02-03 Thread Greg Walters
The code I wrote is currently a bit of an ugly hack, so I'm a bit reluctant to 
share it, and there are some legal concerns with open-sourcing code within my 
company. That being said, I wouldn't mind rewriting it on my own time. Where 
can I find a starter kit for contributors with coding guidelines and the like? 
Once it's spruced up some, I'd be OK with submitting a patch.

Thanks,
Greg

On Feb 3, 2014, at 10:08 AM, Mark Miller  wrote:

> You should contribute that and spread the dev load with others :)
> 
> We need something like that at some point, it’s just no one has done it. We 
> currently expect you to aggregate in the monitoring layer and it’s a lot to 
> ask IMO.
> 
> - Mark
> 
> http://about.me/markrmiller
> 
> On Feb 3, 2014, at 10:49 AM, Greg Walters  wrote:
> 
>> I've had some issues monitoring Solr with the per-core mbeans and ended up 
>> writing a custom "request handler" that gets loaded then registers itself as 
>> an mbean. When called it polls all the per-core mbeans then adds or averages 
>> them where appropriate before returning the requested value. I'm not sure if 
>> there's a better way to get jvm-wide stats via jmx but it is *a* way to get 
>> it done.
>> 
>> Thanks,
>> Greg
>> 
>> On Feb 3, 2014, at 1:33 AM, adfel70  wrote:
>> 
>>> I'm sending all solr stats data to graphite.
>>> I have some questions:
>>> 1. query_handler/select requestTime - 
>>> if i'm looking at some metric, lets say 75thPcRequestTime - I see that each
>>> core in a single collection has different values.
>>> Is each value of each core is the time that specific core spent on a
>>> request?
>>> so to get an idea of total request time, I should summarize all the values
>>> of all the cores?
>>> 
>>> 
>>> 2. update_handler/commits - does this include auto_commits? Because I'm
>>> pretty sure I'm not doing any manual commits and yet I see a number there.
>>> 
>>> 3. update_handler/docs pending - what does this mean? pending for what? for
>>> flush to disk?
>>> 
>>> thanks.
>>> 
>>> 
>>> 
>>> --
>>> View this message in context: 
>>> http://lucene.472066.n3.nabble.com/need-help-in-understating-solr-cloud-stats-data-tp4114992.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
> 



Re: need help in understating solr cloud stats data

2014-02-03 Thread Mark Miller
You should contribute that and spread the dev load with others :)

We need something like that at some point, it’s just no one has done it. We 
currently expect you to aggregate in the monitoring layer and it’s a lot to ask 
IMO.

- Mark

http://about.me/markrmiller

On Feb 3, 2014, at 10:49 AM, Greg Walters  wrote:

> I've had some issues monitoring Solr with the per-core mbeans and ended up 
> writing a custom "request handler" that gets loaded then registers itself as 
> an mbean. When called it polls all the per-core mbeans then adds or averages 
> them where appropriate before returning the requested value. I'm not sure if 
> there's a better way to get jvm-wide stats via jmx but it is *a* way to get 
> it done.
> 
> Thanks,
> Greg
> 
> On Feb 3, 2014, at 1:33 AM, adfel70  wrote:
> 
>> I'm sending all solr stats data to graphite.
>> I have some questions:
>> 1. query_handler/select requestTime - 
>> if i'm looking at some metric, lets say 75thPcRequestTime - I see that each
>> core in a single collection has different values.
>> Is each value of each core is the time that specific core spent on a
>> request?
>> so to get an idea of total request time, I should summarize all the values
>> of all the cores?
>> 
>> 
>> 2. update_handler/commits - does this include auto_commits? Because I'm
>> pretty sure I'm not doing any manual commits and yet I see a number there.
>> 
>> 3. update_handler/docs pending - what does this mean? pending for what? for
>> flush to disk?
>> 
>> thanks.
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/need-help-in-understating-solr-cloud-stats-data-tp4114992.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 



Re: need help in understating solr cloud stats data

2014-02-03 Thread Greg Walters
I've had some issues monitoring Solr with the per-core mbeans and ended up 
writing a custom "request handler" that gets loaded then registers itself as an 
mbean. When called it polls all the per-core mbeans then adds or averages them 
where appropriate before returning the requested value. I'm not sure if there's 
a better way to get jvm-wide stats via jmx but it is *a* way to get it done.
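
For anyone wanting to try something similar, a very rough sketch of the
aggregation idea against the in-process platform MBeanServer (this is not
Greg's code; the object-name pattern and the "requests" attribute are
assumptions that depend on how <jmx/> is configured in solrconfig.xml):

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class CoreStatsAggregator {

  // Sum one numeric attribute (e.g. "requests") across every MBean whose
  // object name matches the pattern; MBeans lacking the attribute are skipped.
  public static double sumAttribute(String namePattern, String attribute) throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    double total = 0;
    for (ObjectName name : server.queryNames(new ObjectName(namePattern), null)) {
      try {
        Object value = server.getAttribute(name, attribute);
        if (value instanceof Number) {
          total += ((Number) value).doubleValue();
        }
      } catch (Exception ignore) {
        // not every matched MBean exposes this attribute
      }
    }
    return total;
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical pattern: per-core Solr MBeans live in domains like "solr/<core>".
    System.out.println(sumAttribute("solr*:*", "requests"));
  }
}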

Thanks,
Greg

On Feb 3, 2014, at 1:33 AM, adfel70  wrote:

> I'm sending all solr stats data to graphite.
> I have some questions:
> 1. query_handler/select requestTime - 
> if i'm looking at some metric, lets say 75thPcRequestTime - I see that each
> core in a single collection has different values.
> Is each value of each core is the time that specific core spent on a
> request?
> so to get an idea of total request time, I should summarize all the values
> of all the cores?
> 
> 
> 2. update_handler/commits - does this include auto_commits? Because I'm
> pretty sure I'm not doing any manual commits and yet I see a number there.
> 
> 3. update_handler/docs pending - what does this mean? pending for what? for
> flush to disk?
> 
> thanks.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/need-help-in-understating-solr-cloud-stats-data-tp4114992.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Duplicate Facet.FIelds cause same results, should dedupe?

2014-02-03 Thread William Bell
If we add :

facet.field=prac_spec_heir&facet.field=prac_spec_heir

we get it twice in the results. This breaks deserialization on wt=json
since you cannot have the same name twice

Thoughts? Seems like a new bug in 4.6 ?


"facet.field": ["prac_spec_heir","all_proc_name_code","all_cond_name_code","
prac_spec_heir","{!ex=exgender}gender","{!ex=expayor}payor_code_name"],

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: SolrCloudServer questions

2014-02-03 Thread Greg Walters
I've seen best throughput while indexing by sending in batches of documents 
rather than individual documents per request. You might try queueing on your 
indexing machines for a bit then sending off a batch every N documents.
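
A minimal sketch of that idea with SolrJ (the batch size, ZooKeeper address,
and collection name below are assumptions, not recommendations):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchingIndexer {
  private static final int BATCH_SIZE = 500;           // tune for your document size
  private final CloudSolrServer solr;
  private final List<SolrInputDocument> buffer = new ArrayList<SolrInputDocument>();

  public BatchingIndexer(String zkHost) {
    solr = new CloudSolrServer(zkHost);                 // e.g. "zk1:2181,zk2:2181,zk3:2181"
    solr.setDefaultCollection("collection1");           // assumed collection name
  }

  public void add(SolrInputDocument doc) throws Exception {
    buffer.add(doc);
    if (buffer.size() >= BATCH_SIZE) {
      flush();
    }
  }

  public void flush() throws Exception {
    if (!buffer.isEmpty()) {
      solr.add(buffer);   // one request for the whole batch instead of one per document
      buffer.clear();
    }
  }
}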

Thanks,
Greg

On Feb 1, 2014, at 6:49 PM, Software Dev  wrote:

> Also, if we are seeing a huge cpu spike on the leader when doing a bulk
> index, would changing any of the options help?
> 
> 
> On Sat, Feb 1, 2014 at 2:59 PM, Software Dev wrote:
> 
>> Out use case is we have 3 indexing machines pulling off a kafka queue and
>> they are all sending individual updates.
>> 
>> 
>> On Fri, Jan 31, 2014 at 12:54 PM, Mark Miller wrote:
>> 
>>> Just make sure parallel updates is set to true.
>>> 
>>> If you want to load even faster, you can use the bulk add methods, or if
>>> you need more fine grained responses, use the single add from multiple
>>> threads (though bulk add can also be done via multiple threads if you
>>> really want to try and push the max).
>>> 
>>> - Mark
>>> 
>>> http://about.me/markrmiller
>>> 
>>> On Jan 31, 2014, at 3:50 PM, Software Dev 
>>> wrote:
>>> 
 Which of any of these settings would be beneficial when bulk uploading?
 
 
 On Fri, Jan 31, 2014 at 11:05 AM, Mark Miller 
>>> wrote:
 
> 
> 
> On Jan 31, 2014, at 1:56 PM, Greg Walters 
> wrote:
> 
>> I'm assuming you mean CloudSolrServer here. If I'm wrong please ignore
> my response.
>> 
>>> -updatesToLeaders
>> 
>> Only send documents to shard leaders while indexing. This saves
> cross-talk between slaves and leaders which results in more efficient
> document routing.
> 
> Right, but recently this has less of an affect because CloudSolrServer
>>> can
> now hash documents and directly send them to the right place. This
>>> option
> has become more historical. Just make sure you set the correct id
>>> field on
> the CloudSolrServer instance for this hashing to work (I think it
>>> defaults
> to "id").
> 
>> 
>>> shutdownLBHttpSolrServer
>> 
>> CloudSolrServer uses a LBHttpSolrServer behind the scenes to
>>> distribute
> requests (that aren't updates directly to leaders). Where did you find
> this? I don't see this in the javadoc anywhere but it is a boolean in
>>> the
> CloudSolrServer class. It looks like when you create a new
>>> CloudSolrServer
> and pass it your own LBHttpSolrServer the boolean gets set to false
>>> and the
> CloudSolrServer won't shut down the LBHttpSolrServer when it gets shut
>>> down.
>> 
>>> parellelUpdates
>> 
>> The javadoc's done have any description for this one but I checked out
> the code for CloudSolrServer and if parallelUpdates it looks like it
> executes update statements to multiple shards at the same time.
> 
> Right, we should def add some javadoc, but this sends updates to
>>> shards in
> parallel rather than with a single thread. Can really increase update
> speed. Still not as powerful as using CloudSolrServer from multiple
> threads, but a nice improvement non the less.
> 
> 
> - Mark
> 
> http://about.me/markrmiller
> 
>> 
>> I'm no dev but I can read so please excuse any errors on my part.
>> 
>> Thanks,
>> Greg
>> 
>> On Jan 31, 2014, at 11:40 AM, Software Dev >>> 
> wrote:
>> 
>>> Can someone clarify what the following options are:
>>> 
>>> - updatesToLeaders
>>> - shutdownLBHttpSolrServer
>>> - parallelUpdates
>>> 
>>> Also, I remember in older version of Solr there was an efficient
>>> format
>>> that was used between SolrJ and Solr that is more compact. Does this
> sill
>>> exist in the latest version of Solr? If so, is it the default?
>>> 
>>> Thanks
>> 
> 
> 
>>> 
>>> 
>> 



Re: Need help for integrating solr-4.5.1 with UIMA

2014-02-03 Thread Luca Foppiano
On Mon, Feb 3, 2014 at 10:20 AM, rashi gandhi wrote:

> Hi,
>
>
Hi,


> I'm trying to integrate Solr 4.5.1 with UIMA and following the steps of the
> solr-4.5.1\contrib\uima\readme.txt.
>
> Edited the solrconfig.xml as given in readme.txt. Also I have registered
> the required keys.
>

[...]


> at java.lang.Thread.run(Thread.java:619)
>
> *Caused by: java.net.ConnectException: Connection timed out: connect*
>

[...]

>
> What is going wrong?
>
> Please help me on this.
>

In principle I've never integrated UIMA and Solr, but quickly looking at
your exception (please send only the meaningful part of the stack trace),
it seems you have a problem connecting. I would start from there.

Regards
Luca
-- 
Luca Foppiano

Software Engineer
+31615253280
l...@foppiano.org
www.foppiano.org


Elevation and nested queries

2014-02-03 Thread Holger Rieß
I have a simple query 'q=hurco' (parser type edismax). Elevation is properly 
configured, so I get the expected results:
...

(result doc, tags stripped by the archive: 7 / HURCO, debtoritem 0~*, [elevated] = true)


A similar query with a nested query 'q=(hurco AND _query_:"{!field f=debtoritem 
v=0~*}")' returns the same document but without elevation:
...

(same result doc, tags stripped: 7 / HURCO, debtoritem 0~*, [elevated] = false)


Does a nested query disable elevation?

There is an additional spellcheck component added to the query. This is working 
as expected.

(request handler components, tags stripped: spellcheck, elevator)


Thanks,
Holger


SolrCloud query results order master vs replica

2014-02-03 Thread M. Flatterie
Greetings,

My setup is:
- SolrCloud V4.3
- One collection
- one shard
- 1 master, 1 replica

so each instance contains the entire index.  The index is rather small and the 
replica is used for robustness.  There is no need (IMHO) to split shard the 
index (yet, until the index gets bigger).

My question:
- if I do a query on a product name (that is what the index is about) on the 
master I get a certain number of results and the documents.
- if I do the same query on the replica, I get the same number of results but 
the docs are in a different order.
- I do not specify a sort parameter in my query, simply a q=.
- obviously if I force a sort order, everything is ok, same results, same order 
from both instances.
- am I wrong in expecting the same results, in the SAME order?

Follow up question if the order is not guaranteed:
- should I force the dev. to use an explicit sort order?
- if we force the sort, we then bypass the ranking / score order do we not?
- should I force all queries to go to the master and fall back on the replica 
only in the context of a total loss of the master?

Other useful information:
  - the admin page shows same number of documents in both instances.
  - logs are clean, load and replication and queries worked ok.
  - the web application that queries SOLR round robins between the two 
instances, so getting results in a different order is bad for consistency.

Thank you for your help!

Nic



Re: shard1 gone missing ... (upgrade to 4.6.1)

2014-02-03 Thread David Santamauro


Mark, I am testing the upgrade and indexing gives me this error:

914379 [http-apr-8080-exec-4] ERROR org.apache.solr.core.SolrCore  ? 
org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe0 (at 
char #1, byte #-1)


... and a bunch of these

request: 
http://xx.xx.xx.xx/col1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fxx.xx.xx.xx%3A8080%2Fcol1%2F&wt=javabin&version=2
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:240)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:744)
1581335 [updateExecutor-1-thread-7] ERROR 
org.apache.solr.update.StreamingSolrServers  ? error

org.apache.solr.common.SolrException: Bad Request


Nothing else in the process chain has changed. Does this have anything 
to do with the deprecated warnings:


WARN  org.apache.solr.handler.UpdateRequestHandler  ? Using deprecated 
class: XmlUpdateRequestHandler -- replace with UpdateRequestHandler


thanks

David


On 01/31/2014 11:22 AM, Mark Miller wrote:



On Jan 31, 2014, at 11:15 AM, David Santamauro  
wrote:


On 01/31/2014 10:22 AM, Mark Miller wrote:


I’d also highly recommend you try moving to Solr 4.6.1 when you can though. We 
have fixed many, many, many bugs around SolrCloud in the 4 releases since 4.4. 
You can follow the progress in the CHANGES file we update for each release.


Can I do a drop-in replacement of 4.4.0 ?




It should be a drop in replacement. For some that use deep API’s in plugins, 
sometimes you might have to make a couple small changes to your code.

Alway best to do a test with a copy of your index, but for most, it should be a 
drop in replacement.

- Mark

http://about.me/markrmiller





Re: weird exception on update

2014-02-03 Thread Dmitry Kan
The solution (or workaround?) is to drop defType from one of the cores
and use a {!qparser} local param on every query, including the delete by
query. It would be really great if this could be handled on the Solr
config side only, without involving client changes.
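
To illustrate the workaround, a small sketch (core URL and field names are
made up) that prefixes every query string, including the delete-by-query,
with an explicit parser so nothing depends on the core's defType:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ExplicitParserExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/core1");

    // Choose the parser per request via a local param instead of defType.
    SolrQuery q = new SolrQuery("{!edismax qf=title}some words");
    solr.query(q);

    // Force the plain lucene parser for the delete-by-query as well.
    solr.deleteByQuery("{!lucene}id:abc123");
    solr.commit();
  }
}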




On Mon, Feb 3, 2014 at 4:02 PM, Dmitry Kan  wrote:

> This exception is similar to what is talked about here:
> https://gist.github.com/mbklein/6367133
> http://irc.projecthydra.org/2013-08-28.html
>
> We found out that:
>
> 1. this happens iff on two cores inside the same container there is a
> query parser defined via defType.
> 2. After removing index files on one of the cores, the delete by query
> works just fine. Right after restarting the container, the same query fails.
>
>
> Is there a jira for this? Should I create one?
>
> Dmitry
>
>
> On Mon, Feb 3, 2014 at 2:03 PM, Dmitry Kan  wrote:
>
>> Hello!
>>
>> We are hitting a really strange and nasty issue when trying to delete by
>> query and not when just adding documents. The exception says:
>> http://pastebin.com/B1x5dAF7
>>
>> Any ideas as to what is going on?
>>
>> The delete by query is referencing the unique field. The core's index
>> does not contain the value that is being deleted.
>> Solr: 4.3.1.
>>
>>
>> --
>> Dmitry
>> Blog: http://dmitrykan.blogspot.com
>> Twitter: twitter.com/dmitrykan
>>
>
>
>
> --
> Dmitry
> Blog: http://dmitrykan.blogspot.com
> Twitter: twitter.com/dmitrykan
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Re: Announce list

2014-02-03 Thread Lajos

There's always http://projects.apache.org/feeds/rss.xml.

L


On 03/02/2014 14:59, Arie Zilberstein wrote:

Hi,

Is there a mailing list for getting just announcements about new versions?

Thanks,
Arie



Re: Announce list

2014-02-03 Thread Alexandre Rafalovitch
I don't think so.  What would be the value?

Would you be upgrading every 6-8 weeks as the new versions come out?
Or are you downstream of Solr and want to check compatibility?

Curious what the use case would be.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Mon, Feb 3, 2014 at 8:59 PM, Arie Zilberstein
 wrote:
> Hi,
>
> Is there a mailing list for getting just announcements about new versions?
>
> Thanks,
> Arie


Strange Error Message while Full Import

2014-02-03 Thread Peter Schütt
Hello,

when I do a full import of a SOLR index I get a strange error
message:

org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.sql.SQLRecoverableException: Closed Resultset: next

It is only a simple query 

select FIRMEN_ID, FIRMIERUNG, FIRMENKENNUNG,
 PZN, DEBITORNUMMER, ADRESS_ID from DAT_FIRMA

This error seems to be a follow-on error, but there is no other cause in
the stack trace.

Thanks for any hints.

Ciao
  Peter Schütt

P.S. The error stacktrace


Feb 03, 2014 2:11:01 PM org.apache.solr.common.SolrException log
SEVERE: getNext() failed for query 'select FIRMEN_ID, FIRMIERUNG, FIRMENKENNUNG, PZN, DEBITORNUMMER, ADRESS_ID from DAT_FIRMA':
org.apache.solr.handler.dataimport.DataImportHandlerException: java.sql.SQLRecoverableException: Closed Resultset: next
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:63)
    at org.apache.solr.handler.dataimport.PreparedStatementJdbcDataSource$ResultSetIterator.hasnext(PreparedStatementJdbcDataSource.java:404)
    at org.apache.solr.handler.dataimport.PreparedStatementJdbcDataSource$ResultSetIterator.access$600(PreparedStatementJdbcDataSource.java:256)
    at org.apache.solr.handler.dataimport.PreparedStatementJdbcDataSource$ResultSetIterator$1.hasNext(PreparedStatementJdbcDataSource.java:324)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:116)
    at org.apache.solr.handler.dataimport.PreparedStatementSqlEntityProcessor.handleQuery(PreparedStatementSqlEntityProcessor.java:119)
    at org.apache.solr.handler.dataimport.PreparedStatementSqlEntityProcessor.nextRow(PreparedStatementSqlEntityProcessor.java:124)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
Caused by: java.sql.SQLRecoverableException: Closed Resultset: next
    at oracle.jdbc.driver.OracleResultSetImpl.next(OracleResultSetImpl.java:214)
    at org.apache.tomcat.dbcp.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
    at org.apache.tomcat.dbcp.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
    at org.apache.solr.handler.dataimport.PreparedStatementJdbcDataSource$ResultSetIterator.hasnext(PreparedStatementJdbcDataSource.java:396)
    ... 13 more



Re: Writing a customize updateRequestHandler

2014-02-03 Thread Jorge Luis Betancourt Gonzalez
In the book Apache Solr Beginner’s Guide there is a section dedicated to writing 
new Solr plugins; perhaps that would be a good place to start. There is also a page 
about this in the wiki, but it’s only a light introduction. I’ve found that a very 
good starting point is simply to browse the code of some standard 
components similar to the one you’re trying to customize.
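
As a concrete starting point: most "custom update handling" is done by
plugging an UpdateRequestProcessor into the stock update handler's chain
rather than writing a whole new request handler. A minimal sketch (the class
and field names are made up, not taken from any book or wiki page):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class TimestampProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                            SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new TimestampProcessor(next);
  }

  static class TimestampProcessor extends UpdateRequestProcessor {
    TimestampProcessor(UpdateRequestProcessor next) {
      super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      SolrInputDocument doc = cmd.getSolrInputDocument();
      // Example customization: stamp every incoming document.
      doc.setField("indexed_at_l", System.currentTimeMillis());
      super.processAdd(cmd);   // pass the document on to the rest of the chain
    }
  }
}

The factory is then registered in an updateRequestProcessorChain in
solrconfig.xml and referenced from the /update handler.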

On Feb 3, 2014, at 9:00 AM, neerajp  wrote:

> Hi,
> I want to write a custom updateRequestHandler.
> Can you pl.s guide me the steps I need to perform for that ?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Writing-a-customize-updateRequestHandler-tp4115059.html
> Sent from the Solr - User mailing list archive at Nabble.com.


III Escuela Internacional de Invierno en la UCI del 17 al 28 de febrero del 
2014. Ver www.uci.cu


Re: weird exception on update

2014-02-03 Thread Dmitry Kan
This exception is similar to what is talked about here:
https://gist.github.com/mbklein/6367133
http://irc.projecthydra.org/2013-08-28.html

We found out that:

1. this happens iff on two cores inside the same container there is a query
parser defined via defType.
2. After removing index files on one of the cores, the delete by query
works just fine. Right after restarting the container, the same query fails.


Is there a jira for this? Should I create one?

Dmitry


On Mon, Feb 3, 2014 at 2:03 PM, Dmitry Kan  wrote:

> Hello!
>
> We are hitting a really strange and nasty issue when trying to delete by
> query and not when just adding documents. The exception says:
> http://pastebin.com/B1x5dAF7
>
> Any ideas as to what is going on?
>
> The delete by query is referencing the unique field. The core's index does
> not contain the value that is being deleted.
> Solr: 4.3.1.
>
>
> --
> Dmitry
> Blog: http://dmitrykan.blogspot.com
> Twitter: twitter.com/dmitrykan
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Writing a customize updateRequestHandler

2014-02-03 Thread neerajp
Hi,
I want to write a custom updateRequestHandler.
Can you pl.s guide me the steps I need to perform for that ?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Writing-a-customize-updateRequestHandler-tp4115059.html
Sent from the Solr - User mailing list archive at Nabble.com.


Announce list

2014-02-03 Thread Arie Zilberstein
Hi,

Is there a mailing list for getting just announcements about new versions?

Thanks,
Arie


Re: Apache Solr.

2014-02-03 Thread Jack Krupansky
PDF files can be directly imported into Solr using Solr Cell (AKA 
ExtractingRequestHandler).


See:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

Internally, Solr Cell uses Tika, which in turn uses PDFBox.
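
For example, a small SolrJ sketch of pushing one PDF through /update/extract
(core URL, file name, literal id, and field mapping below are placeholders;
it assumes a 4.x-style SolrJ client):

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class PdfIndexer {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("manual.pdf"), "application/pdf");
    req.setParam("literal.id", "manual-1");        // uniqueKey for the extracted document
    req.setParam("fmap.content", "text");          // map Tika's body text into a "text" field
    req.setParam("commit", "true");

    solr.request(req);
  }
}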

-- Jack Krupansky

-Original Message- 
From: Alexei Martchenko

Sent: Monday, February 3, 2014 8:04 AM
To: solr-user@lucene.apache.org
Subject: Re: Apache Solr.

That's right, Solr doesn't import PDFs as it imports XMLs. You'll need to
use Tika to import binary/specific file types.

http://tika.apache.org/1.4/formats.html


alexei martchenko
Facebook  |
Linkedin|
Steam  |
4sq| Skype: alexeiramone |
Github  | (11) 9 7613.0966 |


2014-02-03 Siegfried Goeschl :


Hi Vignesh,

a few keywords for further investigations

* Solr Data Import Handler
* Apache Tika
* Apache PDFBox

Cheers,

Siegfried Goeschl


On 03.02.14 09:15, vignesh wrote:


Hi Team,



 I am Vignesh, am using Apache Solr 3.6 and able to
Index
XML file and now trying to Index PDF file and not able to index .Can you
give me the steps to carry out PDF indexing it will be very useful. 
Kindly

guide me through this process.





Thanks & Regards.

Vignesh.V




Ninestars Information Technologies Limited.,

72, Greams Road, Thousand Lights, Chennai - 600 006. India.

Landline : +91 44 2829 4226 / 36 / 56   X: 144

  http://www.ninestars.in/> www.ninestars.in












Re: Apache Solr.

2014-02-03 Thread Alexei Martchenko
That's right, Solr doesn't import PDFs as it imports XMLs. You'll need to
use Tika to import binary/specific file types.

http://tika.apache.org/1.4/formats.html


alexei martchenko
Facebook  |
Linkedin|
Steam  |
4sq| Skype: alexeiramone |
Github  | (11) 9 7613.0966 |


2014-02-03 Siegfried Goeschl :

> Hi Vignesh,
>
> a few keywords for further investigations
>
> * Solr Data Import Handler
> * Apache Tika
> * Apache PDFBox
>
> Cheers,
>
> Siegfried Goeschl
>
>
> On 03.02.14 09:15, vignesh wrote:
>
>> Hi Team,
>>
>>
>>
>>  I am Vignesh, am using Apache Solr 3.6 and able to
>> Index
>> XML file and now trying to Index PDF file and not able to index .Can you
>> give me the steps to carry out PDF indexing it will be very useful. Kindly
>> guide me through this process.
>>
>>
>>
>>
>>
>> Thanks & Regards.
>>
>> Vignesh.V
>>
>>
>>
>>
>> Ninestars Information Technologies Limited.,
>>
>> 72, Greams Road, Thousand Lights, Chennai - 600 006. India.
>>
>> Landline : +91 44 2829 4226 / 36 / 56   X: 144
>>
>>   http://www.ninestars.in/> www.ninestars.in
>>
>>
>>
>>
>>
>>
>


Re: Import data from mysql to sold

2014-02-03 Thread Alexei Martchenko
I've been using DIH to import large databases to XML file batches, and it's
blazing fast.
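
If DIH turns out not to fit, a rough alternative sketch with plain JDBC plus
SolrJ batching (URLs, table, and column names here are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MysqlToSolr {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/products");
    Connection conn = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/shop?useCursorFetch=true", "user", "password");

    Statement stmt = conn.createStatement();
    stmt.setFetchSize(1000);                         // stream rows instead of loading 4M at once
    ResultSet rs = stmt.executeQuery("SELECT id, name, price FROM product");

    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    while (rs.next()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.setField("id", rs.getString("id"));
      doc.setField("name", rs.getString("name"));
      doc.setField("price", rs.getDouble("price"));
      batch.add(doc);
      if (batch.size() == 1000) {
        solr.add(batch);                             // send in batches for throughput
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      solr.add(batch);
    }
    solr.commit();
    rs.close();
    stmt.close();
    conn.close();
  }
}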


alexei martchenko
Facebook  |
Linkedin|
Steam  |
4sq| Skype: alexeiramone |
Github  | (11) 9 7613.0966 |


2014-02-03 rachun :

> Dear all gurus,
>
> I would like to import my data (MySQL), about 4 million rows, into Solr
> 4.6.
> What is the best way to do it?
>
> Please suggest me.
>
> Million thanks,
> Chun.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Import-data-from-mysql-to-sold-tp4114982.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Geospatial clustering + zoom in/out help

2014-02-03 Thread Bojan Šmid
Hi David,

  I was hoping to get an answer on the geospatial topic from you :). These
links basically confirm that the approach I wanted to take should work OK with
a similar (or even bigger) amount of data than I plan to have. Instead of my
custom NxM division of the world, I'll try the existing GeoHash encoding; it may be
good enough (and will be quicker to implement).
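
  A rough sketch of the geohash-per-zoom-level idea at index time (the field
names zoom1..zoom11 are made up, and GeohashUtils from spatial4j is an
assumption about what's on the classpath):

import com.spatial4j.core.io.GeohashUtils;
import org.apache.solr.common.SolrInputDocument;

public class GeohashZoomFields {
  // Index one geohash prefix per zoom level; at query time, facet or
  // field-collapse on the zoomN field that matches the map's zoom level.
  public static SolrInputDocument buildDoc(String id, double lat, double lon) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.setField("id", id);
    String full = GeohashUtils.encodeLatLon(lat, lon, 11);   // 11-character geohash
    for (int level = 1; level <= 11; level++) {
      doc.setField("zoom" + level, full.substring(0, level));
    }
    return doc;
  }
}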

  Thanks!

  Bojan


On Fri, Jan 31, 2014 at 8:27 PM, Smiley, David W.  wrote:

> Hi Bojan.
>
> You've got some good ideas here along the lines of some that others have
> tried.  I've through together a page on the wiki about this subject some
> time ago that I'm sure you will find interesting.  It references a relevant
> stack-overflow post, and also a presentation at DrupalCon which had a
> segment from a guy using the same approach you suggest here involving
> field-collapsing and/or stats components.  The video shows it in action.
>
> http://wiki.apache.org/solr/SpatialClustering
>
> It would be helpful for everyone if you share your experience with
> whatever you choose, once you give an approach a try.
>
> ~ David
> 
> From: Bojan Šmid [bos...@gmail.com]
> Sent: Thursday, January 30, 2014 1:15 PM
> To: solr-user@lucene.apache.org
> Subject: Geospatial clustering + zoom in/out help
>
> Hi,
>
> I have an index with 300K docs with lat,lon. I need to cluster the docs
> based on lat,lon for display in the UI. The user then needs to be able to
> click on any cluster and zoom in (up to 11 levels deep).
>
> I'm using Solr 4.6 and I'm wondering how best to implement this
> efficiently?
>
> A bit more specific questions below.
>
> I need to:
>
> 1) cluster data points at different zoom levels
>
> 2) click on a specific cluster and zoom in
>
> 3) be able to select a region (bounding box or polygon) and show clusters
> in the selected area
>
> What's the best way to implement this so that queries are fast?
>
> What I thought I would try, but maybe there are better ways:
>
> * divide the world in NxM large squares and then each of these squares into
> 4 more squares, and so on - 11 levels deep
>
> * at index time figure out all squares (at all 11 levels) each data point
> belongs to and index that info into 11 different fields: e.g.
>  zoom3=square1_62_47_33 >
>
> * at search time, use field collapsing on zoomX field to get which docs
> belong to which square on particular level
>
> * calculate center point of each square (by calculating mean value of
> positions for all points in that square) using StatsComponent (facet on
> zoomX field, avg on lat and lon fields) - I would consider those squares as
> separate clusters (one square is one cluster) and center points of those
> squares as center points of clusters derived from them
>
> I *think* the problem with this approach is that:
>
> * there will be many unique fields for bigger zoom levels, which means
> field collapsing / StatsComponent maaay not work fast enough
>
> * clusters will not look very natural because I would have many clusters on
> each zoom level and what are "real" geographical clusters would be
> displayed as multiple clusters since their points would in some cases be
> dispersed into multiple squares. But that may be OK
>
> * a lot will depend on how the squares are calculated - linearly dividing
> 360 degrees by N to get "equal" size squares in degrees would produce
> issues with "real" square sizes and counts of points in each of them
>
>
> So I'm wondering if there is a better way?
>
> Thanks,
>
>
>   Bojan
>


Score of Search Term for every character remove

2014-02-03 Thread Lusung, Abner
Hi,

I'm new to using Solr and I'm curious whether it is capable of doing the
following or something similar.

Sample:
Query: "ABCDEF"

Returns:
ABCDEF > 0 hits
ABCDE > 2 hits
ABCD > 3 hits
ABC > 10 hits
AB > 20 hits
A > 100 hits

In one request only.

Thanks.

Abner G. Lusung Jr.| Java Web Development, Internet and Commerce, Global Web 
Services  | Vishay Philippines Inc.
10th Floor Pacific Star Building, Makati Avenue corner Buendia Avenue, Makati 
City, Philippines 1200
Phone: +63 2 8387421 loc. 7995 | Mobile: +63 9169674514
Website : www.vishay.com





weird exception on update

2014-02-03 Thread Dmitry Kan
Hello!

We are hitting a really strange and nasty issue when trying to delete by
query and not when just adding documents. The exception says:
http://pastebin.com/B1x5dAF7

Any ideas as to what is going on?

The delete by query is referencing the unique field. The core's index does
not contain the value that is being deleted.
Solr: 4.3.1.


-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Re: Solr and SDL Tridion Integration

2014-02-03 Thread Jack Krupansky
If SDL Tridion can export to CSV format, Solr can then import from CSV 
format.


Otherwise, you may have to write a custom script or even maybe Java code to 
read from SDL Tridion and output a supported Solr format, such as Solr XML, 
Solr JSON, or CSV.
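
If the CSV route works out, posting the file from a SolrJ client is a few
lines (file name and core URL are placeholders; this assumes the stock
/update handler, which picks the CSV loader from the text/csv content type):

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class CsvImport {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/tridion");

    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update");
    req.addFile(new File("export.csv"), "text/csv");  // content type selects the CSV loader
    req.setParam("commit", "true");

    solr.request(req);
  }
}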


-- Jack Krupansky

-Original Message- 
From: Prasi S

Sent: Monday, February 3, 2014 4:16 AM
To: solr-user@lucene.apache.org
Subject: Solr and SDL Tridion Integration

Hi,
I want to index sdl tridion content to solr. Can you suggest how this can
be achieved. Is there any document/tutorial for this? Thanks

Thanks,
Prasi 



Special NGRAMish requirement

2014-02-03 Thread Lochschmied, Alexander
Hi,

we need to use something very similar to EdgeNGram (minGramSize="1" 
maxGramSize="50" side="front").
The only thing missing is that we would like to reduce the number of matches. 
The request we need to implement is returning only those matches with the 
longest tokens (or terms if that is the right word).

Is there a way to do this in Solr (not necessarily with EdgeNGram)?

Thanks,
Alexander


Re: Apache Solr.

2014-02-03 Thread Siegfried Goeschl

Hi Vignesh,

a few keywords for further investigations

* Solr Data Import Handler
* Apache Tika
* Apache PDFBox

Cheers,

Siegfried Goeschl

On 03.02.14 09:15, vignesh wrote:

Hi Team,



 I am Vignesh, am using Apache Solr 3.6 and able to Index
XML file and now trying to Index PDF file and not able to index .Can you
give me the steps to carry out PDF indexing it will be very useful. Kindly
guide me through this process.





Thanks & Regards.

Vignesh.V




Ninestars Information Technologies Limited.,

72, Greams Road, Thousand Lights, Chennai - 600 006. India.

Landline : +91 44 2829 4226 / 36 / 56   X: 144

  http://www.ninestars.in/> www.ninestars.in









Re: Solr and SDL Tridion Integration

2014-02-03 Thread Alexandre Rafalovitch
This is a new one.

You may want to start from Tridion's list and ask about API, export or
any other ways to get to the data. Then come back with more specific
question once you know what it looks like and granularity of update
(hook on document change vs. full export only).

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Mon, Feb 3, 2014 at 4:16 PM, Prasi S  wrote:
> Hi,
> I want to index sdl tridion content to solr. Can you suggest how this can
> be achieved. Is there any document/tutorial for this? Thanks
>
> Thanks,
> Prasi


Fwd: Need help for integrating solr-4.5.1 with UIMA

2014-02-03 Thread rashi gandhi
Hi,



I'm trying to integrate Solr 4.5.1 with UIMA and following the steps of the
solr-4.5.1\contrib\uima\readme.txt.

Edited the solrconfig.xml as given in readme.txt. Also I have registered
the required keys.



But each time I index data, Solr returns an error:



Feb 3, 2014 2:04:32 PM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(405)
SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException
    at org.apache.uima.annotator.calais.OpenCalaisAnnotator.process(OpenCalaisAnnotator.java:206)
    at org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasAnnotator_ImplBase.java:56)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:280)
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText(UIMAUpdateRequestProcessor.java:173)
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:79)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1008)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.ConnectException: Connection timed out: connect
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at java.net.Socket.connect(Socket.j

Solr and SDL Tridion Integration

2014-02-03 Thread Prasi S
Hi,
I want to index sdl tridion content to solr. Can you suggest how this can
be achieved. Is there any document/tutorial for this? Thanks

Thanks,
Prasi


Apache Solr.

2014-02-03 Thread vignesh
Hi Team,

 

I am Vignesh. I am using Apache Solr 3.6 and am able to index
XML files, and I am now trying to index PDF files but am not able to. Can you
give me the steps to carry out PDF indexing? It would be very useful. Kindly
guide me through this process.

 

 

Thanks & Regards.

Vignesh.V

 


Ninestars Information Technologies Limited.,

72, Greams Road, Thousand Lights, Chennai - 600 006. India.

Landline : +91 44 2829 4226 / 36 / 56   X: 144

 http://www.ninestars.in/> www.ninestars.in 

 



Re: Clone (or Restore) Solrcloud

2014-02-03 Thread Shalin Shekhar Mangar
Hi David,

The parent metadata persists only until the sub-shards become active.
Actually the logic to make the sub-shards active depends on knowing
when all 'sibling' sub-shards' replicas have recovered successfully.
We store the parent to make that easier to look up. Once all replicas
of all sub-shards have recovered, the shard states are updated. The
'updateshardstate' command also removes the 'parent' key from the
sub-shards while switching them to 'active'.

If you're seeing the 'parent' key on a 'active' sub-shard then it may
be a bug. Please paste your clusterstate and I'll look into why it was
left over.

On Mon, Feb 3, 2014 at 10:19 AM, David Smiley (@MITRE.org)
 wrote:
> I think I figured this out; I hope people find this useful..
>
> It may not be possible to declare what the hash ranges are when you create
> the collection, but you *can* do so when you split via the 'ranges'
> parameter, which is a comma-delimited list. So this means you can create a
> new collection with one shard and then immediately split it to the desired
> ranges to line up with that of your backup.  I also observed that if you
> create a collection and then split every shard (in 2), it will result in an
> equivalent collection to one that was created with twice as many shards to
> begin with.  I hoped that was so and verified the ranges end up being the
> same both ways.
>
> The only thing that seems like it may be benign but not 100% certain is that
> if you split a shard, the new shards have a 'parent' reference to the name
> of the shard it was split from.  And even if you delete that parent shard
> (since it's not needed anymore; it becomes inactive).  I'm not sure why this
> metadata is recorded because, at least after the split, I can't see why it's
> pertinent to anything.
>
> ~ David
>
>
> David Smiley (@MITRE.org) wrote
>> Hi,
>>
>> I'm attempting to come up with a SolrCloud restore / clone process for
>> either recover to a known good state or to clone the environment for
>> experimentation.  At the moment my process involves either creating a new
>> zookeeper environment or at least deleting the existing Collection so that
>> I can create a new one.  This works; I use the Core API; the first command
>> defines the collection parameters, and I invoke it once for each replica.
>> I don't use the Collection API because I want SolrCloud to go off trying
>> to create all the replicas -- I know where each one is pre-positioned.
>>
>> What I'm concerned about is what happens once I start wanting to use Shard
>> splitting, *especially* if I don't want to split all shards because shards
>> are uneven due to custom routing (e.g. id:"customer!myid").  In this case
>> I don't know how to create the collection with the hash ranges post-shard
>> split.  Solr doesn't have an API for me to explicitly say what the hash
>> ranges should be on each shard (to match up with a backup).  And I'm
>> concerned about undocumented pitfalls that may exist in manually
>> constructing a clusterstate.json, as another approach.
>>
>> Any ideas?
>>
>> ~ David
>
>
>
>
>
> -
>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Clone-or-Restore-Solrcloud-tp4114773p4114983.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Regards,
Shalin Shekhar Mangar.