how to integrate solr with HDFS HA

2013-08-22 Thread YouPeng Yang
Hi all
I am trying to integrate Solr with HDFS HA. When I start the Solr server, it
throws an exception [1].
I do know this is because the hadoop.conf.Configuration in
HdfsDirectoryFactory.java does not include the HA configuration.
So I want to know: in Solr, is there any way to include my Hadoop HA
configuration?


[1]---
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: lklcluster
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:415)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:382)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:123)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2277)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:86)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2311)
at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:2299)
at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:364)
at org.apache.solr.store.hdfs.HdfsDirectory.<init>(HdfsDirectory.java:59)
at org.apache.solr.core.HdfsDirectoryFactory.create(HdfsDirectoryFactory.java:154)
at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:350)
at org.apache.solr.core.SolrCore.getNewIndexDir(SolrCore.java:256)
at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:469)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:759)
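
(For the archive: one way to get the HA nameservice resolved is to point the
directory factory at the Hadoop client configuration. A rough sketch for
solrconfig.xml, assuming the Solr build in use supports the solr.hdfs.confdir
parameter; the paths below are made up:

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://lklcluster/solr</str>
  <!-- assumed path: the client config dir whose hdfs-site.xml defines the HA nameservice -->
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
</directoryFactory>

With that, hadoop.conf.Configuration can pick up the lklcluster nameservice
instead of trying to resolve it as a hostname.)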


Re: Geo spatial clustering of points

2013-08-22 Thread David Smiley (@MITRE.org)
Hi Chris & Jeroen,

Tonight I posted some tips on Solr's wiki on this subject:
http://wiki.apache.org/solr/SpatialClustering
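
(For the archive, a rough sketch of one grid-clustering approach, independent of
whatever the wiki page above recommends: alongside the spatial field, index the
point's geohash truncated to a few fixed precisions as plain string fields, then
facet on the precision that matches the zoom level. The field names, point, and
radius below are made up:

q=*:*&rows=0
  &fq={!geofilt sfield=geo pt=39.5,-98.35 d=2500}
  &facet=true
  &facet.field=geohash_4
  &facet.mincount=1
  &facet.limit=-1

Each facet value is one grid cell with its document count; the cell centroid, or a
follow-up stats call per cell, can stand in for the median coordinate.)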

~ David


Chris Atkinson wrote
> Did you get any resolution for this? I'm about to implement something
> identical.
> On 3 Jul 2013 23:03, "Jeroen Steggink" <

> jeroen@

> > wrote:
> 
>> Hi,
>>
>> I'm looking for a way to cluster (or should I call it group?) geospatial
>> points on a map based on the current zoom level and get the median
>> coordinate for each cluster.
>> Let's say I'm on the world level, and I want to cluster spatial points
>> within a 1000 km radius. When I zoom in I only want to get the clustered
>> points for that boundary. Let's say all the points within the US and
>> cluster them within a 500 km radius.
>>
>> I'm using Solr 4.3.0 and looked into
>> SpatialRecursivePrefixTreeFieldType
>> with faceting. However, I'm not sure if the geohashes are of any use for
>> clustering points.
>>
>> Does anyone have any experience with geo spatial clustering with Solr?
>>
>> Regards,
>>
>> jeroen
>>
>>
>>





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Geo-spatial-clustering-of-points-tp4075315p4086243.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to avoid underscore sign indexing problem?

2013-08-22 Thread Steve Rowe
Dan,

StandardTokenizer implements the word boundary rules from the Unicode Text 
Segmentation standard annex UAX#29:

   http://www.unicode.org/reports/tr29/#Word_Boundaries

Every character sequence within UAX#29 boundaries that contains a numeric or an 
alphabetic character is emitted as a term, and nothing else is emitted.

Punctuation can be included within a term, e.g. "1,248.99" or "192.168.1.1".

To split on underscores, you can convert underscores to e.g. spaces by adding 
PatternReplaceCharFilterFactory to your analyzer:
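
(The XML example was stripped by the list archiver; a minimal sketch of such a
charFilter, replacing each underscore with a space, is:)

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="_" replacement=" "/>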



This replacement will be performed prior to StandardTokenizer, which will then 
see token-splitting spaces instead of underscores.

Steve

On Aug 22, 2013, at 10:23 PM, Dan Davis  wrote:

> Ah, but what is the definition of punctuation in Solr?
> 
> 
> On Wed, Aug 21, 2013 at 11:15 PM, Jack Krupansky 
> wrote:
> 
>> "I thought that the StandardTokenizer always split on punctuation, "
>> 
>> Proving that you haven't read my book! The section on the standard
>> tokenizer details the rules that the tokenizer uses (in addition to
>> extensive examples.) That's what I mean by "deep dive."
>> 
>> -- Jack Krupansky
>> 
>> -Original Message- From: Shawn Heisey
>> Sent: Wednesday, August 21, 2013 10:41 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to avoid underscore sign indexing problem?
>> 
>> 
>> On 8/21/2013 7:54 PM, Floyd Wu wrote:
>> 
>>> When using StandardAnalyzer to tokenize string "Pacific_Rim" will get
>>> 
>>> ST
>>> text         raw_bytes                            start  end  position
>>> pacific_rim  [70 61 63 69 66 69 63 5f 72 69 6d]   0      11   1
>>> 
>>> How to make this string to be tokenized to these two tokens "Pacific",
>>> "Rim"?
>>> Set _ as stopword?
>>> Please kindly help on this.
>>> Many thanks.
>>> 
>> 
>> Interesting.  I thought that the StandardTokenizer always split on
>> punctuation, but apparently that's not the case for the underscore
>> character.
>> 
>> You can always use the WordDelimiterFilter after the StandardTokenizer.
>> 
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
>> 
>> Thanks,
>> Shawn
>> 



Re: How to access latitude and longitude with only LatLonType?

2013-08-22 Thread David Smiley (@MITRE.org)
Hi Quan

You claim to be using LatLonType, yet the error you posted makes it clear
you are in fact using SpatialRecursivePrefixTreeFieldType (RPT).

Regardless of which spatial field you use, it's not clear to me what sort of
statistics could be useful on a spatial field.  The stats component doesn't
work with any of the spatial fields.  Well... it's possible to use
LatLonType and then do stats on just the latitude or just the longitude (you
should see the auto-generated fields for these in the online schema browser)
but that is unlikely to be useful.
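
(A sketch of that variant, assuming the stock *_coordinate dynamic field so the
auto-generated sub-fields are named coordinates_0_coordinate and
coordinates_1_coordinate:)

http://localhost/solr/quan/select?q=*:*&rows=0&stats=true
  &stats.field=coordinates_0_coordinate
  &stats.field=coordinates_1_coordinate
  &stats.facet=township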

~ David


zhangquan913 wrote
> Hello All,
> 
> I am currently doing a spatial query in solr. I indexed "coordinates"
> (type="location" class="solr.LatLonType"), but the following query failed.
> http://localhost/solr/quan/select?q=*:*&stats=true&stats.field=coordinates&stats.facet=township&rows=0
> It showed an error:
> Field type
> location{class=org.apache.solr.schema.SpatialRecursivePrefixTreeFieldType,analyzer=org.apache.solr.schema.FieldType$DefaultAnalyzer,args={distErrPct=0.025,
> class=solr.SpatialRecursivePrefixTreeFieldType, maxDistErr=0.09,
> units=degrees}} is not currently supported
> 
> I don't want to create duplicate indexed field "latitude" and "longitude".
> How can I use only "coordinates" to do this kind of stats on both latitude
> and longitude?
> 
> Thanks,
> Quan





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-access-latitude-and-longitude-with-only-LatLonType-tp4086109p4086229.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Distance sort on a multi-value field

2013-08-22 Thread David Smiley (@MITRE.org)
Awesome!

Be sure to "watch" the JIRA issue as it develops.  The patch will improve
(I've already improved it but not posted it) and one day a solution is bound
to get committed.

~ David


Jeff Wartes wrote
> This is actually pretty far afield from my original subject, but it turns
> out that I also had issues  with NRT and multi-field geospatial
> performance in Solr 4, so I'll follow that up.
> 
> 
> I've been testing and working with David's SOLR-5170 patch ever since he
> posted it, and I pushed it into production with only some cosmetic changes
> a few hours ago. 
> I have a relatively low update and query rate for this particular query
> type, (something like 2 updates/sec, 10 queries/sec) but a short
> autosoftcommit time. (5 sec) Based on the data so far this patch looks
> like it's brought my average response time down from 4 seconds to about
> 50ms.
> 
> Very nice!
> 
> 
> 
> On 8/20/13 7:37 PM, "David Smiley (@MITRE.org)" <

> DSMILEY@

> > wrote:
> 
>>The distance sorting code in SOLR-2155 is roughly equivalent to the code
>>that
>>RPT uses (RPT has its lineage in SOLR-2155 after all).  I just reviewed it
>>to double-check.  It's possible the behavior is slightly better in
>>SOLR-2155
>>because the cache (a Solr cache) contains normal hard-references whereas
>>RPT
>>has one based on weak references, which will linger longer.  But I think
>>the
>>likelihood of OOM is the same.
>>
>>Anyway, the current best option is
>>https://issues.apache.org/jira/browse/SOLR-5170  which I posted a few days
>>ago.
>>
>>~ David
>>
>>
>>Billnbell wrote
>>> We have been using 2155 for over 6 months in production with over 2M
>>>hits
>>> every 10 minutes. No OOM yet.
>>> 
>>> 2155 seems great, and would this issue be any worse than 2155?
>>> 
>>> 
>>> 
>>> On Wed, Aug 14, 2013 at 4:08 PM, Jeff Wartes <
>>
>>> jwartes@
>>
>>> > wrote:
>>> 

 Hm, "Give me all the stores that only have branches in this area" might
 be
 a plausible use case for farthest distance.
 That's essentially a "contains" question though, so maybe that's
already
 supported? I guess it depends on how contains/intersects/etc handle
 multi-values. I feel like multi-value interaction really deserves its
own
 section in the documentation.


 I'm aware of the memory issue, but it seems like if you want to sort
 multi-valued points, it's either this or try to pull in the 2155 patch.
 In
 general I'd rather go with the thing that's being maintained.


 Thanks for the code pointer. You're right, that doesn't look like
 something I can easily use for more general aggregate scoring control.
Ah
 well.



 On 8/14/13 12:35 PM, "Smiley, David W." <
>>
>>> dsmiley@
>>
>>> > wrote:

 >
 >
 >On 8/14/13 2:26 PM, "Jeff Wartes" <
>>
>>> jwartes@
>>
>>> > wrote:
 >
 >>
 >>I'm still pondering aggregate-type operations for scoring
multi-valued
 >>fields (original thread: http://goo.gl/zOX53f ), and it occurred to
me
 >>that distance-sort with SpatialRecursivePrefixTreeFieldType must be
 doing
 >>something like that.
 >
 >It isn't.
 >
 >>
 >>Somewhat surprisingly I don't see this in the documentation anywhere,
 but
 >>I presume the example query: (from:
 >>http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4)
 >>"q={!geofilt score=distance sfield=geo pt=54.729696,-98.525391 d=10}"
 >>
 >>assigns the distance/score based on the *closest* lat/long if the
 sfield
 >>is a multi-valued field.
 >
 >Yes it does.
 >
 >>
 >>That's a reasonable default, but it's a bit arbitrary. Can I sort
based
 >>on
 >>the *furthest* lat/long in the document? Or the average distance?
 >>
 >>Anyone know more about how this works and could give me some
pointers?
 >
 >I considered briefly supporting the farthest distance but dismissed it
 as
 >I saw no real use-case.  I didn't think of the average distance;
that's
 >plausible.  Anyway, your best bet is to dig into the code.  The
 >relevant part is ShapeFieldCacheDistanceValueSource.
 >
 >FYI something to keep in mind:
 >https://issues.apache.org/jira/browse/LUCENE-4698
 >
 >~ David
 >


>>> 
>>> 
>>> -- 
>>> Bill Bell
>>
>>> billnbell@
>>
>>> cell 720-256-8076
>>
>>
>>
>>
>>
>>-
>> Author: 
>>http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
>>--
>>View this message in context:
>>http://lucene.472066.n3.nabble.com/Distance-sort-on-a-multi-value-field-tp
>>4084666p4085797.html
>>Sent from the Solr - User mailing list archive at Nabble.com.





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Distance-sort-on-a-multi-value-field-tp4084666p4086226.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Distance sort on a multi-value field

2013-08-22 Thread Jeff Wartes

This is actually pretty far afield from my original subject, but it turns
out that I also had issues  with NRT and multi-field geospatial
performance in Solr 4, so I'll follow that up.


I've been testing and working with David's SOLR-5170 patch ever since he
posted it, and I pushed it into production with only some cosmetic changes
a few hours ago. 
I have a relatively low update and query rate for this particular query
type, (something like 2 updates/sec, 10 queries/sec) but a short
autosoftcommit time. (5 sec) Based on the data so far this patch looks
like it's brought my average response time down from 4 seconds to about
50ms.

Very nice!



On 8/20/13 7:37 PM, "David Smiley (@MITRE.org)"  wrote:

>The distance sorting code in SOLR-2155 is roughly equivalent to the code
>that
>RPT uses (RPT has its lineage in SOLR-2155 after all).  I just reviewed it
>to double-check.  It's possible the behavior is slightly better in
>SOLR-2155
>because the cache (a Solr cache) contains normal hard-references whereas
>RPT
>has one based on weak references, which will linger longer.  But I think
>the
>likelihood of OOM is the same.
>
>Anyway, the current best option is
>https://issues.apache.org/jira/browse/SOLR-5170  which I posted a few days
>ago.
>
>~ David
>
>
>Billnbell wrote
>> We have been using 2155 for over 6 months in production with over 2M
>>hits
>> every 10 minutes. No OOM yet.
>> 
>> 2155 seems great, and would this issue be any worse than 2155?
>> 
>> 
>> 
>> On Wed, Aug 14, 2013 at 4:08 PM, Jeff Wartes <
>
>> jwartes@
>
>> > wrote:
>> 
>>>
>>> Hm, "Give me all the stores that only have branches in this area" might
>>> be
>>> a plausible use case for farthest distance.
>>> That's essentially a "contains" question though, so maybe that's
>>>already
>>> supported? I guess it depends on how contains/intersects/etc handle
>>> multi-values. I feel like multi-value interaction really deserves its
>>>own
>>> section in the documentation.
>>>
>>>
>>> I'm aware of the memory issue, but it seems like if you want to sort
>>> multi-valued points, it's either this or try to pull in the 2155 patch.
>>> In
>>> general I'd rather go with the thing that's being maintained.
>>>
>>>
>>> Thanks for the code pointer. You're right, that doesn't look like
>>> something I can easily use for more general aggregate scoring control.
>>>Ah
>>> well.
>>>
>>>
>>>
>>> On 8/14/13 12:35 PM, "Smiley, David W." <
>
>> dsmiley@
>
>> > wrote:
>>>
>>> >
>>> >
>>> >On 8/14/13 2:26 PM, "Jeff Wartes" <
>
>> jwartes@
>
>> > wrote:
>>> >
>>> >>
>>> >>I'm still pondering aggregate-type operations for scoring
>>>multi-valued
>>> >>fields (original thread: http://goo.gl/zOX53f ), and it occurred to
>>>me
>>> >>that distance-sort with SpatialRecursivePrefixTreeFieldType must be
>>> doing
>>> >>something like that.
>>> >
>>> >It isn't.
>>> >
>>> >>
>>> >>Somewhat surprisingly I don't see this in the documentation anywhere,
>>> but
>>> >>I presume the example query: (from:
>>> >>http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4)
>>> >>"q={!geofilt score=distance sfield=geo pt=54.729696,-98.525391 d=10}"
>>> >>
>>> >>assigns the distance/score based on the *closest* lat/long if the
>>> sfield
>>> >>is a multi-valued field.
>>> >
>>> >Yes it does.
>>> >
>>> >>
>>> >>That's a reasonable default, but it's a bit arbitrary. Can I sort
>>>based
>>> >>on
>>> >>the *furthest* lat/long in the document? Or the average distance?
>>> >>
>>> >>Anyone know more about how this works and could give me some
>>>pointers?
>>> >
>>> >I considered briefly supporting the farthest distance but dismissed it
>>> as
>>> >I saw no real use-case.  I didn't think of the average distance;
>>>that's
>>> >plausible.  Anyway, your best bet is to dig into the code.  The
>>> >relevant part is ShapeFieldCacheDistanceValueSource.
>>> >
>>> >FYI something to keep in mind:
>>> >https://issues.apache.org/jira/browse/LUCENE-4698
>>> >
>>> >~ David
>>> >
>>>
>>>
>> 
>> 
>> -- 
>> Bill Bell
>
>> billnbell@
>
>> cell 720-256-8076
>
>
>
>
>
>-
> Author: 
>http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
>--
>View this message in context:
>http://lucene.472066.n3.nabble.com/Distance-sort-on-a-multi-value-field-tp
>4084666p4085797.html
>Sent from the Solr - User mailing list archive at Nabble.com.



Re: Removing duplicates during a query

2013-08-22 Thread Dan Davis
OK - I see that this can be done with Field Collapsing/Grouping.  I also
see the mentions in the Wiki for avoiding duplicates using a 16-byte hash.

So, question withdrawn...
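
For the archive, rough sketches of both options, assuming the hash lives in a
field named content_hash. Query-time grouping:

q=some_query&group=true&group.field=content_hash&group.limit=1&group.main=true

With group.main=true the response is a flat list with one doc per hash and no hint
that duplicates existed; drop group.main and each group's numFound shows how many
copies there were. The index-time alternative from the Deduplication wiki page is
SignatureUpdateProcessorFactory (the signature field and source field list below
are assumptions):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">content_hash</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>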


On Thu, Aug 22, 2013 at 10:21 PM, Dan Davis  wrote:

> Suppose I have two documents with different id, and there is another
> field, for instance "content-hash" which is something like a 16-byte hash
> of the content.
>
> Can Solr be configured to return just one copy, and drop the other if both
> are relevant?
>
> If Solr does drop one result, do you get any indication in the document
> that was kept that there was another copy?
>
>


Re: More on topic of Meta-search/Federated Search with Solr

2013-08-22 Thread Dan Davis
You are right, but here's my null hypothesis for studying the impact on
relevance. Hash the query to deterministically seed a random number
generator. Pick one from column A or column B randomly.

This is of course wrong - a query might find two non-relevant results in
corpus A and lots of relevant results in corpus B, leading to poor
precision because the two non-relevant documents are likely to show up on
the first page. You can weight by the size of the corpus, but that weighting
is then probably wrong for any specific query.

It was an interesting thought experiment though.
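
For the archive, a minimal sketch of that null-hypothesis merge (the class and
method names are made up; nothing here is Solr API):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

public class RandomInterleaver {
    /** Deterministically interleave two ranked lists: the query string seeds the RNG,
     *  then each result slot is filled from corpus A or corpus B by a coin flip. */
    public static <T> List<T> interleave(String query, List<T> corpusA, List<T> corpusB, int n) {
        Random rng = new Random(query.hashCode());           // hash the query to seed the RNG
        Iterator<T> a = corpusA.iterator();
        Iterator<T> b = corpusB.iterator();
        List<T> merged = new ArrayList<T>();
        while (merged.size() < n && (a.hasNext() || b.hasNext())) {
            Iterator<T> pick = rng.nextBoolean() ? a : b;     // pick one from column A or column B
            if (!pick.hasNext()) pick = (pick == a) ? b : a;  // fall back if that corpus is exhausted
            merged.add(pick.next());
        }
        return merged;
    }
}

The same query always produces the same interleaving, which is what makes a
comparison against a single-core baseline repeatable.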

Erik,

Since LucidWorks was dinged in the 2013 Magic Quadrant on Enterprise Search
due to a lack of "Federated Search", the for-profit Enterprise Search
companies must be doing it some way. Maybe relevance suffers (a lot),
but you can do it if you want to.

I have read very little of the IR literature - enough to sound like I know
a little, but it is a very little.  If there is literature on this, it
would be an interesting read.


On Sun, Aug 18, 2013 at 3:14 PM, Erick Erickson wrote:

> The lack of global TF/IDF has been answered in the past,
> in the sharded case, by "usually you have similar enough
> stats that it doesn't matter". This pre-supposes a fairly
> evenly distributed set of documents.
>
> But if you're talking about federated search across different
> types of documents, then what would you "rescore" with?
> How would you even consider scoring docs that are somewhat/
> totally different? Think magazine articles and meta-data associated
> with pictures.
>
> What I've usually found is that one can use grouping to show
> the top N of a variety of results. Or show tabs with different
> types. Or have the app intelligently combine the different types
> of documents in a way that "makes sense". But I don't know
> how you'd just get "the right thing" to happen with some kind
> of scoring magic.
>
> Best
> Erick
>
>
> On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis  wrote:
>
>> I've thought about it, and I have no time to really do a meta-search
>> during
>> evaluation.  What I need to do is to create a single core that contains
>> both of my data sets, and then describe the architecture that would be
>> required to do blended results, with liberal estimates.
>>
>> From the perspective of evaluation, I need to understand whether any of
>> the
>> solutions to better ranking in the absence of global IDF have been
>> explored? I suspect that one could retrieve a much larger than N set of
>> results from a set of shards, re-score in some way that doesn't require
>> IDF, e.g. storing both results in the same priority queue and *re-scoring*
>> before *re-ranking*.
>>
>> The other way to do this would be to have a custom SearchHandler that
>> works
>> differently - it performs the query, retrieves all results deemed relevant
>> by
>> another engine, adds them to the Lucene index, and then performs the query
>> again in the standard way.   This would be quite slow, but perhaps useful
>> as a way to evaluate my method.
>>
>> I still welcome any suggestions on how such a SearchHandler could be
>> implemented.
>>
>
>


Re: How to avoid underscore sign indexing problem?

2013-08-22 Thread Dan Davis
Ah, but what is the definition of punctuation in Solr?


On Wed, Aug 21, 2013 at 11:15 PM, Jack Krupansky wrote:

> "I thought that the StandardTokenizer always split on punctuation, "
>
> Proving that you haven't read my book! The section on the standard
> tokenizer details the rules that the tokenizer uses (in addition to
> extensive examples.) That's what I mean by "deep dive."
>
> -- Jack Krupansky
>
> -Original Message- From: Shawn Heisey
> Sent: Wednesday, August 21, 2013 10:41 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to avoid underscore sign indexing problem?
>
>
> On 8/21/2013 7:54 PM, Floyd Wu wrote:
>
>> When using StandardAnalyzer to tokenize string "Pacific_Rim" will get
>>
>> ST
>> text         raw_bytes                            start  end  position
>> pacific_rim  [70 61 63 69 66 69 63 5f 72 69 6d]   0      11   1
>>
>> How to make this string to be tokenized to these two tokens "Pacific",
>> "Rim"?
>> Set _ as stopword?
>> Please kindly help on this.
>> Many thanks.
>>
>
> Interesting.  I thought that the StandardTokenizer always split on
> punctuation, but apparently that's not the case for the underscore
> character.
>
> You can always use the WordDelimiterFilter after the StandardTokenizer.
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
>
> Thanks,
> Shawn
>


Re: How to avoid underscore sign indexing problem?

2013-08-22 Thread Floyd Wu
Alright, thanks for all your help. I finally fixed this problem using
PatternReplaceFilterFactory + WordDelimiterFilterFactory.

I first replace _ (underscore) using PatternReplaceFilterFactory and then
use WordDelimiterFilterFactory to generate the word and number parts to
increase search hits for users. Although this decreases search quality a little,
users need a higher recall rate than precision.
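
For the archive, a rough sketch of an analyzer along those lines (the fieldType
name and the exact filter options are assumptions, not Floyd's actual schema):

<fieldType name="text_underscore" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- "2DA012_ISO" -> "2DA012-ISO", which WordDelimiterFilter will then split -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement="-" replace="all"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With the same chain at index and query time, a search for 2DA012 then hits the
first part of a value like 2DA012_ISO.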

Thank you all.

Floyd





2013/8/22 Floyd Wu 

> After trying some search cases and different param combinations of
> WordDelimiter, I wonder what the best strategy is to index the string
> "2DA012_ISO MARK 2" so that it can be searched by the term "2DA012"?
>
> What if I just want _ to be removed at both query and index time; what and how
> do I configure that?
>
> Floyd
>
>
>
> 2013/8/22 Floyd Wu 
>
>> Thank you all.
>> By the way, Jack, I'm going to buy your book. Where can I buy it?
>> Floyd
>>
>>
>> 2013/8/22 Jack Krupansky 
>>
>>> "I thought that the StandardTokenizer always split on punctuation, "
>>>
>>> Proving that you haven't read my book! The section on the standard
>>> tokenizer details the rules that the tokenizer uses (in addition to
>>> extensive examples.) That's what I mean by "deep dive."
>>>
>>> -- Jack Krupansky
>>>
>>> -Original Message- From: Shawn Heisey
>>> Sent: Wednesday, August 21, 2013 10:41 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: How to avoid underscore sign indexing problem?
>>>
>>>
>>> On 8/21/2013 7:54 PM, Floyd Wu wrote:
>>>
 When using StandardAnalyzer to tokenize string "Pacific_Rim" will get

 ST
 text         raw_bytes                            start  end  position
 pacific_rim  [70 61 63 69 66 69 63 5f 72 69 6d]   0      11   1

 How to make this string to be tokenized to these two tokens "Pacific",
 "Rim"?
 Set _ as stopword?
 Please kindly help on this.
 Many thanks.

>>>
>>> Interesting.  I thought that the StandardTokenizer always split on
>>> punctuation, but apparently that's not the case for the underscore
>>> character.
>>>
>>> You can always use the WordDelimiterFilter after the StandardTokenizer.
>>>
>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
>>>
>>> Thanks,
>>> Shawn
>>>
>>
>>
>


Removing duplicates during a query

2013-08-22 Thread Dan Davis
Suppose I have two documents with different id, and there is another field,
for instance "content-hash" which is something like a 16-byte hash of the
content.

Can Solr be configured to return just one copy, and drop the other if both
are relevant?

If Solr does drop one result, do you get any indication in the document
that was kept that there was another copy?


Re: Flushing cache without restarting everything?

2013-08-22 Thread Dan Davis
be careful with drop_caches - make sure you sync first
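
A sketch of that sequence; note that the redirection in "sudo echo 1 >
/proc/sys/vm/drop_caches" happens in the calling (non-root) shell, so tee or a
root shell is the usual workaround:

sync                                          # write dirty pages out first
echo 1 | sudo tee /proc/sys/vm/drop_caches    # then drop the OS page cache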


On Thu, Aug 22, 2013 at 1:28 PM, Jean-Sebastien Vachon <
jean-sebastien.vac...@wantedanalytics.com> wrote:

> I was afraid someone would tell me that... thanks for your input
>
> > -Original Message-
> > From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
> > Sent: August-22-13 9:56 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Flushing cache without restarting everything?
> >
> > On Tue, 2013-08-20 at 20:04 +0200, Jean-Sebastien Vachon wrote:
> > > Is there a way to flush the cache of all nodes in a Solr Cloud (by
> > > reloading all the cores, through the collection API, ...) without
> > > having to restart all nodes?
> >
> > As MMapDirectory shares data with the OS disk cache, flushing of
> > Solr-related caches on a machine should involve
> >
> > 1) Shut down all Solr instances on the machine
> > 2) Clear the OS read cache ('sudo echo 1 > /proc/sys/vm/drop_caches' on
> > a Linux box)
> > 3) Start the Solr instances
> >
> > I do not know of any Solr-supported way to do step 2. For our
> > performance tests we use custom scripts to perform the steps.
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
> >
> > -
> > Aucun virus trouvé dans ce message.
> > Analyse effectuée par AVG - www.avg.fr
> > Version: 2013.0.3392 / Base de données virale: 3209/6563 - Date:
> 09/08/2013
> > La Base de données des virus a expiré.
>


Re: Measuring SOLR performance

2013-08-22 Thread Roman Chyla
Hi Dmitry,
So it seems solrjmeter should not assume the adminPath - perhaps it needs
to be passed as an argument. When you set the adminPath, are you able to
access localhost:8983/solr/statements/admin/cores ?

roman


On Wed, Aug 21, 2013 at 7:36 AM, Dmitry Kan  wrote:

> Hi Roman,
>
> I have noticed a difference with different solr.xml config contents. It is
> probably legit, but thought to let you know (tests run on fresh checkout as
> of today).
>
> As mentioned before, I have two cores configured in solr.xml. If the file
> is:
>
> [code]
> 
>
>   
>hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}">
> 
> 
>   
> 
> [/code]
>
> then the instruction:
>
> python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q
> ./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs 60 -R
> cms -t /solr/statements -e statements -U 100
>
> works just fine. If however the solr.xml has adminPath set to "/admin"
> solrjmeter produces an error:
>
> [error]
> **ERROR**
>   File "solrjmeter.py", line 1386, in 
> main(sys.argv)
>   File "solrjmeter.py", line 1278, in main
> check_prerequisities(options)
>   File "solrjmeter.py", line 375, in check_prerequisities
> error('Cannot find admin pages: %s, please report a bug' % apath)
>   File "solrjmeter.py", line 66, in error
> traceback.print_stack()
> Cannot find admin pages: http://localhost:8983/solr/admin, please report a
> bug
> [/error]
>
> With both solr.xml configs the following url returns just fine:
>
> http://localhost:8983/solr/statements/admin/system?wt=json
>
> Regards,
>
> Dmitry
>
>
>
> On Wed, Aug 14, 2013 at 2:03 PM, Dmitry Kan  wrote:
>
> > Hi Roman,
> >
> > This looks much better, thanks! The ordinary non-comarison mode works.
> > I'll post here, if there are other findings.
> >
> > Thanks for quick turnarounds,
> >
> > Dmitry
> >
> >
> > On Wed, Aug 14, 2013 at 1:32 AM, Roman Chyla  >wrote:
> >
> >> Hi Dmitry, oh yes, late night fixes... :) The latest commit should make
> it
> >> work for you.
> >> Thanks!
> >>
> >> roman
> >>
> >>
> >> On Tue, Aug 13, 2013 at 3:37 AM, Dmitry Kan 
> wrote:
> >>
> >> > Hi Roman,
> >> >
> >> > Something bad happened in fresh checkout:
> >> >
> >> > python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q
> >> > ./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs
> 60
> >> -R
> >> > cms -t /solr/statements -e statements -U 100
> >> >
> >> > Traceback (most recent call last):
> >> >   File "solrjmeter.py", line 1392, in 
> >> > main(sys.argv)
> >> >   File "solrjmeter.py", line 1347, in main
> >> > save_into_file('before-test.json', simplejson.dumps(before_test))
> >> >   File "/usr/lib/python2.7/dist-packages/simplejson/__init__.py", line
> >> 286,
> >> > in dumps
> >> > return _default_encoder.encode(obj)
> >> >   File "/usr/lib/python2.7/dist-packages/simplejson/encoder.py", line
> >> 226,
> >> > in encode
> >> > chunks = self.iterencode(o, _one_shot=True)
> >> >   File "/usr/lib/python2.7/dist-packages/simplejson/encoder.py", line
> >> 296,
> >> > in iterencode
> >> > return _iterencode(o, 0)
> >> >   File "/usr/lib/python2.7/dist-packages/simplejson/encoder.py", line
> >> 202,
> >> > in default
> >> > raise TypeError(repr(o) + " is not JSON serializable")
> >> > TypeError: <__main__.ForgivingValue object at 0x7fc6d4040fd0> is not
> >> JSON
> >> > serializable
> >> >
> >> >
> >> > Regards,
> >> >
> >> > D.
> >> >
> >> >
> >> > On Tue, Aug 13, 2013 at 8:10 AM, Roman Chyla 
> >> > wrote:
> >> >
> >> > > Hi Dmitry,
> >> > >
> >> > >
> >> > >
> >> > > On Mon, Aug 12, 2013 at 9:36 AM, Dmitry Kan 
> >> > wrote:
> >> > >
> >> > > > Hi Roman,
> >> > > >
> >> > > > Good point. I managed to run the command with -C and double
> quotes:
> >> > > >
> >> > > > python solrjmeter.py -a -C "g1,cms" -c hour -x
> >> ./jmx/SolrQueryTest.jmx
> >> > > >
> >> > > > As a result got several files (html, css, js, csv) in the running
> >> > > directory
> >> > > > (any way to specify where the output should be stored in this
> case?)
> >> > > >
> >> > >
> >> > > i know it is confusing, i plan to change it - but later, now it is
> too
> >> > busy
> >> > > here...
> >> > >
> >> > >
> >> > > >
> >> > > > When I look onto the comparison dashboard, I see this:
> >> > > >
> >> > > > http://pbrd.co/17IRI0b
> >> > > >
> >> > >
> >> > > two things: the tests probably took more than one hour to finish, so
> >> they
> >> > > are not aligned - try generating the comparison with '-c  14400'
>  (ie.
> >> > > 4x3600 secs)
> >> > >
> >> > > the other thing: if you have only two datapoints, the dygraph will
> not
> >> > show
> >> > > anything - there must be more datapoints/measurements
> >> > >
> >> > >
> >> > >
> >> > > >
> >> > > > One more thing: all the previous tests were run with softCommit
> >> > disabled.
> >> > > > After enabling it, the tests started to fail:
> >> > > >
> >> > > > $ python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q
> >> > > > ./queries/demo/demo.quer

custom names for replicas in solrcloud

2013-08-22 Thread smanad
Hi, 

I am using Solr 4.3 with 3 Solr hosts and an external ZooKeeper
ensemble of 3 servers, and just 1 shard currently.

When I create collections using collections api it creates collections with
names, 
collection1_shard1_replica1, collection1_shard1_replica2,
collection1_shard1_replica3.
Is there any way to pass a custom name? Or can I have all the replicas use the
same name?

Any pointers will be much appreciated. 
Thanks, 
-Manasi 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/custom-names-for-replicas-in-solrcloud-tp4086205.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLR search by external fields

2013-08-22 Thread Vinay B,
What we need is similar to what is discussed here, except not as a filter
but as an actual query:
http://lucene.472066.n3.nabble.com/filter-query-from-external-list-of-Solr-unique-IDs-td1709060.html

We'd like to implement a query parser/scorer that would allow us to combine
SOLR searches with searching external fields. This is due to the limitation
of having to update an entire document even though only a field in the
document needs to be updated.

For example, we have a database table called document_attributes containing
two columns: document_id and attribute_id. The document_id corresponds to the
ID of the documents indexed in SOLR.

We'd like to be able to pass in a query like:

attribute_id:123 OR text:some_query
(attribute_id:123 OR attribute_id:456) AND text:some_query
etc...

Can we implement a plugin/module in SOLR that's able to parse the above
query and then fetch the document_ids associated with the attribute_id and
combine the results with the normal processing of SOLR search to return one
set of results for the entire query.

We'd appreciate any guidance on how to implement this if it is possible.
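
For the archive, a rough sketch of the kind of plugin being asked about, not a
finished implementation. The class name, the parser name "attr", and
lookupDocumentIds() are all made up, and the actual JDBC lookup plus caching are
left out entirely:

import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

public class AttributeQParserPlugin extends QParserPlugin {

    @Override
    public void init(NamedList args) {}

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
            @Override
            public Query parse() throws SyntaxError {
                // e.g. {!attr f=id id=123}: expand attribute 123 into an OR over its document ids
                String attributeId = localParams.get("id");
                String idField = localParams.get("f", "id");
                BooleanQuery bq = new BooleanQuery();
                for (String docId : lookupDocumentIds(attributeId)) {   // hypothetical DB call
                    bq.add(new TermQuery(new Term(idField, docId)), BooleanClause.Occur.SHOULD);
                }
                return bq;
            }
        };
    }

    // Hypothetical helper: fetch document_ids for an attribute_id from the database.
    static List<String> lookupDocumentIds(String attributeId) {
        throw new UnsupportedOperationException("JDBC lookup and caching go here");
    }
}

Registered in solrconfig.xml with <queryParser name="attr"
class="com.example.AttributeQParserPlugin"/>, it could then be combined with
normal clauses via the _query_ hook, e.g. q=_query_:"{!attr id=123}" OR text:some_query.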


Re: Storing query results

2013-08-22 Thread Ahmet Arslan
Hi jfeist,

Your mail reminds me this blog, not sure about solr though.

http://blog.mikemccandless.com/2011/11/searcherlifetimemanager-prevents-broken.html




 From: jfeist 
To: solr-user@lucene.apache.org 
Sent: Friday, August 23, 2013 12:09 AM
Subject: Storing query results
 

I am in the process of setting up a search application that allows the user
to view paginated query results.  The documents are highly dynamic but I
want the search results to be static, i.e. I don't want the user to click
the next page button, the query reruns, and now he has a different set of
search results because the data changed while he was looking through it.  I
want the results stored somewhere else and the successive page queries to
draw from that.  I know Solr has query result caching, but I want to store
it entirely.  Does Solr provide any functionality like this?  I imagine it
doesn't, because then you'd need to specify how long to store it, etc.  I'm
using Solr 4.4.0.  I found someone asking something similar  here
   but
that was 6 years ago.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Storing-query-results-tp4086182.html
Sent from the Solr - User mailing list archive at Nabble.com.

SOLR Prevent solr of modifying fields when update doc

2013-08-22 Thread Luís Portela Afonso
Hi,

How can I prevent Solr from updating some fields when updating a doc?
The problem is, I have a UUID in a field named uuid, but it is not a unique key. 
When an RSS source updates a feed, Solr will update the doc with the same link, 
but it generates a new uuid. This is not desired, because I use this id to relate 
feeds to a user.

Can someone help me?

Many Thanks



Storing query results

2013-08-22 Thread jfeist
I am in the process of setting up a search application that allows the user
to view paginated query results.  The documents are highly dynamic but I
want the search results to be static, i.e. I don't want the user to click
the next page button, the query reruns, and now he has a different set of
search results because the data changed while he was looking through it.  I
want the results stored somewhere else and the successive page queries to
draw from that.  I know Solr has query result caching, but I want to store
it entirely.  Does Solr provide any functionality like this?  I imagine it
doesn't, because then you'd need to specify how long to store it, etc.  I'm
using Solr 4.4.0.  I found someone asking something similar  here
   but
that was 6 years ago.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Storing-query-results-tp4086182.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to set discountOverlaps="true" in Solr 4x schema.xml

2013-08-22 Thread Tom Burton-West
I should have said that I have set it both to "true" and to "false" and
restarted Solr each time and the rankings and info in the debug query
showed no change.

Does this have to be set at index time?

Tom



>


Re: How to set discountOverlaps="true" in Solr 4x schema.xml

2013-08-22 Thread Tom Burton-West
Thanks Markus,

I set it, but it seems to make no difference in the score or statistics
listed in the debugQuery or in the ranking.   I'm using a field with
CommonGrams and a huge list of common words, so there should be a huge
difference in the document length with and without discountOverlaps.

Is the default for Solr 4 true?

 
<similarity class="solr.BM25SimilarityFactory">
  <float name="k1">1.2</float>
  <float name="b">0.75</float>
  <bool name="discountOverlaps">false</bool>
</similarity>



On Thu, Aug 22, 2013 at 4:58 PM, Markus Jelsma
wrote:

> Hi Tom,
>
> Don't set it as attributes but as lists as Solr uses everywhere:
> 
>   <bool name="discountOverlaps">true</bool>
> 
>
> For BM25 you can also set k1 and b which is very convenient!
>
> Cheers
>
>
> -Original message-
> > From:Tom Burton-West 
> > Sent: Thursday 22nd August 2013 22:42
> > To: solr-user@lucene.apache.org
> > Subject: How to set discountOverlaps="true" in Solr 4x
> schema.xml
> >
> > If I am using solr.SchemaSimilarityFactory to allow different
> similarities
> > for different fields, do I set "discountOverlaps="true" on the factory or
> > per field?
> >
> > What is the syntax?   The below does not seem to work
> >
> > 
> >  >  />
> >
> > Tom
> >
>


RE: How to set discountOverlaps="true" in Solr 4x schema.xml

2013-08-22 Thread Markus Jelsma
Hi Tom,

Don't set it as attributes but as lists as Solr uses everywhere:

  <bool name="discountOverlaps">true</bool>


For BM25 you can also set k1 and b which is very convenient!
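
A fuller sketch, assuming the global factory in schema.xml stays
solr.SchemaSimilarityFactory and the BM25 settings live inside the fieldType
(the fieldType name and analyzer here are made up):

<similarity class="solr.SchemaSimilarityFactory"/>

<fieldType name="text_bm25" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <similarity class="solr.BM25SimilarityFactory">
    <float name="k1">1.2</float>
    <float name="b">0.75</float>
    <bool name="discountOverlaps">true</bool>
  </similarity>
</fieldType>

One thing to keep in mind for Tom's test: discountOverlaps feeds into the length
norm that is computed at index time, so a change typically only shows up after
reindexing.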

Cheers
 
 
-Original message-
> From:Tom Burton-West 
> Sent: Thursday 22nd August 2013 22:42
> To: solr-user@lucene.apache.org
> Subject: How to set discountOverlaps="true" in Solr 4x schema.xml
> 
> If I am using solr.SchemaSimilarityFactory to allow different similarities
> for different fields, do I set "discountOverlaps="true" on the factory or
> per field?
> 
> What is the syntax?   The below does not seem to work
> 
> 
>   />
> 
> Tom
> 


How to set discountOverlaps="true" in Solr 4x schema.xml

2013-08-22 Thread Tom Burton-West
If I am using solr.SchemaSimilarityFactory to allow different similarities
for different fields, do I set "discountOverlaps="true" on the factory or
per field?

What is the syntax?   The below does not seem to work




Tom


Re: Solr Ref guide question

2013-08-22 Thread Brendan Grainger
What version of solr are you using? Have you copied a solr.xml from
somewhere else? I can almost reproduce the error you're getting if I put a
non-existent core in my solr.xml, e.g.:



  

  
...


On Thu, Aug 22, 2013 at 1:30 PM, yriveiro  wrote:

> Hi all,
>
> I think that there is something lacking in Solr's ref doc.
>
> Section "Running Solr" says to run solr using the command:
>
> $ java -jar start.jar
>
> But If I do this with a fresh install, I have a stack trace like this:
> http://pastebin.com/5YRRccTx
>
> Is this behavior expected?
>
>
>
> -
> Best regards
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Ref-guide-question-tp4086142.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Brendan Grainger
www.kuripai.com


RE: updating docs in solr cloud hangs

2013-08-22 Thread Greg Walters
Thanks, Erick that's exactly the clarification/confirmation I was looking for!

Greg



Re: dataimporter tika fields empty

2013-08-22 Thread Alexandre Rafalovitch
Ah. That's because Tika processor does not support path extraction. You
need to nest one more level.

Regards,
  Alex
On 22 Aug 2013 13:34, "Andreas Owen"  wrote:

> i can do it like this but then the content isn't copied to text. it's just
> in text_test
>
>  url="${rec.path}${rec.file}" dataSource="dataUrl" >
> 
> 
> 
>
>
> On 22. Aug 2013, at 6:12 PM, Andreas Owen wrote:
>
> > i put it in the tika-entity as attribute, but it doesn't change
> anything. my bigger concern is why text_test isn't populated at all
> >
> > On 22. Aug 2013, at 5:27 PM, Alexandre Rafalovitch wrote:
> >
> >> Can you try SOLR-4530 switch:
> >> https://issues.apache.org/jira/browse/SOLR-4530
> >>
> >> Specifically, setting htmlMapper="identity" on the entity definition.
> This
> >> will tell Tika to send full HTML rather than a seriously stripped one.
> >>
> >> Regards,
> >> Alex.
> >>
> >> Personal website: http://www.outerthoughts.com/
> >> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> >> - Time is the quality of nature that keeps events from happening all at
> >> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
> >>
> >>
> >> On Thu, Aug 22, 2013 at 11:02 AM, Andreas Owen  wrote:
> >>
> >>> i'm trying to index an html page and only use the div with the
> >>> id="content". unfortunately nothing is working within the tika-entity, only
> >>> the standard text (content) is populated.
> >>>
> >>>   do i have to use copyField for test_text to get the data?
> >>>   or is there a problem with the entity-hierarchy?
> >>>   or is the xpath wrong, even though i've tried it without and just
> >>> using text?
> >>>   or should i use the updateextractor?
> >>>
> >>> data-config.xml:
> >>>
> >>> 
> >>>   
> >>>   
> >>>   http://127.0.0.1/tkb/internet/"; name="main"/>
> >>> 
> >>>>>> url="docImportUrl.xml" forEach="/docs/doc" dataSource="main">
> >>>   
> >>>   
> >>>   
> >>>   
> >>>   
> >>>   
> >>>
> >>>>>> url="${rec.path}${rec.file}" dataSource="dataUrl" >
> >>>   
> >>>>>> xpath="//div[@id='content']" />
> >>>   
> >>>   
> >>> 
> >>> 
> >>>
> >>> docImporterUrl.xml:
> >>>
> >>> 
> >>> 
> >>> 
> >>>   5
> >>>   tkb
> >>>   Startseite
> >>>   blabla ...
> >>>   http://localhost/tkb/internet/index.cfm
> >>>   http://localhost/tkb/internet/index.cfm/url
> >>>   http\specialConf
> >>>   
> >>>   
> >>>   6
> >>>   tkb
> >>>   Eigenheim
> >>>   Machen Sie sich erste Gedanken über den
> >>> Erwerb von Wohneigentum? Oder haben Sie bereits konkrete Pläne oder
> gar ein
> >>> spruchreifes Projekt? Wir beraten Sie gerne in allen Fragen rund um den
> >>> Erwerb oder Bau von Wohneigentum, damit Ihr Vorhaben auch in
> finanzieller
> >>> Hinsicht gelingt.
> >>>   
> >>> http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm
> >>>   
> >>> http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm/url
> >>>   
> >>> 
>
>


Re: How to SOLR file in svn repository

2013-08-22 Thread Alexandre Rafalovitch
I don't think you can go into production with that. But the Cloudera
distribution (with Hue) might be a similar or better option.

Regards,
Alex
On 22 Aug 2013 14:38, "Lance Norskog"  wrote:

> You need to:
> 1) crawl the SVN database
> 2) index the files
> 3) make a UI that fetches the original file when you click on a search
> result.
>
> Solr only has #2. If you run a subversion web browser app, you can
> download the developer-only version of the LucidWorks product and crawl the
> SVN web viewer. This will give you #1 and #3.
>
> Lance
>
> On 08/21/2013 09:00 AM, jiunarayan wrote:
>
>> I have an svn repository and svn file path. How can I use Solr to search
>> content in the svn files?
>>
>>
>>
>> --
>> View this message in context: http://lucene.472066.n3.nabble.com/How-to-SOLR-file-in-svn-repository-tp4085904.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>


Re: Schema

2013-08-22 Thread Raymond Wiker
On Aug 22, 2013, at 19:53 , Kamaljeet Kaur  wrote:
> On Thu, Aug 22, 2013 at 10:56 PM, SolrLover [via Lucene]
>  wrote:
>> 
>> Now use DIH to get the data from MYSQL database in to SOLR..
>> 
>> http://wiki.apache.org/solr/DataImportHandler
> 
> 
> These are for versions 1.3, 1.4, 3.6 or 4.0.
> Why versions are mentioned there? Don't they work on solr 4.4.0?


Why don't you just try? 


Re: Schema

2013-08-22 Thread tamanjit.bin...@yahoo.co.in
Versions mentioned in the wiki only tell you that those features are
available from that version of Solr onward. This will not be a problem in your
case as you are using the latest version, so everything you find in the wiki
would be available in Solr 4.4.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-tp4086136p4086163.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: updating docs in solr cloud hangs

2013-08-22 Thread Erick Erickson
Right, it's a little arcane. But the lockup is because the
various leaders send documents to each other and wait
for returns. If there are a _lot_ of incoming packets to
various leaders, it can generate the distributed deadlock.
So the shuffling you refer to is the root of the issue.

If the leaders only receive documents for the shard they're
a leader of, then they won't have to send updates to other
leaders and shouldn't hit this condition.
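
For the archive, a minimal SolrJ sketch of that indexing path (the ZooKeeper
addresses and collection name are made up; the leader-aware routing only kicks in
with a SOLR-4816-style CloudSolrServer):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexer {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("text", "hello solrcloud");

        server.add(doc);      // with routing, this goes straight to the leader of the doc's shard
        server.commit();
        server.shutdown();
    }
}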

But you're right, this situation was encountered the first time
by SolrJ clients sending lots and lots of parallel requests,
I don't remember whether it was just one client with lots of
threads or many clients. If you're not using SolrJ, then
it won't do you much good since it's client-side only.

As far as being a true fix or not, you can look at it as
kicking the can down the road. This patch has several
advantages:
1> It should pave the way for, and move towards,
linear scalability as far as scaling up to many
many nodes when indexing from SolrJ.
2> It should improve throughput in the normal case as well.
3> Along the way it _should_ significantly lower (perhaps
remove entirely) the chance that this deadlock will occur,
again when indexing from SolrJ.

If you had a bunch of clients, say, posting csv files
to SolrCloud, I'd guess you'd find this happening again.

So it's an improvement not a perfect cure. But if you think
it'd help

Best,
Erick


On Thu, Aug 22, 2013 at 3:23 PM, allrightname wrote:

> Erick,
>
> I've read over SOLR-4816 after finding your comment about the server-side
> stack traces showing threads locked up over semaphores and I'm curious how
> that issue cures the problem on the server-side as the patch only includes
> client-side changes. Do the servers get so tied up shuffling documents
> around when they're not sent to the master that they get blocked as
> described? If they do get blocked due to shuffling documents around is a
> client-side fix for this not more of a workaround than a true fix?
>
> I'm entirely willing to apply this patch to all of the code I've got that
> talks to my solr servers and try it out but I'm reluctant to because this
> looks like a client-side fix to a server-side issue.
>
> Thanks,
> Greg
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-tp4067388p4086160.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: updating docs in solr cloud hangs

2013-08-22 Thread allrightname
Erick,

I've read over SOLR-4816 after finding your comment about the server-side
stack traces showing threads locked up over semaphores and I'm curious how
that issue cures the problem on the server-side as the patch only includes
client-side changes. Do the servers get so tied up shuffling documents
around when they're not sent to the master that they get blocked as
described? If they do get blocked due to shuffling documents around is a
client-side fix for this not more of a workaround than a true fix?

I'm entirely willing to apply this patch to all of the code I've got that
talks to my solr servers and try it out but I'm reluctant to because this
looks like a client-side fix to a server-side issue.

Thanks,
Greg



--
View this message in context: 
http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-tp4067388p4086160.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr cloud hash range set to null after recovery from index corruption

2013-08-22 Thread Rikke Willer

Hi,

I have a Solr cloud set up with 12 shards with 2 replicas each, divided on 6 
servers (each server hosting 4 cores). Solr version is 4.3.1.
Due to memory errors on one machine, 3 of its 4 indexes became corrupted. I 
unloaded the cores, repaired the indexes with the Lucene CheckIndex tool, and 
added the cores again.
Afterwards the Solr cloud hash range has been set to null for the shards with 
corrupt indexes.
Could anybody point me to why this has occurred, and more importantly, how to 
set the range on the shards again?
Thank you.

Best,

Rikke


Re: Adding one core to an existing core?

2013-08-22 Thread Bruno Mannina

Thanks a lot !!!

On 22/08/2013 16:23, Andrea Gazzarini wrote:
First, a core is a separate index so it is completely independent from 
the already existing core(s). So basically you don't need to reindex.


In order to have two cores (but the same applies for n cores): you 
must have in your solr.home the file (solr.xml) described here


http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29

then, you must obviously have one or two directories (corresponding to 
the "instanceDir" attribute). I said one or two because if the indexes 
configuration is basically the same (or something changes but is 
dynamically configured - i.e. core name) you can create two instances 
starting from the same configuration. I mean
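
(The XML was stripped by the list archiver; a sketch of the idea with made-up core
names, one shared instanceDir, and separate data directories:)

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core1" instanceDir="shared_conf/" dataDir="core1_data"/>
    <core name="core2" instanceDir="shared_conf/" dataDir="core2_data"/>
  </cores>
</solr>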



 
  
  
 


Otherwise you must have two different conf directories that contain 
indexes configuration. You should already have a first one (the 
current core), you just need to have another conf dir with 
solrconfig.xml, schema.xml and other required files. In this case each 
core will have its own instanceDir.
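
(Again a sketch with made-up names, this time one instanceDir per core, each with
its own conf/ directory:)

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core1" instanceDir="core1/"/>
    <core name="core2" instanceDir="core2/"/>
  </cores>
</solr>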



 
  
  
 


Best,
Andrea



On 08/22/2013 04:04 PM, Bruno Mannina wrote:

A small clarification: I'm on Ubuntu 12.04 LTS

On 22/08/2013 15:56, Bruno Mannina wrote:

Dear Users,

(Solr3.6 + Tomcat7)

I have been using Solr with one core for two years; I would now like to add 
another core (a new database).


Can I do this without re-indexing my core1?
Could you point me to a good tutorial for doing that?

(my current database is around 200 GB for 86,000,000 docs)
My new database will be small, around 1000 documents of 5 KB each.

thanks a lot,
Bruno












Re: How to SOLR file in svn repository

2013-08-22 Thread Lance Norskog

You need to:
1) crawl the SVN database
2) index the files
3) make a UI that fetches the original file when you click on a search 
result.


Solr only has #2. If you run a subversion web browser app, you can 
download the developer-only version of the LucidWorks product and crawl 
the SVN web viewer. This will give you #1 and #3.


Lance

On 08/21/2013 09:00 AM, jiunarayan wrote:

I have an svn repository and svn file path. How can I use Solr to search content in
the svn files?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-SOLR-file-in-svn-repository-tp4085904.html
Sent from the Solr - User mailing list archive at Nabble.com.




Highlighting and proximity search

2013-08-22 Thread geran
Hello, I am dealing with an issue of highlighting and so far the other posts
that I've read have not provided a solution.

When using proximity search ("coming soon"~10) I get some documents with no
highlights and some documents highlight these words even when they are not
in a 10 word proximity. 

Some more configuration details are below, any help is much appreciated. We
are running solr version 4.4.0.

Full example query:

hl.fragsize=0&hl.requireFieldMatch=true&sort=document_date_range+desc
&hl.fragListBuilder=single&hl.fragmentsBuilder=colored&hl=true&version=2.2&rows=80
&hl.highlightMultiTerm=true&df=text&hl.useFastVectorHighlighter=true&start=0
&q=(text:("coming+soon"~10))&hl.usePhraseHighlighter=true

Configuration of the field being queried:

 
  


  



  


  
  







  


Configuration of highlighter in solrconfig.xml

 
  


  
  

  
  

  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-and-proximity-search-tp4086152.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to SOLR file in svn repository

2013-08-22 Thread Walter Underwood
After you connect to Subversion, you'll need parsers for code, etc.

You might want to try Krugle instead, since they have already written all that 
stuff: http://krugle.org/

wunder

On Aug 22, 2013, at 10:43 AM, SolrLover wrote:

> I don't think there's a SOLR-SVN connector available out of the box.
> 
> You can write a custom SOLRJ indexer program to get the necessary data from
> SVN (using JAVA API) and add the data to SOLR.
> 
> 



Re: Flushing cache without restarting everything?

2013-08-22 Thread Walter Underwood
We warm the file buffers before starting Solr to avoid spending time waiting 
for disk IO. The script is something like this:

for core in core1 core2 core3
do
find /apps/solr/data/${core}/index -type f | xargs cat > /dev/null
done

It makes a big difference in the first few minutes of service. Of course, it 
helps if you have enough RAM to hold the entire index.

wunder

On Aug 22, 2013, at 10:28 AM, Jean-Sebastien Vachon wrote:

> I was afraid someone would tell me that... thanks for your input
> 
>> -Original Message-
>> From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
>> Sent: August-22-13 9:56 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Flushing cache without restarting everything?
>> 
>> On Tue, 2013-08-20 at 20:04 +0200, Jean-Sebastien Vachon wrote:
>>> Is there a way to flush the cache of all nodes in a Solr Cloud (by
>>> reloading all the cores, through the collection API, ...) without
>>> having to restart all nodes?
>> 
>> As MMapDirectory shares data with the OS disk cache, flushing of
>> Solr-related caches on a machine should involve
>> 
>> 1) Shut down all Solr instances on the machine
>> 2) Clear the OS read cache ('sudo echo 1 > /proc/sys/vm/drop_caches' on
>> a Linux box)
>> 3) Start the Solr instances
>> 
>> I do not know of any Solr-supported way to do step 2. For our
>> performance tests we use custom scripts to perform the steps.
>> 
>> - Toke Eskildsen, State and University Library, Denmark
>> 
>> 
>> -
>> Aucun virus trouvé dans ce message.
>> Analyse effectuée par AVG - www.avg.fr
>> Version: 2013.0.3392 / Base de données virale: 3209/6563 - Date: 09/08/2013
>> La Base de données des virus a expiré.

--
Walter Underwood
wun...@wunderwood.org





Re: Solr 4.2.1 update to 4.3/4.4 problem

2013-08-22 Thread Erick Erickson
Your first problem is that the terms aren't getting to the field
analysis chain as a unit. If you attach &debug=query to your
query and say you're searching lastName:(ogden erickson),
you'll see something like
lastName:ogden lastName:erickson
when what you want is
lastName:ogden erickson
(note, this is the _parsed_ query, not the input string!)
So try escaping the space as
lastName:ogden\ erickson

As for the second problem, _how_ is it "not working at all"?
You're breaking up the input into separate tokens, which you
say you don't want to do. If you really want all your names to
be treated as strings just ignoring, say, the / take a look at
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceCharFilterFactory
and use it with your first type.
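
A sketch of that suggestion; the analyzer internals below are assumptions, since
the original schema snippet was stripped from the mail:

<fieldType name="string_lower_case" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- drop the / so "MB56789/A" and "MB56789A" normalize to the same term -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="/" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>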

Best
Erick


On Thu, Aug 22, 2013 at 7:34 AM, skorrapa wrote:

> Hello All,
>
> I am also facing a similar issue. I am using Solr 4.3.
> Following is the configuration I gave in schema.xml
>   sortMissingLast="true" omitNorms="true" >
> 
> 
> 
>   
>   
> 
> 
> 
> 
> sortMissingLast="true" omitNorms="true">
>   
> 
> 
>   
>   
> 
> 
> 
>  
>
> My requirement is that any string I give during search should be treated as
> a single string and matched case-insensitively.
> I have got strings like first name and last name (for this I am using
> string_lower_case), and strings with the special character '/' (for this I am
> using string_id_itm).
> But I am not getting results as expected. The first field type should also
> accept strings with spaces and give me results, but it isn't, and the second
> field type doesn't work at all.
>
> e.g of field values: John Smith (for field type 1)
>   MB56789/A (for field type 2)
> Please help
>
> vehovmar wrote
> > Thanks a lot for both replies. Helped me a lot. It seems that
> > EdgeNGramFilterFactory on query analyzer was really my problem, I'll have
> > to test it a little more to be sure.
> >
> >
> > As for the "bf" parameter, I thinks it's quite fine as it is, from
> > documentation:
> >
> > "the bf parameter actually takes a list of function queries separated by
> > whitespace and each with an optional boost"
> > Example: bf="ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3"
> >
> > And I'm using field function, Example Syntax: myFloatField or
> > field(myFloatField)
> >
> >
> > Thanks again to both of you guys!
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086070.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Schema

2013-08-22 Thread Kamaljeet Kaur
On Thu, Aug 22, 2013 at 10:56 PM, SolrLover [via Lucene]
 wrote:
>
> Now use DIH to get the data from the MySQL database into SOLR.
>
> http://wiki.apache.org/solr/DataImportHandler
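
For reference, the kind of data-config.xml that page describes, with made-up
table, column, and connection details:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb" user="solr" password="secret"/>
  <document>
    <entity name="item" query="SELECT id, name, description FROM items">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>

plus a handler entry in solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>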


These are for versions 1.3, 1.4, 3.6 or 4.0.
Why are versions mentioned there? Don't they work on Solr 4.4.0?


-- 
Kamaljeet Kaur

kamalkaur188.wordpress.com
facebook.com/kaur.188




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-tp4086136p4086145.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to SOLR file in svn repository

2013-08-22 Thread SolrLover
I don't think there's a Solr-SVN connector available out of the box.

You can write a custom SolrJ indexer program to get the necessary data from
SVN (using its Java API) and add the data to Solr.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-SOLR-file-in-svn-repository-tp4085904p4086144.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dataimporter tika fields empty

2013-08-22 Thread Andreas Owen
I can do it like this, but then the content isn't copied to text; it's just in
text_test







On 22. Aug 2013, at 6:12 PM, Andreas Owen wrote:

> i put it in the tika-entity as attribute, but it doesn't change anything. my 
> bigger concern is why text_test isn't populated at all
> 
> On 22. Aug 2013, at 5:27 PM, Alexandre Rafalovitch wrote:
> 
>> Can you try SOLR-4530 switch:
>> https://issues.apache.org/jira/browse/SOLR-4530
>> 
>> Specifically, setting htmlMapper="identity" on the entity definition. This
>> will tell Tika to send full HTML rather than a seriously stripped one.
>> 
>> Regards,
>> Alex.
>> 
>> Personal website: http://www.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all at
>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>> 
>> 
>> On Thu, Aug 22, 2013 at 11:02 AM, Andreas Owen  wrote:
>> 
>>> i'm trying to index a html page and only user the div with the
>>> id="content". unfortunately nothing is working within the tika-entity, only
>>> the standard text (content) is populated.
>>> 
>>>   do i have to use copyField for test_text to get the data?
>>>   or is there a problem with the entity-hirarchy?
>>>   or is the xpath wrong, even though i've tried it without and just
>>> using text?
>>>   or should i use the updateextractor?
>>> 
>>> data-config.xml:
>>> 
>>> 
>>>   
>>>   
>>>   http://127.0.0.1/tkb/internet/"; name="main"/>
>>> 
>>>   >> url="docImportUrl.xml" forEach="/docs/doc" dataSource="main">
>>>   
>>>   
>>>   
>>>   
>>>   
>>>   
>>> 
>>>   >> url="${rec.path}${rec.file}" dataSource="dataUrl" >
>>>   
>>>   >> xpath="//div[@id='content']" />
>>>   
>>>   
>>> 
>>> 
>>> 
>>> docImporterUrl.xml:
>>> 
>>> 
>>> 
>>> 
>>>   5
>>>   tkb
>>>   Startseite
>>>   blabla ...
>>>   http://localhost/tkb/internet/index.cfm
>>>   http://localhost/tkb/internet/index.cfm/url
>>>   http\specialConf
>>>   
>>>   
>>>   6
>>>   tkb
>>>   Eigenheim
>>>   Machen Sie sich erste Gedanken über den
>>> Erwerb von Wohneigentum? Oder haben Sie bereits konkrete Pläne oder gar ein
>>> spruchreifes Projekt? Wir beraten Sie gerne in allen Fragen rund um den
>>> Erwerb oder Bau von Wohneigentum, damit Ihr Vorhaben auch in finanzieller
>>> Hinsicht gelingt.
>>>   
>>> http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm
>>>   
>>> http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm/url
>>>   
>>> 



Solr Ref guide question

2013-08-22 Thread yriveiro
Hi all,

I think there is a gap in Solr's reference guide.

The "Running Solr" section says to run Solr using the command:

$ java -jar start.jar

But if I do this with a fresh install, I get a stack trace like this:
http://pastebin.com/5YRRccTx

Is this behavior expected?



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Ref-guide-question-tp4086142.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Flushing cache without restarting everything?

2013-08-22 Thread Jean-Sebastien Vachon
I was afraid someone would tell me that... thanks for your input

> -Original Message-
> From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
> Sent: August-22-13 9:56 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Flushing cache without restarting everything?
> 
> On Tue, 2013-08-20 at 20:04 +0200, Jean-Sebastien Vachon wrote:
> > Is there a way to flush the cache of all nodes in a Solr Cloud (by
> > reloading all the cores, through the collection API, ...) without
> > having to restart all nodes?
> 
> As MMapDirectory shares data with the OS disk cache, flushing of
> Solr-related caches on a machine should involve
> 
> 1) Shut down all Solr instances on the machine
> 2) Clear the OS read cache ('sudo echo 1 > /proc/sys/vm/drop_caches' on
> a Linux box)
> 3) Start the Solr instances
> 
> I do not know of any Solr-supported way to do step 2. For our
> performance tests we use custom scripts to perform the steps.
> 
> - Toke Eskildsen, State and University Library, Denmark
> 
> 


Re: Schema

2013-08-22 Thread SolrLover
Now use DIH to get the data from the MySQL database into Solr:

http://wiki.apache.org/solr/DataImportHandler

You need to define the field mapping (between the MySQL columns and the Solr
document fields) in data-config.xml.
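
A minimal data-config.xml might look roughly like this (the JDBC URL,
credentials, table and column names are placeholders for your own setup):

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="dbuser" password="dbpass"/>
  <document>
    <!-- each row becomes one Solr document; column/name pairs map DB columns to schema fields -->
    <entity name="item" query="SELECT id, title, description FROM items">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>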



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-tp4086136p4086140.html
Sent from the Solr - User mailing list archive at Nabble.com.


Schema

2013-08-22 Thread Kamaljeet Kaur
Hello there,
I have installed Solr and it's working fine on localhost. I have indexed the
example files shipped with solr-4.4.0, which are CSV or XML. Now I want to
index a MySQL database for a Django project, serve search queries from the
user end, and also implement more features. What should I do?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-tp4086136.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Andrea Gazzarini

Ok, found

class="org.apache.solr.handler.dataimport.DataImportHandler">


dih-config.xml
*nohtml**<*/str>



Of course, my mistake...when I changed the name of the chain I deleted 
the "<" char.

Sorry

On 08/22/2013 06:15 PM, Shawn Heisey wrote:
of "update.chain" so this shouldn't be the problem. 




Re: UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Shawn Heisey

On 8/22/2013 10:06 AM, Andrea Gazzarini wrote:

yes, yes of course, you should use your already declared request
handler...that was just a copied and pasted example :)

I'm curious about what kind of error you gotI copied the snippet
above from a working core (just replaced the name of the chain)

BTW: AFAIK is the "update.processor" that has been deprecated in favor
of "update.chain" so this shouldn't be the problem.


Here's the full exception.  I use xinclude heavily in my solrconfig.xml. 
 The xinclude directives are actually almost the only thing that's in 
solrconfig.xml.


http://apaste.info/7PB0

I'm going to try setting my update processor to default as recommended 
by Steve Rowe.


Thanks,
Shawn



Re: dataimporter tika fields empty

2013-08-22 Thread Andreas Owen
I put it in the tika-entity as an attribute, but it doesn't change anything. My
bigger concern is why text_test isn't populated at all.

On 22. Aug 2013, at 5:27 PM, Alexandre Rafalovitch wrote:

> Can you try SOLR-4530 switch:
> https://issues.apache.org/jira/browse/SOLR-4530
> 
> Specifically, setting htmlMapper="identity" on the entity definition. This
> will tell Tika to send full HTML rather than a seriously stripped one.
> 
> Regards,
> Alex.
> 
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> 
> 
> On Thu, Aug 22, 2013 at 11:02 AM, Andreas Owen  wrote:
> 
>> i'm trying to index a html page and only user the div with the
>> id="content". unfortunately nothing is working within the tika-entity, only
>> the standard text (content) is populated.
>> 
>>do i have to use copyField for test_text to get the data?
>>or is there a problem with the entity-hirarchy?
>>or is the xpath wrong, even though i've tried it without and just
>> using text?
>>or should i use the updateextractor?
>> 
>> data-config.xml:
>> 
>> 
>>
>>
>>http://127.0.0.1/tkb/internet/"; name="main"/>
>> 
>>> url="docImportUrl.xml" forEach="/docs/doc" dataSource="main">
>>
>>
>>
>>
>>
>>
>> 
>>> url="${rec.path}${rec.file}" dataSource="dataUrl" >
>>
>>> xpath="//div[@id='content']" />
>>
>>
>> 
>> 
>> 
>> docImporterUrl.xml:
>> 
>> 
>> 
>> 
>>5
>>tkb
>>Startseite
>>blabla ...
>>http://localhost/tkb/internet/index.cfm
>>http://localhost/tkb/internet/index.cfm/url
>>http\specialConf
>>
>>
>>6
>>tkb
>>Eigenheim
>>Machen Sie sich erste Gedanken über den
>> Erwerb von Wohneigentum? Oder haben Sie bereits konkrete Pläne oder gar ein
>> spruchreifes Projekt? Wir beraten Sie gerne in allen Fragen rund um den
>> Erwerb oder Bau von Wohneigentum, damit Ihr Vorhaben auch in finanzieller
>> Hinsicht gelingt.
>>
>> http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm
>>
>> http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm/url
>>
>> 



Re: UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Shawn Heisey

On 8/22/2013 10:02 AM, Steve Rowe wrote:

You could declare your update chain as the default by adding 'default="true"' 
to its declaring element:



and then you wouldn't need to declare it as the default update.chain in either 
of your request handlers.


If I did this, would it apply the HTML processor only to the fields
that I have specified in those XML sections?  I haven't thought through
the implications, but I think it might be OK.


Thanks,
Shawn



Re: UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Andrea Gazzarini
yes, yes of course, you should use your already declared request 
handler...that was just a copied and pasted example :)


I'm curious about what kind of error you got... I copied the snippet
above from a working core (just replaced the name of the chain).


BTW: AFAIK it is "update.processor" that has been deprecated in favor
of "update.chain", so this shouldn't be the problem.


Best,
Gazza

On 08/22/2013 05:57 PM, Shawn Heisey wrote:

On 8/22/2013 9:42 AM, Andrea Gazzarini wrote:

You should declare this

nohtml

in the "defaults" section of the RequestHandler that corresponds to your
dataimporthandler. You should have something like this:

 
 
 dih-config.xml
 nohtml/str>
 
 

Otherwise the default update chain will be called (and your URP are not
part of that). The solrj, behind the scenes, is a client of the /update
request handler, that's the reason why using that you can see your URP
working.


This results in an error parsing the config, so my cores won't start 
up.  I saw another message via google that talked about using 
update.processor instead of update.chain, so I tried that as well, 
with no luck.


Can I ask DIH to use the /update handler that I have declared already?

Thanks,
Shawn





Re: UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Steve Rowe
You could declare your update chain as the default by adding 'default="true"' 
to its declaring element:

   

and then you wouldn't need to declare it as the default update.chain in either 
of your request handlers.
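
For example (a sketch only; the processors inside the chain are whatever you
already have defined, shown here with an assumed HTML-stripping processor):

<updateRequestProcessorChain name="nohtml" default="true">
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">ft_text</str>
    <!-- ...plus the other fields you selected... -->
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>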

On Aug 22, 2013, at 11:57 AM, Shawn Heisey  wrote:

> On 8/22/2013 9:42 AM, Andrea Gazzarini wrote:
>> You should declare this
>> 
>> nohtml
>> 
>> in the "defaults" section of the RequestHandler that corresponds to your
>> dataimporthandler. You should have something like this:
>> 
>> > class="org.apache.solr.handler.dataimport.DataImportHandler">
>> 
>> dih-config.xml
>> nohtml/str>
>> 
>> 
>> 
>> Otherwise the default update chain will be called (and your URP are not
>> part of that). The solrj, behind the scenes, is a client of the /update
>> request handler, that's the reason why using that you can see your URP
>> working.
> 
> This results in an error parsing the config, so my cores won't start up.  I 
> saw another message via google that talked about using update.processor 
> instead of update.chain, so I tried that as well, with no luck.
> 
> Can I ask DIH to use the /update handler that I have declared already?
> 
> Thanks,
> Shawn
> 



Re: UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Shawn Heisey

On 8/22/2013 9:42 AM, Andrea Gazzarini wrote:

You should declare this

nohtml

in the "defaults" section of the RequestHandler that corresponds to your
dataimporthandler. You should have something like this:

 
 
 dih-config.xml
 nohtml/str>
 
 

Otherwise the default update chain will be called (and your URP are not
part of that). The solrj, behind the scenes, is a client of the /update
request handler, that's the reason why using that you can see your URP
working.


This results in an error parsing the config, so my cores won't start up. 
 I saw another message via google that talked about using 
update.processor instead of update.chain, so I tried that as well, with 
no luck.


Can I ask DIH to use the /update handler that I have declared already?

Thanks,
Shawn



Re: UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Andrea Gazzarini

You should declare this

nohtml

in the "defaults" section of the RequestHandler that corresponds to your 
dataimporthandler. You should have something like this:


class="org.apache.solr.handler.dataimport.DataImportHandler">


dih-config.xml
nohtml/str>



Otherwise the default update chain will be called (and your URP is not
part of that). SolrJ, behind the scenes, is a client of the /update
request handler; that's the reason why, using that, you can see your URP
working.
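
Put together, a handler declaration along these lines (a sketch; handler and
file names follow the snippets above, adjust them to your setup) makes DIH run
the same chain:

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config.xml</str>
    <!-- run the same custom chain that /update uses -->
    <str name="update.chain">nohtml</str>
  </lst>
</requestHandler>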


Best,
Gazza


On 08/22/2013 05:35 PM, Shawn Heisey wrote:
I have an updateProcessor defined.  It seems to work perfectly when I 
index with SolrJ, but when I use DIH (which I do for a full index 
rebuild), it doesn't work.  This is the case with both Solr 4.4 and 
Solr 4.5-SNAPSHOT, svn revision 1516342.


Here's a solrconfig.xml excerpt:


  
  
ft_text
ft_subject
keywords
text_preview
  
  
  
ft_text
ft_subject
keywords
text_preview
  
  
  


  

  nohtml

  

If I turn on DEBUG logging for FieldMutatingUpdateProcessorFactory, I 
see "replace value" debugs, but the contents of the index are only 
changed if the update happens with SolrJ, not with DIH.


A side issue.  FieldMutatingUpdateProcessorFactory has the following 
line in it, at about line 72:


if (destVal != srcVal) {

Shouldn't this be the following?

if (destVal.equals(srcVal)) {

Thanks,
Shawn




UpdateProcessor not working with DIH, but works with SolrJ

2013-08-22 Thread Shawn Heisey
I have an updateProcessor defined.  It seems to work perfectly when I 
index with SolrJ, but when I use DIH (which I do for a full index 
rebuild), it doesn't work.  This is the case with both Solr 4.4 and Solr 
4.5-SNAPSHOT, svn revision 1516342.


Here's a solrconfig.xml excerpt:


  
  
ft_text
ft_subject
keywords
text_preview
  
  
  
ft_text
ft_subject
keywords
text_preview
  
  
  


  

  nohtml

  

If I turn on DEBUG logging for FieldMutatingUpdateProcessorFactory, I 
see "replace value" debugs, but the contents of the index are only 
changed if the update happens with SolrJ, not with DIH.


A side issue.  FieldMutatingUpdateProcessorFactory has the following 
line in it, at about line 72:


if (destVal != srcVal) {

Shouldn't this be the following?

if (destVal.equals(srcVal)) {

Thanks,
Shawn


Re: dataimporter tika fields empty

2013-08-22 Thread Alexandre Rafalovitch
Can you try SOLR-4530 switch:
https://issues.apache.org/jira/browse/SOLR-4530

Specifically, setting htmlMapper="identity" on the entity definition. This
will tell Tika to send full HTML rather than a seriously stripped one.
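
On the entity it would look roughly like this (a sketch; the entity name and
the rest of the definition stay as in your data-config, the format/htmlMapper
attributes are the relevant additions):

<entity name="tika" processor="TikaEntityProcessor" format="html"
        htmlMapper="identity" url="${rec.path}${rec.file}" dataSource="dataUrl">
  <!-- existing field mappings unchanged -->
</entity>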

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Aug 22, 2013 at 11:02 AM, Andreas Owen  wrote:

> i'm trying to index a html page and only user the div with the
> id="content". unfortunately nothing is working within the tika-entity, only
> the standard text (content) is populated.
>
> do i have to use copyField for test_text to get the data?
> or is there a problem with the entity-hirarchy?
> or is the xpath wrong, even though i've tried it without and just
> using text?
> or should i use the updateextractor?
>
> data-config.xml:
>
> 
> 
> 
> http://127.0.0.1/tkb/internet/"; name="main"/>
> 
>  url="docImportUrl.xml" forEach="/docs/doc" dataSource="main">
> 
> 
> 
> 
> 
> 
>
>  url="${rec.path}${rec.file}" dataSource="dataUrl" >
> 
>  xpath="//div[@id='content']" />
> 
> 
> 
> 
>
> docImporterUrl.xml:
>
> 
> 
> 
> 5
> tkb
> Startseite
> blabla ...
> http://localhost/tkb/internet/index.cfm
> http://localhost/tkb/internet/index.cfm/url
> http\specialConf
> 
> 
> 6
> tkb
> Eigenheim
> Machen Sie sich erste Gedanken über den
> Erwerb von Wohneigentum? Oder haben Sie bereits konkrete Pläne oder gar ein
> spruchreifes Projekt? Wir beraten Sie gerne in allen Fragen rund um den
> Erwerb oder Bau von Wohneigentum, damit Ihr Vorhaben auch in finanzieller
> Hinsicht gelingt.
> 
> http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm
> 
> http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm/url
> 
> 


dataimporter tika fields empty

2013-08-22 Thread Andreas Owen
I'm trying to index an HTML page and only use the div with the id="content".
Unfortunately nothing is working within the tika-entity; only the standard text
(content) is populated.

Do I have to use copyField for test_text to get the data?
Or is there a problem with the entity hierarchy?
Or is the xpath wrong, even though I've tried it without and just using
text?
Or should I use the updateextractor?

data-config.xml:




http://127.0.0.1/tkb/internet/"; name="main"/>

 





  



   





docImporterUrl.xml:




5
tkb
Startseite
blabla ...
http://localhost/tkb/internet/index.cfm
http://localhost/tkb/internet/index.cfm/url
http\specialConf


6
tkb
Eigenheim
Machen Sie sich erste Gedanken über den Erwerb von 
Wohneigentum? Oder haben Sie bereits konkrete Pläne oder gar ein spruchreifes 
Projekt? Wir beraten Sie gerne in allen Fragen rund um den Erwerb oder Bau von 
Wohneigentum, damit Ihr Vorhaben auch in finanzieller Hinsicht 
gelingt.

http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm

http://127.0.0.1/tkb/internet/private/beratung/eigenheim.htm/url



How to access latitude and longitude with only LatLonType?

2013-08-22 Thread zhangquan913
Hello All,

I am currently doing a spatial query in solr. I indexed "coordinates"
(type="location" class="solr.LatLonType"), but the following query failed.
http://localhost/solr/quan/select?q=*:*&stats=true&stats.field=coordinates&stats.facet=township&rows=0
It showed an error:
Field type
location{class=org.apache.solr.schema.SpatialRecursivePrefixTreeFieldType,analyzer=org.apache.solr.schema.FieldType$DefaultAnalyzer,args={distErrPct=0.025,
class=solr.SpatialRecursivePrefixTreeFieldType, maxDistErr=0.09,
units=degrees}} is not currently supported

I don't want to create duplicate indexed fields "latitude" and "longitude".
How can I use only "coordinates" to do this kind of stats on both latitude
and longitude?

Thanks,
Quan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-access-latitude-and-longitude-with-only-LatLonType-tp4086109.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Adding one core to an existing core?

2013-08-22 Thread Andrea Gazzarini
First, a core is a separate index, so it is completely independent from 
the already existing core(s). So basically you don't need to reindex.


In order to have two cores (but the same applies for n cores): you must 
have in your solr.home the file (solr.xml) described here


http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29

then, you must obviously have one or two directories (corresponding to 
the "instanceDir" attribute). I said one or two because, if the index 
configuration is basically the same (or something changes but is 
dynamically configured - e.g. the core name), you can create two instances 
starting from the same configuration. I mean



 
  
  
 


Otherwise you must have two different conf directories that contain the 
index configuration. You should already have a first one (the current 
core); you just need another conf dir with solrconfig.xml, 
schema.xml and other required files. In this case each core will have 
its own instanceDir.



 
  
  
 

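As a rough sketch (core names and instanceDir values are placeholders), a
two-core solr.xml looks like this:

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- the existing core keeps its current instanceDir and conf -->
    <core name="core1" instanceDir="core1"/>
    <!-- the new, small core gets its own instanceDir (or reuses the same conf) -->
    <core name="core2" instanceDir="core2"/>
  </cores>
</solr>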

Best,
Andrea



On 08/22/2013 04:04 PM, Bruno Mannina wrote:

Little precision, I'm on Ubuntu 12.04LTS

Le 22/08/2013 15:56, Bruno Mannina a écrit :

Dear Users,

(Solr3.6 + Tomcat7)

I use since two years Solr with one core, I would like now to add one 
another core (a new database).


Can I do this without re-indexing my core1 ?
could you point me to a good tutorial to do that?

(my current database is around 200Go for 86 000 000 docs)
My new database will be little, around 1000 documents of 5ko each.

thanks a lot,
Bruno









Re: Adding one core to an existing core?

2013-08-22 Thread Bruno Mannina

One small clarification: I'm on Ubuntu 12.04 LTS.

Le 22/08/2013 15:56, Bruno Mannina a écrit :

Dear Users,

(Solr3.6 + Tomcat7)

I use since two years Solr with one core, I would like now to add one 
another core (a new database).


Can I do this without re-indexing my core1 ?
could you point me to a good tutorial to do that?

(my current database is around 200Go for 86 000 000 docs)
My new database will be little, around 1000 documents of 5ko each.

thanks a lot,
Bruno







Re: Flushing cache without restarting everything?

2013-08-22 Thread Toke Eskildsen
On Tue, 2013-08-20 at 20:04 +0200, Jean-Sebastien Vachon wrote:
> Is there a way to flush the cache of all nodes in a Solr Cloud (by
> reloading all the cores, through the collection API, ...) without
> having to restart all nodes?

As MMapDirectory shares data with the OS disk cache, flushing of
Solr-related caches on a machine should involve

1) Shut down all Solr instances on the machine
2) Clear the OS read cache ('sudo echo 1 > /proc/sys/vm/drop_caches' on
a Linux box)
3) Start the Solr instances

I do not know of any Solr-supported way to do step 2. For our
performance tests we use custom scripts to perform the steps.

- Toke Eskildsen, State and University Library, Denmark



Adding one core to an existing core?

2013-08-22 Thread Bruno Mannina

Dear Users,

(Solr3.6 + Tomcat7)

I have been using Solr with one core for two years; I would now like to add 
another core (a new database).


Can I do this without re-indexing my core1?
Could you point me to a good tutorial on how to do that?

(My current database is around 200 GB for 86,000,000 docs.)
My new database will be small, around 1000 documents of 5 KB each.

thanks a lot,
Bruno



Re: when does RAMBufferSize work when commit.

2013-08-22 Thread Shawn Heisey
On 8/22/2013 2:25 AM, YouPeng Yang wrote:
> Hi all
> About the RAMBufferSize  and commit ,I have read the doc :
> http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/60544
> 
>I can not figure out how do they make work.
> 
>   Given the settings:
> 
>  10
>  
>${solr.autoCommit.maxDocs:1000}
>false
>  
> 
>  If the indexs docs up to 1000  and the size of these docs is below 10MB
> ,it will trigger an commit.
> 
>  If the size of the indexed docs reaches to 10MB while the the number is below
> 1000, it will not trigger an commit , however the index docs will just
> be flushed
> to disk,it will only commit when the number reaches to 1000?

Your actual config seems to have its wires crossed a little bit.  You
have the autoCommit.maxDocs value being used in a maxTime tag, not a
maxDocs tag.  You may want to adjust the variable name or the tag.
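
In other words, something like this is probably what was intended (a sketch;
keep whichever system property name you prefer):

<autoCommit>
  <!-- commit after this many added documents -->
  <maxDocs>${solr.autoCommit.maxDocs:1000}</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>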

If that were a maxDocs tag instead of maxTime, your description would be
pretty much right on the money.  The space taken in the RAM buffer is
typically larger than the actual document size, but the general idea is
sound.

The default for RAMBufferSizeMB in recent Solr versions is 100.  Unless
you've got super small documents, or you are in a limited memory
situation and have a lot of cores, I would not go smaller than that.

Thanks,
Shawn



RE: Flushing cache without restarting everything?

2013-08-22 Thread Jean-Sebastien Vachon
How can you validate that the changes you just made had any impact on the 
performance of the cloud if you don't have the same starting conditions?

What we do basically is running a batch of requests to warm up the index and 
then launch the benchmark itself. That way we can measure the impact of our 
change(s). Otherwise there is absolutely no way we can be sure who is 
responsible for the gain or loss of performance.

Restarting a cloud is actually a real pain, I just want to know if there is a 
faster way to proceed.

> -Original Message-
> From: Dmitry Kan [mailto:solrexp...@gmail.com]
> Sent: August-22-13 7:26 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Flushing cache without restarting everything?
> 
> But is it really a good benchmarking, if you flush the cache? Wouldn't you
> want to benchmark against a system, that would be comparable to what is
> under real (=production) load?
> 
> Dmitry
> 
> 
> On Tue, Aug 20, 2013 at 9:39 PM, Jean-Sebastien Vachon < jean-
> sebastien.vac...@wantedanalytics.com> wrote:
> 
> > I just want to run benchmarks and want to have the same starting
> > conditions.
> >
> > > -Original Message-
> > > From: Walter Underwood [mailto:wun...@wunderwood.org]
> > > Sent: August-20-13 2:06 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Flushing cache without restarting everything?
> > >
> > > Why? What are you trying to acheive with this? --wunder
> > >
> > > On Aug 20, 2013, at 11:04 AM, Jean-Sebastien Vachon wrote:
> > >
> > > > Hi All,
> > > >
> > > > Is there a way to flush the cache of all nodes in a Solr Cloud (by
> > reloading all
> > > the cores, through the collection API, ...) without having to
> > > restart
> > all nodes?
> > > >
> > > > Thanks
> > >
> > >
> > >
> > >
> > >
> >
> 


Re: SolrCmdDistributor may not be threadsafe...

2013-08-22 Thread Tor Egil
Updated to Sun Java 1.7.0_25 on Solr 4.4.0, but I am still getting mutated
strings:

725597 [Thread-20] ERROR org.apache.solr.update.SolrCmdDistributor  - shard
update error StdNode:
http://10.231.188.126:8080/solr/kunde0/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Unexpected close tag ; expected .
 at [row,col {unknown-source}]: [1,2096]
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:424)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCmdDistributor-may-not-be-threadsafe-tp4086042p4086076.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Data Import faile in solr 4.3.0

2013-08-22 Thread Shalin Shekhar Mangar
Call "optimize" on your Solr 3.5 server which will write a new index
segment in v3.5 format. Such an index should be read in Solr 4.x
without any problem.
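
For example (a sketch; adjust host, port and core name to your installation),
posting this XML to the 3.5 server's /update handler triggers the optimize:

<!-- POST to http://localhost:8983/solr/update -->
<optimize waitSearcher="true"/>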

On Thu, Aug 22, 2013 at 5:00 PM, Montu v Boda
 wrote:
> thanks
>
> actually the problem is that we have migrated the solr 1.4 index data to
> solr 3.5 using replication feature of solr 3.5. so that what ever data we
> have in solr 3.5 is of solr 1.4.
>
> so i do not think so it is work in solr 4.x.
>
> so please suggest your view based on my above point.
>
> Thanks & Regards
> Montu v Boda
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Data-Import-faile-in-solr-4-3-0-tp4085868p4086068.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Regards,
Shalin Shekhar Mangar.


Re: relation between optimize and merge

2013-08-22 Thread Jack Krupansky
optimize is an explicit request to perform a merge. Merges occur in the 
background, automatically, as needed or indicated by the parameters of the 
merge policy. An optimize is requested from outside of Solr.
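
The background merging is driven by the merge policy settings in
solrconfig.xml, for example (the values shown are illustrative, not
recommendations):

<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- how many segments are merged at once, and how many may sit in each tier -->
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>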


-- Jack Krupansky

-Original Message- 
From: YouPeng Yang

Sent: Thursday, August 22, 2013 3:18 AM
To: solr-user@lucene.apache.org
Subject: relation between optimize and merge

Hi All

  I do have some diffculty with understand the relation between the
optimize and merge
 Can anyone give some tips about the difference.

Regards 



Re: Facing Solr performance during query search

2013-08-22 Thread Toke Eskildsen
On Wed, 2013-08-21 at 10:09 +0200, sivaprasad wrote:
> The slave will poll for every 1hr. 

And are there normally changes?

> We have configured ~2000 facets and the machine configuration is given
> below.

I assume that you only request a subset of those facets at a time.

How much RAM does your machine have? 
How large is your index in GB?
How many documents do you have in your index?

As you are not explicitly warming your facets and since you have a lot
of them, my guess is that you're performing initializing facet calls all
the time. If the slave only has 32GB of RAM (and thus only about 10GB
for disk cache) and if your index is substantially larger than that, the
initialization will require a lot of non-cached disk access.

Try disabling the slave polling, then send 1000 queries and then re-send
the exact same 1000 queries. Are the response times satisfactory the
second time? If so, you should consider warming your facets and/or try
to come up with a solution where you don't have so many of them.

https://sbdevel.wordpress.com/2013/04/16/you-are-faceting-itwrong/
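
One common way to warm facets is a static warming query in the <query> section
of solrconfig.xml (a sketch; the facet fields listed are placeholders for the
ones you actually use most):

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
      <str name="facet.field">brand</str>
    </lst>
  </arr>
</listener>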

- Toke Eskildsen, State and University Library, Denmark



Re: Solr 4.2.1 update to 4.3/4.4 problem

2013-08-22 Thread skorrapa
Hello All,

I am also facing a similar issue. I am using Solr 4.3.
Following is the configuration I gave in schema.xml
   



  
  


  
  

  


  
  



 

My requirement is that any string I give during search should be treated as
a single string and matched case-insensitively.
I have got strings like first name and last name (for this I am using
string_lower_case), and strings with the special character '/' (for this I am
using string_id_itm).
But I am not getting results as expected. The first field type should also
accept strings with spaces and give me results, but it isn't, and the second
field type doesn't work at all.

e.g of field values: John Smith (for field type 1)
  MB56789/A (for field type 2)
Please help

vehovmar wrote
> Thanks a lot for both replies. Helped me a lot. It seems that
> EdgeNGramFilterFactory on query analyzer was really my problem, I'll have
> to test it a little more to be sure.
> 
> 
> As for the "bf" parameter, I thinks it's quite fine as it is, from
> documentation:
> 
> "the bf parameter actually takes a list of function queries separated by
> whitespace and each with an optional boost"
> Example: bf="ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3"
> 
> And I'm using field function, Example Syntax: myFloatField or
> field(myFloatField)
> 
> 
> Thanks again to both of you guys!





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086070.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Data Import faile in solr 4.3.0

2013-08-22 Thread Montu v Boda
thanks

Actually the problem is that we have migrated the Solr 1.4 index data to
Solr 3.5 using the replication feature of Solr 3.5, so whatever data we
have in Solr 3.5 is still in the Solr 1.4 format.

So I do not think it will work in Solr 4.x.

Please suggest your view based on the above point.

Thanks & Regards
Montu v Boda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Data-Import-faile-in-solr-4-3-0-tp4085868p4086068.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Flushing cache without restarting everything?

2013-08-22 Thread Dmitry Kan
But is it really good benchmarking if you flush the cache? Wouldn't you
want to benchmark against a system that is comparable to what is
under real (= production) load?

Dmitry


On Tue, Aug 20, 2013 at 9:39 PM, Jean-Sebastien Vachon <
jean-sebastien.vac...@wantedanalytics.com> wrote:

> I just want to run benchmarks and want to have the same starting
> conditions.
>
> > -Original Message-
> > From: Walter Underwood [mailto:wun...@wunderwood.org]
> > Sent: August-20-13 2:06 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Flushing cache without restarting everything?
> >
> > Why? What are you trying to acheive with this? --wunder
> >
> > On Aug 20, 2013, at 11:04 AM, Jean-Sebastien Vachon wrote:
> >
> > > Hi All,
> > >
> > > Is there a way to flush the cache of all nodes in a Solr Cloud (by
> reloading all
> > the cores, through the collection API, ...) without having to restart
> all nodes?
> > >
> > > Thanks
> >
> >
> >
> >
> >
>


Re: Solr Indexing Status

2013-08-22 Thread Prasi S
Thanks much . This was useful.


On Thu, Aug 22, 2013 at 2:24 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> You can use the /admin/mbeans handler to get all system stats. You can
> find stats such as "adds" and "cumulative_adds" under the update
> handler section.
>
> http://localhost:8983/solr/collection1/admin/mbeans?stats=true
>
> On Thu, Aug 22, 2013 at 12:35 PM, Prasi S  wrote:
> > I am not using dih for indexing csv files. Im pushing data through solrj
> > code. But i want a status something like what dih gives. ie. fire a
> > command=status and we get the response. Is anythin like that available
> for
> > any type of file indexing which we do through api ?
> >
> >
> > On Thu, Aug 22, 2013 at 12:09 AM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> >> Yes, you can invoke
> >> http://:/solr/dataimport?command=status which will return
> >> how many Solr docs have been added etc.
> >>
> >> On Wed, Aug 21, 2013 at 4:56 PM, Prasi S  wrote:
> >> > Hi,
> >> > I am using solr 4.4 to index csv files. I am using solrj for this. At
> >> > frequent intervels my user may request for "Status". I have to send
> get
> >> > something like in DIH " Indexing in progress.. Added xxx documents".
> >> >
> >> > Is there anything like in dih, where we can fire a command=status to
> get
> >> > the status of indexing for files.
> >> >
> >> >
> >> > Thanks,
> >> > Prasi
> >>
> >>
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
> >>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


DIH not proceeding after few millions

2013-08-22 Thread Prasi S
Hi, I'm using DIH to index data into Solr, version 4.4. Indexing
proceeds normally in the beginning.

I have some 10 data-config files.

file1 -> select * from table where id between 1 and 100

file2 -> select * from table where id between 100 and 300. and so
on.

Here 4 batches go normally. For the fifth batch, I get the status from the Admin
page (Dataimport) as

*Duration: 2 hrs*.
Indexed: 0 documents; deleted: 0 documents.

And indexing stops, but no documents were indexed. I use a single external
ZooKeeper for this.

I don't see any exception in the Solr logs, and in ZooKeeper, below is the status.

INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got
user-level KeeperException when processing sessionid:0x1 40a4ce824b0005
type:create cxid:0x29a zxid:0x157d txntype:-1 reqpath:n/a Error P

Any ideas?


Re: Data Import faile in solr 4.3.0

2013-08-22 Thread Shalin Shekhar Mangar
No one is asking you to re-index data. The Solr 3.5 index can be read
and written by a Solr 4.x installation.

On Thu, Aug 22, 2013 at 12:08 PM, Montu v Boda
 wrote:
> Thanks for suggestion
>
> but as per us this is not the right way, to re-index all the data each and
> every time we migrate Solr from an older to the latest version.
> Solr has to provide some solution for this, because
> re-indexing the 50 lakh (5 million) documents is not an easy job.
>
> we want to know is there any way in solr to do this in easily.
>
> Thanks & Regards
> Montu v Boda
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Data-Import-faile-in-solr-4-3-0-tp4085868p4086020.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr Indexing Status

2013-08-22 Thread Shalin Shekhar Mangar
You can use the /admin/mbeans handler to get all system stats. You can
find stats such as "adds" and "cumulative_adds" under the update
handler section.

http://localhost:8983/solr/collection1/admin/mbeans?stats=true

On Thu, Aug 22, 2013 at 12:35 PM, Prasi S  wrote:
> I am not using dih for indexing csv files. Im pushing data through solrj
> code. But i want a status something like what dih gives. ie. fire a
> command=status and we get the response. Is anythin like that available for
> any type of file indexing which we do through api ?
>
>
> On Thu, Aug 22, 2013 at 12:09 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> Yes, you can invoke
>> http://:/solr/dataimport?command=status which will return
>> how many Solr docs have been added etc.
>>
>> On Wed, Aug 21, 2013 at 4:56 PM, Prasi S  wrote:
>> > Hi,
>> > I am using solr 4.4 to index csv files. I am using solrj for this. At
>> > frequent intervels my user may request for "Status". I have to send get
>> > something like in DIH " Indexing in progress.. Added xxx documents".
>> >
>> > Is there anything like in dih, where we can fire a command=status to get
>> > the status of indexing for files.
>> >
>> >
>> > Thanks,
>> > Prasi
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>



-- 
Regards,
Shalin Shekhar Mangar.


SolrCmdDistributor may not be threadsafe...

2013-08-22 Thread Tor Egil
I have been running DIH Imports (>15 000 000 rows) all day and every now and
then I get some weird errors. Some examples:

A letter is replaced by an unknown character (should have been a 'V')
285680 [Thread-20] ERROR org.apache.solr.update.SolrCmdDistributor  - shard
update error StdNode:
http://10.231.188.127:8080/solr/kunde0/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
undefined field: "KUNDE_ETTERNA?N"
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:424)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)

  

938360 [Thread-59] ERROR org.apache.solr.update.SolrCmdDistributor  - shard
update error StdNode:
http://10.231.188.186:8080/solr/kunde0/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Unexpected character 'l' (code 108) in start tag Expected a quote
 at [row,col {unknown-source}]: [1,2188]
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:424)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
...

1379931 [Thread-22] ERROR org.apache.solr.update.SolrCmdDistributor  - shard
update error StdNode:
http://10.231.188.186:8080/solr/kunde0/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Unexpected character '0' (code 48) in start tag Expected a quote


2546924 [Thread-79] ERROR org.apache.solr.update.SolrCmdDistributor  - shard
update error StdNode:
http://10.231.188.127:8080/solr/kunde0/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Unexpected character '0' (code 48) in content after '<' (malformed start
element?).
 at [row,col {unknown-source}]: [1,6333]
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:424)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   

I'm running Solr 4.4.0 (r1504776) on JDK 1.7.0_21, with 3 nodes.

Seen this before?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCmdDistributor-may-not-be-threadsafe-tp4086042.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLUTION: Clusterstate says "state:recovering", but Core says "I see state: null"?

2013-08-22 Thread Tor Egil
Aliasing instead of swapping removed this problem!

DO NOT USE "SWAP" WHEN IN CLOUD MODE (solr 4.3)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Clusterstate-says-state-recovering-but-Core-says-I-see-state-null-tp4084504p4086037.html
Sent from the Solr - User mailing list archive at Nabble.com.


when does RAMBufferSize work when commit.

2013-08-22 Thread YouPeng Yang
Hi all
About the RAMBufferSize and commit, I have read the doc:
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/60544

   I cannot figure out how they work together.

  Given the settings:

 10
 
   ${solr.autoCommit.maxDocs:1000}
   false
 

 If the indexed docs number up to 1000 and the size of these docs is below 10MB,
it will trigger a commit.

 If the size of the indexed docs reaches 10MB while the number is below
1000, it will not trigger a commit; however, the indexed docs will just
be flushed
to disk, and it will only commit when the number reaches 1000?

 Are the two scenarios right?


Regards


relation between optimize and merge

2013-08-22 Thread YouPeng Yang
Hi All

   I have some difficulty understanding the relation between
optimize and merge.
  Can anyone give some tips about the difference?

Regards


Re: Solr Indexing Status

2013-08-22 Thread Prasi S
I am not using DIH for indexing CSV files; I'm pushing data through SolrJ
code. But I want a status something like what DIH gives, i.e. fire a
command=status and get the response. Is anything like that available for
any type of file indexing that we do through the API?


On Thu, Aug 22, 2013 at 12:09 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Yes, you can invoke
> http://:/solr/dataimport?command=status which will return
> how many Solr docs have been added etc.
>
> On Wed, Aug 21, 2013 at 4:56 PM, Prasi S  wrote:
> > Hi,
> > I am using solr 4.4 to index csv files. I am using solrj for this. At
> > frequent intervels my user may request for "Status". I have to send get
> > something like in DIH " Indexing in progress.. Added xxx documents".
> >
> > Is there anything like in dih, where we can fire a command=status to get
> > the status of indexing for files.
> >
> >
> > Thanks,
> > Prasi
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>