Solr-Ajax client

2014-03-11 Thread Davis Marques
Just a quick announcement and request for guidance:

I've developed an open source JavaScript client for Apache Solr. It's very
easy to implement and can be configured to provide faceted search to an
existing Solr index in just a few minutes. The source is available online
here:

  https://bitbucket.org/esrc/eaccpf-ajax

I attempted to add a note about it to the Solr wiki, at
https://wiki.apache.org/solr/IntegratingSolr, but was prevented by the
system.  Is there some protocol for posting information to the wiki?

Davis

-- 
Davis M. Marques

t: 61 0418 450 194
e: dmarq@gmail.com
w: http://www.davismarques.com/


Re: Disabling lookups into disabled caches?

2014-03-11 Thread Otis Gospodnetic
Hi Shawn,

Here it is: https://issues.apache.org/jira/browse/SOLR-5851

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Mar 11, 2014 at 11:22 PM, Shawn Heisey  wrote:

> On 3/11/2014 8:51 PM, Shawn Heisey wrote:
> > On 3/11/2014 8:07 PM, Otis Gospodnetic wrote:
> >> Is there a way to disable cache *lookups* into caches that are disabled?
> >>
> >> Check this for example:
> https://apps.sematext.com/spm-reports/s/Z04bfIvGyH
> >>
> >> This is a Document cache that was enabled, and then got disabled.  But
> the
> >> lookups are still happening, which is pointless if the cache is
> disabled.
> >>
> >> If that's not doable, should I file a JIRA?
> >
> > I think this needs an issue.  I've worked up a *possible* patch for the
> > problem, one that still needs testing and review.  Which reminds me, I
> > should probably invent new test methods for this.
> >
> > The lookups should have very little overhead, but any avoidable overhead
> > *should* be avoided.
>
> The quickfix that I started with on FastLRUCache didn't work and made
> most of the tests fail.  It turns out that FastLRUCache bumps the max
> cache size to 2 when you set it to zero.  I haven't looked deeper into
> the other cache types yet.
>
> Once you create the issue, we can move this discussion there.
>
> Thanks,
> Shawn
>
>


Re: Disabling lookups into disabled caches?

2014-03-11 Thread Shawn Heisey
On 3/11/2014 8:51 PM, Shawn Heisey wrote:
> On 3/11/2014 8:07 PM, Otis Gospodnetic wrote:
>> Is there a way to disable cache *lookups* into caches that are disabled?
>>
>> Check this for example: https://apps.sematext.com/spm-reports/s/Z04bfIvGyH
>>
>> This is a Document cache that was enabled, and then got disabled.  But the
>> lookups are still happening, which is pointless if the cache is disabled.
>>
>> If that's not doable, should I file a JIRA?
> 
> I think this needs an issue.  I've worked up a *possible* patch for the
> problem, one that still needs testing and review.  Which reminds me, I
> should probably invent new test methods for this.
> 
> The lookups should have very little overhead, but any avoidable overhead
> *should* be avoided.

The quickfix that I started with on FastLRUCache didn't work and made
most of the tests fail.  It turns out that FastLRUCache bumps the max
cache size to 2 when you set it to zero.  I haven't looked deeper into
the other cache types yet.
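
For context, a documentCache entry in solrconfig.xml "disabled" by a zero size
might look roughly like this (a sketch; whether a zero size really stops the
lookups is exactly what this thread is about, and removing the element entirely
is another way to disable the cache):

  <documentCache class="solr.LRUCache"
                 size="0"
                 initialSize="0"
                 autowarmCount="0"/>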

Once you create the issue, we can move this discussion there.

Thanks,
Shawn



Re: Disabling lookups into disabled caches?

2014-03-11 Thread Shawn Heisey
On 3/11/2014 8:07 PM, Otis Gospodnetic wrote:
> Is there a way to disable cache *lookups* into caches that are disabled?
> 
> Check this for example: https://apps.sematext.com/spm-reports/s/Z04bfIvGyH
> 
> This is a Document cache that was enabled, and then got disabled.  But the
> lookups are still happening, which is pointless if the cache is disabled.
> 
> If that's not doable, should I file a JIRA?

I think this needs an issue.  I've worked up a *possible* patch for the
problem, one that still needs testing and review.  Which reminds me, I
should probably invent new test methods for this.

The lookups should have very little overhead, but any avoidable overhead
*should* be avoided.

Thanks,
Shawn



Re: Store Null in Price Field

2014-03-11 Thread Ahmet Arslan
Hi Ravi,

How about RemoveBlankFieldUpdateProcessorFactory ?

https://lucene.apache.org/solr/4_0_0/solr-core/org/apache/solr/update/processor/
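
A minimal update chain wiring that in might look roughly like this in
solrconfig.xml (a sketch; the chain name is just an example):

  <updateRequestProcessorChain name="remove-blanks">
    <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

You would then reference it from the update request (or the handler defaults)
with update.chain=remove-blanks, so empty values are dropped before the tfloat
field ever sees them.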


Ahmet



On Tuesday, March 11, 2014 6:11 PM, "EXTERNAL Taminidi Ravi (ETI, 
Automotive-Service-Solutions)"  wrote:
Hi, is there any way to index/store a value for a null Price field?

My Price field is tfloat and the XML data file has an empty value for the field;
Solr takes this as a string and throws an error. Is there a quick fix for this?

--Ravi



Disabling lookups into disabled caches?

2014-03-11 Thread Otis Gospodnetic
Hi,

Is there a way to disable cache *lookups* into caches that are disabled?

Check this for example: https://apps.sematext.com/spm-reports/s/Z04bfIvGyH

This is a Document cache that was enabled, and then got disabled.  But the
lookups are still happening, which is pointless if the cache is disabled.

If that's not doable, should I file a JIRA?

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


use local param in solrconfig fq for access-control

2014-03-11 Thread Andreas Owen
I would like to use $r and $org for access control. It has to allow the fq's
from my facets to work as well. I'm not sure if I'm doing it right or if I
should add it to a qf or the q itself. debugQuery returns a parsed fq
string and in it $r and $org are printed instead of their values. How do I
get them to be interpreted? The local params are listed in the response, so they
should be valid.


      {!q.op=OR} (*:* -organisations:["" TO *] -roles:["" TO *]) 
(+organisations:($org) +roles:($r)) (-organisations:["" TO *] +roles:($r)) 
(+organisations:($org) -roles:["" TO *])
     





Re: [Clustering] Full-Index Offline cluster

2014-03-11 Thread Stanislaw Osinski
> Thank you Ahmet, Staszek and Tommaso ;)
> so the only way to obtain offline clustering is to move to a customisation!
> I will take a look at the interface of the API (if you can give me a link
> to the class, it will be appreciated; if not I will find it by myself).
>

The API stub is
the org.apache.solr.handler.clustering.DocumentClusteringEngine class in
contrib/clustering. The API has not yet been implemented, so you may want
to tune the API to suit the way you'd like to arrange your full-index
clustering code.

S.


Re: Migration issues - Solr 4.3.0 to Solr 4.4.0

2014-03-11 Thread Chris W
Moving 4 versions ahead may require many additional tests on my side to
ensure our cluster performance stays good and within our SLA. Moving to 4.4
 (just 1 month after 4.3.1 was released) gives me the most important bug
fix, for reloading collections (which does not work now, so we have to do a
rolling restart).

I am also OK upgrading to 4.5 but do not want to go too far without more
testing.

Either way I think I will hit the issue mentioned above.



On Tue, Mar 11, 2014 at 12:22 PM, Erick Erickson wrote:

> First I have to ask why you're going to 4.4 rather than 4.7. I
> understand vetting requirements, but I thought I'd ask... No use
> going through this twice if you can avoid it.
>
> On Tue, Mar 11, 2014 at 12:49 PM, Chris W  wrote:
> > I am running solrcloud version 4.3.0 with 10 m1.xlarge nodes and using
> > zk to manage the state/data for collections and configs. I want to upgrade
> > to version 4.4.0.
> >
> >
> > When I deploy a 4.4 version of solrcloud in my test environment, none of
> > the collections/configs (created using the 4.3 version of solr) that exist
> > in zk show up in the core admin. I should also mention that all of my
> > collection configs have this in solrconfig.xml:
> >
> >   <luceneMatchVersion>LUCENE_43</luceneMatchVersion>
> >
> > Should I change the Lucene version to LUCENE_44 (to match the Solr
> > version) to get it working again?
> >
> > What is the best way to upgrade to a newer version of solrcloud without
> > deleting and recreating all configs and collections?
> >
> > Kindly advise
> > --
> > Best
> > --
> > C
>



-- 
Best
-- 
C


Re: SolrCloud - Fails to read db config file

2014-03-11 Thread Danny
Hi,

I seem to have the same issue here.

I'm running a very simple Solr server in standalone mode, using a DIH with
the following datasource in *dataconfig.xml* :


It works fine.

Then I stop the standalone server, empty the data directory, and start a
cluster with an embedded ZooKeeper, just by changing the command line to
this one (called from the */danny* directory) :


The Solr cloud starts without problems.

Then I go in the admin interface, in the *Dataimport* page of the core, to
execute a data import. But now the *Entity* dropdown list is empty, and I
get this error in the log :


So it looks like the same problem deniz had.

I don't understand what /configs/danny_conf/ represents for ZooKeeper.
Is it a /real/ path, or a /virtual/ one?

I created a symbolic link that allowed the path
/configs/danny_conf//danny/solr/core1/conf/data-config.xml to be actually
readable, but I still had the error, so I guess it's not a real path that
ZooKeeper is looking for.

In the admin interface, under *Cloud > Tree*, I can see under
/configs/danny_conf/ that all the files that are listed are actually the
files that are under this real path : /danny/solr/core1/conf/ (like
data-config.xml for example).

But I still don't understand what's going on...

Also, I didn't understand Mark Miller's tip :

Mark Miller-3 wrote
> As a workaround, try explicitly setting the directory to write the
> properties file in with the "directory" param. You should be able to set
> it to anything, as it should not be used.

Thanks for any help here :)

Danny.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Fails-to-read-db-config-file-tp4022299p4122963.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Migration issues - Solr 4.3.0 to Solr 4.4.0

2014-03-11 Thread Erick Erickson
First I have to ask why you're going to 4.4 rather than 4.7. I
understand vetting requirements, but I thought I'd ask... No use
going through this twice if you can avoid it.

On Tue, Mar 11, 2014 at 12:49 PM, Chris W  wrote:
> I am running solrcloud version 4.3.0 with 10 m1.xlarge nodes and using
> zk to manage the state/data for collections and configs. I want to upgrade
> to version 4.4.0.
>
>
> When I deploy a 4.4 version of solrcloud in my test environment, none of
> the collections/configs (created using the 4.3 version of solr) that exist
> in zk show up in the core admin. I should also mention that all of my
> collection configs have this in solrconfig.xml:
>
>   <luceneMatchVersion>LUCENE_43</luceneMatchVersion>
>
> Should I change the Lucene version to LUCENE_44 (to match the Solr
> version) to get it working again?
>
> What is the best way to upgrade to a newer version of solrcloud without
> deleting and recreating all configs and collections?
>
> Kindly advise
> --
> Best
> --
> C


Re: How to apply Semantic Search in Solr

2014-03-11 Thread Sujit Pal
Hi Sohan,

Given you have 15 days and this looks like a class project, I would suggest
going with John Berryman's approach - he also provides code which you can
just apply to your data. Even if you don't get the exact expansions you
desire, I think you will get results that will pleasantly surprise you :-).

-sujit



On Mon, Mar 10, 2014 at 11:07 PM, Sohan Kalsariya
wrote:

> Hey Sujit, thanks a lot.
> But what do you think about the Berryman blog post?
> Is it feasible to apply, or should I apply the synonym stuff?
> Which one is good?
> And the 3rd approach you told me about seems difficult and
> time consuming for students like me, as I will have to submit this in the next
> 15 days.
> Please suggest me something.
>
>
> On Tue, Mar 11, 2014 at 5:12 AM, Sujit Pal  wrote:
>
> > Hi Sohan,
> >
> > You would be the best person to answer your question of how to proceed
> :-).
> > From your original query term "musical events in New York" rewriting to
> > "musical nights at ABC place" OR "concerts events" OR "classical music
> > event" you would have to build into your knowledge base that "ABC place"
> is
> > a synonym for "New York", and that "musical event at New York" is a
> synonym
> > for "concerts events" and "classical music event". You can do this using
> > approach #1 (from the Berryman blog post) and the approach #2 (my first
> > suggestion) but these results are not guaranteed - because your corpus
> may
> > not contain this relationship. Approach #3 (my second suggestion)
> involves
> > lots of work and possibly domain knowledge but much cleaner
> relationships.
> > OTOH, you could get away for this one query by adding the three queries
> > into your synonyms.txt and enabling synonym support in Solr.
> >
> > http://stackoverflow.com/questions/18790256/solr-synonym-not-working
> >
> > So how much effort you put into supporting this feature would be dictated
> > by how important it is to your environment - that is a question only you
> > can answer.
> >
> > -sujit
> >
> >
> >
> > On Sun, Mar 9, 2014 at 11:26 PM, Sohan Kalsariya
> > wrote:
> >
> > > Thanks Sujit and all for your views about semantic search in solr.
> > > But How do i proceed towards, i mean how do i start off the things to
> get
> > > on track ?
> > >
> > >
> > >
> > > On Sat, Mar 8, 2014 at 10:50 PM, Sujit Pal 
> > wrote:
> > >
> > > > Thanks for sharing this link Sohan, its an interesting approach.
> Since
> > > you
> > > > have effectively defined what you mean by Semantic Search, there are
> > > couple
> > > > other approaches I know of to do something like this:
> > > > 1) preprocess your documents looking for terms that co-occur in the
> > same
> > > > document. The more such cooccurrences you find the more strongly
> these
> > > > terms are related (can help with ordering related terms from most
> > related
> > > > to least related). At query time expand the query to include /most/
> > > related
> > > > concepts and search.
> > > > 2) use an external knowledgebase such as a taxonomy that indicates
> > > > relationships between concepts (this is the approach we use). At
> query
> > > time
> > > > expand the query to include related concepts and search.
> > > >
> > > > -sujit
> > > >
> > > > On Sat, Mar 8, 2014 at 8:21 AM, Sohan Kalsariya <
> > > sohankalsar...@gmail.com
> > > > >wrote:
> > > >
> > > > > Basically, when i searched it on Google I got this result :
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> http://www.opensourceconnections.com/2013/08/25/semantic-search-with-solr-and-python-numpy/
> > > > >
> > > > > And I am working on this.
> > > > >
> > > > > So is this useful ?
> > > > >
> > > > >
> > > > > On Sat, Mar 8, 2014 at 3:11 PM, Alexandre Rafalovitch <
> > > > arafa...@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > And how would it know to give you those results? Obviously, you
> > have
> > > > > > some sort of magic/algorithm in your mind. Are you doing
> geographic
> > > > > > location match, category match, synonyms match?
> > > > > >
> > > > > > We can't really help with generic questions. You still need to
> > figure
> > > > > > out what "semantic" means for you specifically.
> > > > > >
> > > > > > Regards,
> > > > > >Alex.
> > > > > > Personal website: http://www.outerthoughts.com/
> > > > > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > > > > - Time is the quality of nature that keeps events from happening
> > all
> > > > > > at once. Lately, it doesn't seem to be working.  (Anonymous  -
> via
> > > GTD
> > > > > > book)
> > > > > >
> > > > > >
> > > > > > On Sat, Mar 8, 2014 at 4:27 PM, Sohan Kalsariya
> > > > > >  wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > I am working on an event listing and promotions website(
> > > > > > > http://allevents.in) and I want to apply semantic search on
> > solr.
> > > > > > > For example, if someone search :
> > > > > > >
> > > > > > > "Musical Events in New York"
> > > > > > > So it would give me results such as :
> > > > > > >
> >

CollapsingQParserPlugin facet results: fq={!collapse field=fld} vs group=true&group.field=fld

2014-03-11 Thread tchaffee
Should the exact same query using fq={!collapse field=fld} return the same
results as group=true&group.field=fld?

I am getting different results for my facets on those queries when I have a
second fq=

This happens in both

4.6.0 1543363 - simon - 2013-11-19 11:16:33 

and

4.8-2014-02-23_07-35-56 1570983 - hudson - 2014-02-23 07:46:13

I can post actual queries and results but I am not seeing anything about how
CollapsingQParserPlugin or grouping works in the debug.



Thanks,
Tim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/CollapsingQParserPlugin-facet-results-fq-collapse-field-fld-vs-group-true-group-field-fld-tp4122952.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [Clustering] Full-Index Offline cluster

2014-03-11 Thread Alessandro Benedetti
Thank you Ahmet, Staszek and Tommaso ;)
so the only way to obtain offline clustering is to move to a customisation!
I will take a look at the interface of the API (if you can give me a link
to the class, it will be appreciated; if not I will find it by myself).

Cheers


2014-03-10 18:48 GMT+00:00 Ahmet Arslan :

>
>
> Hi Staszek, Tommaso,
>
> Thanks for the clarification.
>
> Ahmet
>
> On Monday, March 10, 2014 8:23 PM, Tommaso Teofili <
> tommaso.teof...@gmail.com> wrote:
> Hi Ahmet, Ale,
>
> right, there's a classification module for Lucene (and therefore usable in
> Solr as well), but no clustering support there.
>
> Regards,
> Tommaso
>
>
>
> 2014-03-10 19:15 GMT+01:00 Ahmet Arslan :
>
> > Hi,
> >
> > Thats weird. As far as I know there is no such thing. There is
> > classification stuff but I haven't heard of clustering.
> >
> >
> http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
> >
> > May be others (Dawid Weiss) can clarify?
> >
> > Ahmet
> >
> >
> >
> > On Monday, March 10, 2014 4:24 PM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> > Thank you, Ahmet, I already know Mahout.
> > What I was curious about is whether an integration already exists in Solr for
> > offline clustering...
> > Reading the wiki we can find this phrase: "While Solr contains an
> > extension for full-index clustering (*off-line* clustering) this
> > section will focus on discussing on-line clustering only."[1]
> > So I was wondering if any documentation exists for that :)
> > [1] https://cwiki.apache.org/confluence/display/solr/Result+Clustering
> >
> >
> > 2014-03-10 14:15 GMT+00:00 Ahmet Arslan :
> >
> > > Hi Alessandro,
> > >
> > > Generally Apache mahout http://mahout.apache.org is recommended for
> > > offline clustering.
> > >
> > > Ahmet
> > >
> > >
> > >
> > > On Monday, March 10, 2014 4:11 PM, Alessandro Benedetti <
> > > benedetti.ale...@gmail.com> wrote:
> > > Hi guys,
> > > I'm looking around to find out if it's possible to have a full-index
> > > /Offline cluster.
> > > My goal is to do full-index clustering and for each document have the
> > > cluster field with the id/label of the cluster at indexing time.
> > > Does anyone know more details regarding this kind of integration with
> > > Carrot2?
> > >
> > > I find only the classic query time clustering approach :
> > > https://cwiki.apache.org/confluence/display/solr/Result+Clustering
> > >
> > > Cheers
> > >
> > >
> > > --
> > > --
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> >
> > >
> > >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
> >
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Updated to v4.7 - Getting "Search requests cannot accept content streams"

2014-03-11 Thread leevduhl
We resolved this problem by changing the "Content-Type" we were providing.  

Changing it to "application/x-www-form-urlencoded" resolved the issue.

Thanks for the help!

Lee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Updated-to-v4-7-Getting-Search-requests-cannot-accept-content-streams-tp4122540p4122937.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Optimizing RAM

2014-03-11 Thread Shawn Heisey

On 3/11/2014 11:05 AM, abhishek jain wrote:

hi Shawn,
Thanks for the reply,

Is there a way to optimize RAM or does Solr do it automatically? I have
multiple shards and I know I will be querying only 30% of the shards most of
the time, and I have 6 slaves, so I am considering dedicating more slaves to the
30% most-used shards.

Another question:
Is it advisable to serve queries from the master or only from the slaves, or
does it not matter?


You'll have to explain what you mean by 'optimize RAM' before I can 
answer that question and have any confidence that I've given you the 
information you need.


The OS disk cache is handled by the operating system, not Solr.  It is 
automatic, and it is very efficient.  Some operating systems are better 
at it than others, but even the worst of them is pretty good.


For the Java heap, normal usage will eventually allocate the maximum 
heap value.  Java's garbage collection model is not very well optimized 
for large heaps, but it's highly tunable. With good tuning options, it 
usually works very well.


Thanks,
Shawn



Re: Optimizing RAM

2014-03-11 Thread abhishek jain
hi Shawn,
Thanks for the reply,

Is there a way to optimize RAM or does Solr do it automatically? I have
multiple shards and I know I will be querying only 30% of the shards most of
the time, and I have 6 slaves, so I am considering dedicating more slaves to the
30% most-used shards.

Another question:
Is it advisable to serve queries from the master or only from the slaves, or
does it not matter?

thanks
Abhishek




On Tue, Mar 11, 2014 at 9:12 PM, Shawn Heisey  wrote:

> On 3/11/2014 6:14 AM, abhishek.netj...@gmail.com wrote:
> > Hi all,
> > What should be the ideal RAM index size ratio.
> >
> > Please reply. I expect the index to be about 60 GB in size and I don't store
> > contents.
>
> Ideally, your total system RAM will be equal to the size of all your
> program's heap requirements, plus the size of all the data for all the
> programs.
>
> If Solr is the only thing on the box, then the ideal memory size is
> roughly the Solr heap plus the size of all the Solr indexes that live on
> that machine.  So if your heap is 8GB and your index is 60GB, you'll
> want at least 68GB of RAM for an ideal setup.  I don't know how big your
> heap is, so I am guessing here.
>
> You said your index does not store much content.  That means you will
> need a higher percentage of your total index size to be in RAM for good
> performance.  I would estimate that you want a minimum of two thirds of
> your index in RAM, which indicates a minimum RAM size of 48GB if we
> assume your heap is 8GB.  64GB would be better.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#General_information
>
> Thanks,
> Shawn
>
>


-- 
Thanks and kind Regards,
Abhishek jain
+91 9971376767


Migration issues - Solr 4.3.0 to Solr 4.4.0

2014-03-11 Thread Chris W
I am running solrcloud version 4.3.0 with 10 m1.xlarge nodes and using
zk to manage the state/data for collections and configs. I want to upgrade
to version 4.4.0.


When I deploy a 4.4 version of solrcloud in my test environment, none of
the collections/configs (created using the 4.3 version of solr) that exist
in zk show up in the core admin. I should also mention that all of my
collection configs have this in solrconfig.xml:

  <luceneMatchVersion>LUCENE_43</luceneMatchVersion>

Should I change the Lucene version to LUCENE_44 (to match the Solr
version) to get it working again?

What is the best way to upgrade to a newer version of solrcloud without
deleting and recreating all configs and collections?

Kindly advise
-- 
Best
-- 
C


multiple facet.prefix

2014-03-11 Thread Nikhil
Hello All,

I am using Solr 3.6 and I want to add multiple facet.prefix values in a single query.
I searched the forums but could not find the appropriate way.

What I want to do is something like this:

facet.prefix=(A OR B)

Please let me know how I can achieve this.

Thanks,

Nikhil.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/multiple-facet-prefix-tp4122909.html
Sent from the Solr - User mailing list archive at Nabble.com.


Store Null in Price Field

2014-03-11 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi, is there any way to index/store a value for a null Price field?

My Price field is tfloat and the XML data file has an empty value for the field;
Solr takes this as a string and throws an error. Is there a quick fix for this?

--Ravi


Re: PHP Solr Client - spellchecker

2014-03-11 Thread Shawn Heisey
On 3/11/2014 2:40 AM, rachun wrote:
> $q='macbook';
> $client = new SolrClient($config);
> $query = new SolrQuery();
> $query->setQuery($q);
> $query->addParam("shards.qt","/spell");
> $query->addParam("fl","product_name_th");
> 
> $query_response = $client->query($query);
> $result = $query_response->getResponse();



> === Solr Log =
> 
> INFO  - 2014-03-11 15:23:48.556; org.apache.solr.core.SolrCore;
> [collection1] webapp=/solr path=/select/
> params={fl=product_name_th&indent=on&shards.qt=/spell&start=0&q=macbook&wt=xml&rows=0&version=2.2}
> hits=4 status=0 QTime=2 
> ==
> At this log you can see it didn't go through my requestHandler named spell
> but when I try this

Your request went to the /select handler, not the /spell handler.  The
shards.qt parameter controls where the request goes for *distributed*
queries, not standard queries.

I do not see a method in the documentation for the PHP library that sets
the request handler path.  In SolrJ, this is the setRequestHandler method.

http://lucene.apache.org/solr/4_7_0/solr-solrj/org/apache/solr/client/solrj/SolrQuery.html#setRequestHandler%28java.lang.String%29

If you cannot set the request handler path to "/spell" instead of the
default "/select" then you would need to enable 'handleSelect' on the
<requestDispatcher> element in solrconfig.xml and send a "qt" parameter
set to "/spell".  This capability was disabled in newer Solr releases
because it allows *any* request handler to be used with /select, even
/update, which is a potential exploit.
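
As a rough sketch of that fallback (untested here; check your own solrconfig.xml):

  <requestDispatcher handleSelect="true">
    ...
  </requestDispatcher>

and then the request would go to the select path with the qt parameter, e.g.
/solr/collection1/select?q=macbook&qt=/spell.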

I would recommend asking for some help on whatever support resources are
available on the PHP client, and you may need to file a bug on that client.

Thanks,
Shawn



Re: replica reports recovery_failed but is considered the leader

2014-03-11 Thread Oliver Schrenk
Solr 4.7

On 11 Mar 2014, at 16:43, Erick Erickson  wrote:

> What version of Solr? There's been quite a bit of work
> between various 4x versions.
> 
> Erick
> 
> On Tue, Mar 11, 2014 at 11:25 AM, Oliver Schrenk
>  wrote:
>> Hi,
>> 
>> After an unsuccessful indexing on a Solr Cloud cluster with four machines, 
>> where we experienced a lot of errors we are still trying to investigate, we 
>> found the cluster to be in a weird state.
>> 
>>{"collection_v1":{
>>"shards":{
>>  "shard1":{
>>"range":"8000-bfff",
>>"state":"active",
>>"replicas":{
>>  "core_node1":{
>>"state":"recovery_failed",
>>"base_url":"http://solr-host9:7070/solr";,
>>"core":"elmar_v1_shard1_replica1",
>>"node_name":"solr-host9:7070_solr",
>>"leader":"true"},
>>  "core_node2":{
>>"state":"active",
>>"base_url":"http://solr-host8:7070/solr";,
>>"core":"elmar_v1_shard1_replica2",
>>"node_name":"solr-host8:7070_solr"}}},
>> 
>>...
>> 
>>"maxShardsPerNode":"2",
>>"router":{"name":"compositeId"},
>>"replicationFactor":"2"}}
>>}
>> 
>> 
>> From my point of view it doesn't make sense that core_node1 is the leader of 
>> shard1, when it can't even be recovered.  With the other machine completely 
>> working, why is core_node2 not the leader? Am I wrong in my assumption? In 
>> the same vein, how can I manually set the leader?
>> 
>> Regards
>> Oliver
>> 



Re: replica reports recovery_failed but is considered the leader

2014-03-11 Thread Erick Erickson
What version of Solr? There's been quite a bit of work
between various 4x versions.

Erick

On Tue, Mar 11, 2014 at 11:25 AM, Oliver Schrenk
 wrote:
> Hi,
>
> After an unsuccessful indexing on a Solr Cloud cluster with four machines, 
> where we experienced a lot of errors we are still trying to investigate, we 
> found the cluster to be in a weird state.
>
> {"collection_v1":{
> "shards":{
>   "shard1":{
> "range":"8000-bfff",
> "state":"active",
> "replicas":{
>   "core_node1":{
> "state":"recovery_failed",
> "base_url":"http://solr-host9:7070/solr";,
> "core":"elmar_v1_shard1_replica1",
> "node_name":"solr-host9:7070_solr",
> "leader":"true"},
>   "core_node2":{
> "state":"active",
> "base_url":"http://solr-host8:7070/solr";,
> "core":"elmar_v1_shard1_replica2",
> "node_name":"solr-host8:7070_solr"}}},
>
> ...
>
> "maxShardsPerNode":"2",
> "router":{"name":"compositeId"},
> "replicationFactor":"2"}}
> }
>
>
> From my point of view it doesn't make sense that core_node1 is the leader of 
> shard1, when it can't even be recovered.  With the other machine completely 
> working, why is core_node2 not the leader? Am I wrong in my assumption? In 
> the same vein, how can I manually set the leader?
>
> Regards
> Oliver
>


Re: Optimizing RAM

2014-03-11 Thread Shawn Heisey
On 3/11/2014 6:14 AM, abhishek.netj...@gmail.com wrote:
> Hi all,
> What should be the ideal RAM index size ratio.
> 
> Please reply. I expect the index to be about 60 GB in size and I don't store contents.

Ideally, your total system RAM will be equal to the size of all your
program's heap requirements, plus the size of all the data for all the
programs.

If Solr is the only thing on the box, then the ideal memory size is
roughly the Solr heap plus the size of all the Solr indexes that live on
that machine.  So if your heap is 8GB and your index is 60GB, you'll
want at least 68GB of RAM for an ideal setup.  I don't know how big your
heap is, so I am guessing here.

You said your index does not store much content.  That means you will
need a higher percentage of your total index size to be in RAM for good
performance.  I would estimate that you want a minimum of two thirds of
your index in RAM, which indicates a minimum RAM size of 48GB if we
assume your heap is 8GB.  64GB would be better.

http://wiki.apache.org/solr/SolrPerformanceProblems#General_information

Thanks,
Shawn



replica reports recovery_failed but is considered the leader

2014-03-11 Thread Oliver Schrenk
Hi,

After an unsuccessful indexing on a Solr Cloud cluster with four machines, where 
we experienced a lot of errors we are still trying to investigate, we found the 
cluster to be in a weird state.

{"collection_v1":{
"shards":{
  "shard1":{
"range":"8000-bfff",
"state":"active",
"replicas":{
  "core_node1":{
"state":"recovery_failed",
"base_url":"http://solr-host9:7070/solr";,
"core":"elmar_v1_shard1_replica1",
"node_name":"solr-host9:7070_solr",
"leader":"true"},
  "core_node2":{
"state":"active",
"base_url":"http://solr-host8:7070/solr";,
"core":"elmar_v1_shard1_replica2",
"node_name":"solr-host8:7070_solr"}}},

...

"maxShardsPerNode":"2",
"router":{"name":"compositeId"},
"replicationFactor":"2"}}
}


From my point of view it doesn’t make sense that core_node1 is the leader of 
shard1, when it can’t even be recovered.  With the other machine completely 
working, why is core_node2 not the leader? Am I wrong in my assumption? In the 
same vein, how can I manually set the leader?

Regards
Oliver



Re: zkHost configuration

2014-03-11 Thread Greg Walters
It's used for failover, and if you've got ZooKeeper running on separate 
machine(s) you need a way to tell Solr where to look.

Thanks,
Greg

On Mar 11, 2014, at 10:11 AM, Oliver Schrenk  wrote:

> Hi,
> 
> I was wondering why there is the need to fully specify all ZooKeeper hosts 
> when starting up Solr. For example using
> 
>   java -Djetty.port=7574 
> -DzkHost=localhost:2181,zkhost1:2181,zkhost2:2181,zkhost3:2181 -jar start.jar
> 
> Isn’t it enough to point to localhost:2181 and let the ZooKeeper ensemble 
> itself report where the other machines are? Or is this just used for failover in 
> case a ZooKeeper machine goes down?
> 
> 
> Regards,
> Oliver
> 
> 



zkHost configuration

2014-03-11 Thread Oliver Schrenk
Hi,

I was wondering why there is the need to fully specify all ZooKeeper hosts when 
starting up Solr. For example using

java -Djetty.port=7574 
-DzkHost=localhost:2181,zkhost1:2181,zkhost2:2181,zkhost3:2181 -jar start.jar

Isn’t it enough to point to localhost:2181 and let the ZooKeeper ensemble 
itself report where the other machines are? Or is this just used for failover in case 
a ZooKeeper machine goes down?


Regards,
Oliver




Re: Implementing a customised tokenizer

2014-03-11 Thread Ahmet Arslan
Hi,

expungeDeletes (default false) is not done automatically through SolrJ.
Please see: https://issues.apache.org/jira/browse/SOLR-1487

During segment merges, deleted terms are purged. That's why the problem solved itself.
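
If you do want to force it, expungeDeletes can be sent explicitly with a commit,
roughly like this (the URL is a sketch; the core name will differ):

  /solr/collection1/update?commit=true&expungeDeletes=true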
 

Ahmet


On Tuesday, March 11, 2014 4:07 PM, epnRui  wrote:
Hi Ahmet,

I think expungeDeletes is done automatically through SolrJ. So I don't
think it was that.
The problem solved itself apparently. I wonder if it has to do with an
automatic optimization of Solr indexes?
Otherwise it was something similar to an XY problem :P

Thanks for the help!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Implementing-a-customised-tokenizer-tp4121355p4122864.html

Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing a customised tokenizer

2014-03-11 Thread Furkan KAMACI
Hi;

I suggest you look at the source code. NGramTokenizer.java has some
explanations as comments and it may help you.

Thanks;
Furkan KAMACI


2014-03-11 16:06 GMT+02:00 epnRui :

> Hi Ahmet,
>
> I think expungeDeletes is done automatically through SolrJ. So I don't
> think it was that.
> The problem solved itself apparently. I wonder if it has to do with an
> automatic optimization of Solr indexes?
> Otherwise it was something similar to an XY problem :P
>
> Thanks for the help!
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Implementing-a-customised-tokenizer-tp4121355p4122864.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Unable to get offsets using AtomicReader.termPositionsEnum(Term)

2014-03-11 Thread Jefferson French
Thank you, Robert. You are right, I was confused between the two. I also
didn't know the "storeOffsetsWithPositions" option existed. My code works as I
expected now.
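
For anyone else hitting this, the two schema options Robert contrasts below
might look roughly like this (field and type names are only illustrative):

  <!-- per-document term vectors, with positions and character offsets -->
  <field name="body" type="text_general" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>

  <!-- offsets stored in the postings, readable per term via DocsAndPositionsEnum -->
  <field name="body" type="text_general" indexed="true" stored="true"
         storeOffsetsWithPositions="true"/>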


On Mon, Mar 10, 2014 at 11:11 PM, Robert Muir  wrote:

> Hello, I think you are confused between two different index
> structures, probably because of the name of the options in solr.
>
> 1. indexing term vectors: this means given a document, you can go
> lookup a miniature "inverted index" just for that document. That means
> each document has "term vectors" which has a term dictionary of the
> terms in that one document, and optionally things like positions and
> character offsets. This can be useful if you are examining *many
> terms* for just a few documents. For example: the MoreLikeThis use
> case. In solr this is activated with termVectors=true. To additionally
> store positions/offsets information inside the term vectors its
> termPositions and termOffsets, respectively.
>
> 2. indexing character offsets: this means given a term, you can get
> the offset information "along with" each position that matched. So
> really you can think of this as a special form of a payload. This is
> useful if you are examining *many documents* for just a few terms. For
> example, many highlighting use cases. In solr this is activated with
> storeOffsetsWithPositions=true. It is unrelated to term vectors.
>
> Hopefully this helps.
>
> On Mon, Mar 10, 2014 at 9:32 PM, Jefferson French 
> wrote:
> > This looks like a codec issue, but I'm not sure how to address it. I've
> > found that a different instance of DocsAndPositionsEnum is instantiated
> > between my code and Solr's TermVectorComponent.
> >
> > Mine:
> > org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$EverythingEnum
> > Solr:
> org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVDocsEnum
> >
> > As far as I can tell, I've only used Lucene/Solr 4.6, so I'm not sure
> where
> > the Lucene 4.1 reference comes from. I've searched through the Solr
> config
> > files and can't see where to change the codec, but shouldn't the reader
> use
> > the same codec as used when the index was created?
> >
> >
> > On Fri, Mar 7, 2014 at 1:37 PM, Jefferson French  >wrote:
> >
> >> We have an API on top of Lucene 4.6 that I'm trying to adapt to running
> >> under Solr 4.6. The problem is although I'm getting the correct offsets
> >> when the index is created by Lucene, the same method calls always
> return -1
> >> when the index is created by Solr. In the latter case I can see the
> >> character offsets via Luke, and I can even get them from Solr when I
> access
> >> the /tvrh search handler, which uses the TermVectorComponent class.
> >>
> >> This is roughly how I'm reading character offsets in my Lucene code:
> >>
> >>> AtomicReader reader = ...
> >>> Term term = ...
> >>> DocsAndPositionsEnum postings = reader.termPositionsEnum(term);
> >>> while (postings.nextDoc() != DocsAndPositionsEnum.NO_MORE_DOCS) {
> >>>   for (int i = 0; i < postings.freq(); i++) {
> >>> System.out.println("start:" + postings.startOffset());
> >>> System.out.println("end:" + postings.endOffset());
> >>>   }
> >>> }
> >>
> >>
> >> Notice that I want the values for a single term. When run against an
> index
> >> created by Solr, the above calls to startOffset() and endOffset() return
> >> -1. Solr's TermVectorComponent prints the correct offsets like this
> >> (paraphrased):
> >>
> >> IndexReader reader = searcher.getIndexReader();
> >>> Terms vector = reader.getTermVector(docId, field);
> >>> TermsEnum termsEnum = vector.iterator(termsEnum);
> >>> int freq = (int) termsEnum.totalTermFreq();
> >>> DocsAndPositionsEnum dpEnum = null;
> >>> while((text = termsEnum.next()) != null) {
> >>>   String term = text.utf8ToString();
> >>>   dpEnum = termsEnum.docsAndPositions(null, dpEnum);
> >>>   dpEnum.nextDoc();
> >>>   for (int i = 0; i < freq; i++) {
> >>> final int pos = dpEnum.nextPosition();
> >>> System.out.println("start:" + dpEnum.startOffset());
> >>> System.out.println("end:" + dpEnum.endOffset());
> >>>   }
> >>> }
> >>
> >>
> >> but in this case it is getting the offsets per doc ID, rather than a
> >> single term, which is what I want.
> >>
> >> Could anyone tell me:
> >>
> >>1. Why I'm not able to get the offsets using my first example, and/or
> >>2. A better way to get the offsets for a given term?
> >>
> >> Thanks.
> >>
> >>Jeff
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>


Re: Implementing a customised tokenizer

2014-03-11 Thread epnRui
Hi Ahmet,

I think the expungesDelete is done automatically through SolrJ. So I don't
think it was that.
THe problem solved by itself apparently. I wonder if it has to do with an
automatic optimization of Solr indexes?
Otherwise it was something similar to XY problem :P

Thanks for the help!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Implementing-a-customised-tokenizer-tp4121355p4122864.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Apache Solr.

2014-03-11 Thread Furkan KAMACI
Hi;

I suggest you to start reading from here:
http://solr.pl/en/2011/04/04/indexing-files-like-doc-pdf-solr-and-tika-integration/

Thanks;
Furkan KAMACI


2014-03-11 14:44 GMT+02:00 vignesh :

>  Dear Team,
>
>
>
>    I am Vignesh, at present developing keyword search using
> Apache Solr 3.6. I have indexed PDFs and I have around 1000 keywords, and using
> Boolean operators (AND, OR, NOT) I have passed a query and got the required
> results for the keyword searched.
>
>
>
> Now I am trying to extract the filename containing the keyword and the
> list of keywords found in that filename, and not the full content of the
> file. Kindly guide me on this; it will be very helpful to carry out my task.
>
>
>
>
>
>
>
> *Thanks & Regards.*
>
> *Vignesh.V*
>
>
>
>
> Ninestars Information Technologies Limited.,
>
> 72, Greams Road, Thousand Lights, Chennai - 600 006. India.
>
> Landline : +91 44 2829 4226 / 36 / 56   X: 144
>
> www.ninestars.in
>
>
>
>
>


Re: Issue with spatial search

2014-03-11 Thread David Smiley (@MITRE.org)
It controls accuracy of non-point shapes.  The more accurate you want it, the 
more work Lucene must do to achieve it.  For query shapes, the impact is not 
much the last time I checked.  For indexed shapes (again, non-point shapes 
we’re talking about), however, it has an exponential curve trade-off where it’s 
increasingly painful to get close to distErrPct=0 if the shapes cover a large 
area.  I have near-term plans to address the index-time non-point shape 
accuracy but at least you don’t have that case from the scenario you gave.
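
As a sketch against the field type quoted further down in this thread (the type
name here is made up; only distErrPct changes, other attributes stay as in the
original config):

  <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
             spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
             distErrPct="0"
             maxDistErr="0.09"
             units="degrees"/>

The per-query alternative is to append distErrPct=0 inside the quoted shape, as
in the wiki example already shown in this thread.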

~ David

From: "Steven Bower-2 [via Lucene]" 
mailto:ml-node+s472066n4122855...@n3.nabble.com>>
Date: Tuesday, March 11, 2014 at 9:45 AM
To: "Smiley, David W." mailto:dsmi...@mitre.org>>
Subject: Re: Issue with spatial search

great.. that worked!

What does distErrPct actually control, besides controlling the error
percentage? or maybe better put how does it impact perf?

steve


On Mon, Mar 10, 2014 at 11:17 PM, David Smiley (@MITRE.org) <
[hidden email]> wrote:

> Correct, Steve. Alternatively you can also put this option in your query
> after the end of the last parenthesis, as in this example from the wiki:
>
>   fq=geo:"IsWithin(POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30)))
> distErrPct=0"
>
> ~ David
>
>
> Steven Bower wrote
> > Only points in the index.. Am I correct this won't require a reindex?
> >
> > On Monday, March 10, 2014, Smiley, David W. <
>
> > dsmiley@
>
> > > wrote:
> >
> >> Hi Steven,
> >>
> >> Set distErrPct to 0 in order to get non-point shapes to always be as
> >> accurate as maxDistErr.  Point shapes are always that accurate.  As long
> >> as
> >> you only index points, not other shapes (you don't index polygons, etc.)
> >> then distErrPct of 0 should be fine.  In fact, perhaps a future Solr
> >> version should simply use 0 as the default; the last time I did
> >> benchmarks
> >> it was pretty marginal impact of higher distErrPct.
> >>
> >> It's a fairly different story if you are indexing non-point shapes.
> >>
> >> ~ David
> >>
> >> From: Steven Bower
> >> Reply-To: solr-user@lucene.apache.org
> >> Date: Monday, March 10, 2014 at 4:23 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Issue with spatial search
> >>
> >> Minor edit to the KML to adjust color of polygon
> >>
> >>
> >> On Mon, Mar 10, 2014 at 4:21 PM, Steven Bower <
>
> > smb-apache@
>
> > 
> >> 
> > smb-apache@
>
> >  >> wrote:
> >> I am seeing a "error" when doing a spatial search where a particular
> >> point
> >> is showing up within a polygon, but by all methods I've tried that point
> >> is
> >> not within the polygon..
> >>
> >> First the point is: 41.2299,29.1345 (lat/lon)
> >>
> >> The polygon is:
> >>
> >> 31.2719,32.283
> >> 31.2179,32.3681
> >> 31.1333,32.3407
> >> 30.9356,32.6318
> >> 31.0707,34.5196
> >> 35.2053,36.9415
> >> 37.2959,36.6339
> >> 40.8334,30.4273
> >> 41.1622,29.1421
> >> 41.6484,27.4832
> >> 47.0255,13.6342
> >> 43.9457,3.17525
> >> 37.0029,-5.7017
> >> 35.7741,-5.57719
> >> 34.801,-4.66201
> >> 33.345,10.0157
> >> 29.6745,18.9366
> >> 30.6592,29.1683
> >> 31.2719,32.283
> >>
> >> The geo field we are using has this config:
> >>
> >>
> >  >>
> > class="solr.SpatialRecursivePrefixTreeFieldType"
> >>distErrPct="0.025"
> >>maxDistErr="0.09"
> >>
> >>
> >>
> spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
> >>units="degrees"/>
> >>
> >> The config is basically the same as the one from the docs...
> >>
> >> They query I am issuing is this:
> >>
> >> location:"Intersects(POLYGON((32.283 31.2719, 32.3681 31.2179, 32.3407
> >> 31.1333, 32.6318 30.9356, 34.5196 31.0707, 36.9415 35.2053, 36.6339
> >> 37.2959, 30.4273 40.8334, 29.1421 41.1622, 27.4832 41.6484, 13.6342
> >> 47.0255, 3.17525 43.9457, -5.7017 37.0029, -5.57719 35.7741, -4.66201
> >> 34.801, 10.0157 33.345, 18.9366 29.6745, 29.1683 30.6592, 32.283
> >> 31.2719)))"
> >>
> >> and it brings back a result where the "location" field is
> 41.2299,29.1345
> >>
> >> I've attached a KML with the polygon and the point and you can see from
> >> that, visually, that the point is not within the polygon. I also tried
> in
> >> google maps API but after playing around realize that the polygons in
> >> maps
> >> are draw in Euclidian space while the map itself is a Mercator
> >> projection..
> >> Loading the kml in earth fixes this issue but the point still lays
> >> outside
> >> the polygon.. The distance between th

Re: Facets, termvectors, relevancy and Multi word tokenizing

2014-03-11 Thread epnRui
Hi Iorixxx!

I have not optimized the index but the day after this post I saw I didn't
have this problem anymore.

I will follow your advice next time!

Now I'm avoiding so much manipulation at indexing time and I'm doing more
work in the Java code on the client side.

If I had time I would implement a new tokenizer...



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facets-termvectors-relevancy-and-Multi-word-tokenizing-tp4120101p4122862.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Issue with spatial search

2014-03-11 Thread Steven Bower
great.. that worked!

What does distErrPct actually control, besides controlling the error
percentage? or maybe better put how does it impact perf?

steve


On Mon, Mar 10, 2014 at 11:17 PM, David Smiley (@MITRE.org) <
dsmi...@mitre.org> wrote:

> Correct, Steve. Alternatively you can also put this option in your query
> after the end of the last parenthesis, as in this example from the wiki:
>
>   fq=geo:"IsWithin(POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30)))
> distErrPct=0"
>
> ~ David
>
>
> Steven Bower wrote
> > Only points in the index.. Am I correct this won't require a reindex?
> >
> > On Monday, March 10, 2014, Smiley, David W. <
>
> > dsmiley@
>
> > > wrote:
> >
> >> Hi Steven,
> >>
> >> Set distErrPct to 0 in order to get non-point shapes to always be as
> >> accurate as maxDistErr.  Point shapes are always that accurate.  As long
> >> as
> >> you only index points, not other shapes (you don't index polygons, etc.)
> >> then distErrPct of 0 should be fine.  In fact, perhaps a future Solr
> >> version should simply use 0 as the default; the last time I did
> >> benchmarks
> >> it was pretty marginal impact of higher distErrPct.
> >>
> >> It's a fairly different story if you are indexing non-point shapes.
> >>
> >> ~ David
> >>
> >> From: Steven Bower
> >> Reply-To: solr-user@lucene.apache.org
> >> Date: Monday, March 10, 2014 at 4:23 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Issue with spatial search
> >>
> >> Minor edit to the KML to adjust color of polygon
> >>
> >>
> >> On Mon, Mar 10, 2014 at 4:21 PM, Steven Bower <
>
> > smb-apache@
>
> > 
> >> 
> > smb-apache@
>
> >  >> wrote:
> >> I am seeing a "error" when doing a spatial search where a particular
> >> point
> >> is showing up within a polygon, but by all methods I've tried that point
> >> is
> >> not within the polygon..
> >>
> >> First the point is: 41.2299,29.1345 (lat/lon)
> >>
> >> The polygon is:
> >>
> >> 31.2719,32.283
> >> 31.2179,32.3681
> >> 31.1333,32.3407
> >> 30.9356,32.6318
> >> 31.0707,34.5196
> >> 35.2053,36.9415
> >> 37.2959,36.6339
> >> 40.8334,30.4273
> >> 41.1622,29.1421
> >> 41.6484,27.4832
> >> 47.0255,13.6342
> >> 43.9457,3.17525
> >> 37.0029,-5.7017
> >> 35.7741,-5.57719
> >> 34.801,-4.66201
> >> 33.345,10.0157
> >> 29.6745,18.9366
> >> 30.6592,29.1683
> >> 31.2719,32.283
> >>
> >> The geo field we are using has this config:
> >>
> >>
> >  >>
> > class="solr.SpatialRecursivePrefixTreeFieldType"
> >>distErrPct="0.025"
> >>maxDistErr="0.09"
> >>
> >>
> >>
> spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
> >>units="degrees"/>
> >>
> >> The config is basically the same as the one from the docs...
> >>
> >> They query I am issuing is this:
> >>
> >> location:"Intersects(POLYGON((32.283 31.2719, 32.3681 31.2179, 32.3407
> >> 31.1333, 32.6318 30.9356, 34.5196 31.0707, 36.9415 35.2053, 36.6339
> >> 37.2959, 30.4273 40.8334, 29.1421 41.1622, 27.4832 41.6484, 13.6342
> >> 47.0255, 3.17525 43.9457, -5.7017 37.0029, -5.57719 35.7741, -4.66201
> >> 34.801, 10.0157 33.345, 18.9366 29.6745, 29.1683 30.6592, 32.283
> >> 31.2719)))"
> >>
> >> and it brings back a result where the "location" field is
> 41.2299,29.1345
> >>
> >> I've attached a KML with the polygon and the point and you can see from
> >> that, visually, that the point is not within the polygon. I also tried
> in
> >> google maps API but after playing around realize that the polygons in
> >> maps
> >> are draw in Euclidian space while the map itself is a Mercator
> >> projection..
> >> Loading the kml in earth fixes this issue but the point still lays
> >> outside
> >> the polygon.. The distance between the edge of the polygon closes to the
> >> point and the point itself is ~1.2 miles which is much larger than the
> >> 1meter accuracy given by the maxDistErr (per the docs).
> >>
> >> Any thoughts on this?
> >>
> >> Thanks,
> >>
> >> Steve
> >>
> >>
>
>
>
>
>
> -
>  Author:
> http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Issue-with-spatial-search-tp4122690p4122744.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: Many PDFs indexed but only one returned in the Solr-UI

2014-03-11 Thread Croci Francesco Luigi (ID SWS)
Hi Erik,

you were right...

I had the "signatureField" bound to the "uid" in the solrconfig.xml, so the uid 
was always the same.
Now I defined a new field for the "signatureField" and it works!

Before:
...
  <updateRequestProcessorChain name="dedupe">
    <processor class="solr.processor.SignatureUpdateProcessorFactory">
      <bool name="overwriteDupes">false</bool>
      <str name="signatureField">uid</str>  <-
      <bool name="enabled">true</bool>
      <str name="fields">content</str>
      <int name="minTokenLen">10</int>
      <float name="quantRate">.2</float>
      <str name="signatureClass">solr.update.processor.TextProfileSignature</str>
    </processor>
  </updateRequestProcessorChain>
...

(schema.xml)
...
  <uniqueKey>uid</uniqueKey>

After:
...
  <updateRequestProcessorChain name="dedupe">
    <processor class="solr.processor.SignatureUpdateProcessorFactory">
      <bool name="overwriteDupes">false</bool>
      <str name="signatureField">signatureField</str>  <-
      <bool name="enabled">true</bool>
      <str name="fields">content</str>
      <int name="minTokenLen">10</int>
      <float name="quantRate">.2</float>
      <str name="signatureClass">solr.update.processor.TextProfileSignature</str>
    </processor>
  </updateRequestProcessorChain>
...

(schema.xml)
...
  <field name="signatureField" ... />  <--
  <uniqueKey>uid</uniqueKey>


Greetings
Francesco

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Dienstag, 11. März 2014 12:46
To: solr-user@lucene.apache.org
Subject: Re: Many PDFs indexed but only one returned in the Solr-UI

Hmmm, that looks OK to me. I'd log the id you assign for each document;
it's _possible_ that somehow you're getting the same ID for all the files,
except this line should be preventing that:
 doc.addField("id", document);

Tail the Solr log while you're doing this and see the update messages to ensure 
that there are more than one. And I'm assuming that you've got more than one 
file in your directory.


BTW, doing the commit after every doc is generally poor practice in 
production. I know you're just testing now, but thought I'd mention it. Let 
autocommit handle most of it and (perhaps) commit once at the end.
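
For example, an autoCommit setup in solrconfig.xml might look roughly like this
(the values are purely illustrative):

  <autoCommit>
    <maxTime>15000</maxTime>            <!-- hard commit every 15 seconds -->
    <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commits -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>60000</maxTime>            <!-- new documents become visible within a minute -->
  </autoSoftCommit>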

Hmmm, silly question perhaps, but are you absolutely sure that you're querying 
the same core you're indexing to? On the same machine?
Sometimes as a sanity check I'll add, say, a timestamp to the id field (i.e.
doc.add("id", filename + timestamp) just to have something that changes every 
run.

Best
Erick

On Tue, Mar 11, 2014 at 6:00 AM, Croci  Francesco Luigi (ID SWS) 
 wrote:
> I followed the example here
> (http://searchhub.org/2012/02/14/indexing-with-solrj/) for indexing all the
> pdfs in a directory. The process seems to work well, but at the end, when I
> go in the Solr-UI and click on "Execute query" (with q=*:*), I get only one
> entry.
>
> Am I missing something in my code?
>
> ...
> String[] files = documentDir.list();
>
> if (files != null)
> {
>   for (String document : files)
>   {
>     ContentHandler textHandler = new BodyContentHandler();
>     Metadata metadata = new Metadata();
>     ParseContext context = new ParseContext();
>     AutoDetectParser autoDetectParser = new AutoDetectParser();
>
>     InputStream inputStream = null;
>
>     try
>     {
>       inputStream = new FileInputStream(new File(documentDir, document));
>
>       autoDetectParser.parse(inputStream, textHandler, metadata, context);
>
>       SolrInputDocument doc = new SolrInputDocument();
>       doc.addField("id", document);
>
>       String content = textHandler.toString();
>
>       if (content != null)
>       {
>         doc.addField("fullText", content);
>       }
>
>       UpdateResponse resp = server.add(doc, 1);
>
>       server.commit(true, true, true);
>
>       if (resp.getStatus() != 0)
>       {
>         throw new IDSystemException(LOG, "Document could not be indexed. Status returned: " + resp.getStatus());
>       }
>     }
>     catch (FileNotFoundException fnfe)
>     {
>       throw new IDSystemException(LOG, fnfe.getMessage(), fnfe);
>     }
>     catch (IOException ioe)
>     {
>       throw new IDSystemException(LOG, ioe.getMessage(), ioe);
>     }
>     catch (SAXException se)
>     {
>       throw new IDSystemException(LOG, se.getMessage(), se);
>     }
>     catch (TikaException te)
>     {
>       throw new IDSystemException(LOG, te.getMessage(), te);
>     }
>     catch (SolrServerException sse)
>     {
>       throw new IDSystemException(LOG, sse.getMessage(), sse);
>     }
>     finally
>     {
>       if (inputStream != null)
>       {
>         try
>         {
>           inputStream.close();
>         }
>         catch (IOException ioe)
>         {
>           throw new IDSystemException(LOG, ioe.getMessage(), ioe);
>         }
>       }
>     }
>   }
> }
> ...
>
> Thank you for any hint.
>
> Francesco


Re: NOT SOLVED searches for single char tokens instead of from 3 upwards

2014-03-11 Thread Jack Krupansky
The usual use of an ngram filter is at index time and not at query time. 
What exactly are you trying to achieve by using ngram filtering at query 
time as well as index time?


Generally, it is inappropriate to combine the word delimiter filter with the 
standard tokenizer - the latter removes the punctuation that normally 
influences how WDF treats the parts of a token. Use the whitespace 
tokenizer if you intend to use WDF.


Which query parser are you using? What fields are being queried?

Please post the parsed query string from the debug output - it will show the 
precise generated query.


I think what you are seeing is that the ngram filter is generating tokens 
like "h_cugtest" and then the WDF is removing the underscore and then "h" 
gets generated as a separate token.
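
As a sketch of the more usual arrangement - ngrams at index time only, with the
whitespace tokenizer ahead of WDF (the class names are real; the min/max gram
sizes of 3 and 15 just mirror what this thread is aiming for):

  <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>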


-- Jack Krupansky

-Original Message- 
From: Andreas Owen

Sent: Tuesday, March 11, 2014 5:09 AM
To: solr-user@lucene.apache.org
Subject: RE: NOT SOLVED searches for single char tokens instead of from 3 
upwards


I got it right the first time and here is my requesthandler. The field 
"plain_text" is searched correctly and has the same fieldtype as "title" -> 
"text_de"


class="solr.SynonymExpandingExtendedDismaxQParserPlugin">

 

 
standard
 
 
shingle
true
true
2
4
 
 
synonym
solr.KeywordTokenizerFactory
synonyms.txt
true
true
 

 




  explicit
  10
  synonym_edismax
  true
  plain_text^10 editorschoice^200
title^20 h_*^14
tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10
contentmanager^5 links^5
last_modified^5 url^5
  

{!q.op=OR} (*:* -organisations:["" TO *] -roles:["" TO *]) 
(+organisations:($org) +roles:($r)) (-organisations:["" TO *] +roles:($r)) 
(+organisations:($org) -roles:["" TO *])
  (expiration:[NOW TO *] OR (*:* -expiration:*))^6 


  div(clicks,max(displays,1))^8 

  text
  *,path,score
  json
  AND

  
  on
  plain_text,title
  200
  
  


   on
1
   {!ex=inhaltstyp_s}inhaltstyp_s
index
{!ex=doctype}doctype
index
{!ex=thema_f}thema_f
index
{!ex=author_s}author_s
index
{!ex=sachverstaendiger_s}sachverstaendiger_s
index
{!ex=veranstaltung_s}veranstaltung_s
index
{!ex=last_modified}last_modified
+1MONTH
NOW/MONTH+1MONTH
NOW/MONTH-36MONTHS
after

  




i have a field with the following type:


  


   
   
  
  
  
  



Shouldn't this make tokens from 3 to 15 in length and not from 1? Here is a
query report of 2 results:



  0  125  truetitle,roles,organisations,idtrueyh_cugtest1394522589347xmlorganisations:* roles:*   
   ..

1.6365329 = (MATCH) sum of:   1.6346203 = (MATCH) max of:
0.14759353 = (MATCH) product of:   0.28596246 = (MATCH) sum of:
0.01528686 = (MATCH) weight(plain_text:cug in 0) [DefaultSimilarity],
result of:   0.01528686 = score(doc=0,freq=1.0 = termFreq=1.0
), product of: 0.035319194 = queryWeight, product of:
5.540098 = idf(docFreq=9, maxDocs=937)   0.0063751927 =
queryNorm 0.43282017 = fieldWeight in 0, product of:
1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
5.540098 = idf(docFreq=9, maxDocs=937)   0.078125 =
fieldNorm(doc=0) 0.0119499 = (MATCH) weight(plain_text:ugt in
0) [DefaultSimilarity], result of:   0.0119499 =
score(doc=0,freq=1.0 = termFreq=1.0 ),

product of: 0.031227252 = queryWeight, product of:

4.8982444 = idf(docFreq=18, maxDocs=937)   0.0063751927 =
queryNorm 0.38267535 = fieldWeight in 0, product of:
1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
4.8982444 = idf(docFreq=18, maxDocs=937)   0.078125 =
fieldNorm(doc=0) 0.019351374 = (MATCH) weight(plain_text:yhc
in 0) [DefaultSimilarity], result of:   0.019351374 =
score(doc=0,freq=1.0 = termFreq=1.0 ), product of:
0.03973814 = queryWeight, product of:   6.2332454 =
idf(docFreq=4, maxDocs=937)   0.0063751927 = queryNorm
0.4869723 = fieldWeight in 0, product of:   1.0 =
tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
6.2332454 =
idf(docFreq=4, maxDocs=937)   0.078125 = fieldNorm(doc=0) 
0.019351374 = (MATCH)

weight(plain_text:hcu in 0) [DefaultSimilarity], result of:
0.019351374 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of:
0.03973814 = queryWeight, product of:   6.2332454 =
idf(docFreq=4, maxDocs=937)   0.0063751927 = queryNorm
0.4869723 = fieldWeight in 0, product of:   1.0 =
tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
6.2332454 = idf(docFreq=4, maxDocs=937)   0.078125 =
fieldNorm(doc=0) 0.01528686 = (MATCH) weight(plain_text:cug in
0) [DefaultSimilarity], result of:   0.01528686 =
score(doc=0,freq=1.0 = termFreq=1.0 ), product of:
0.035319194 = queryWeight, product of:   5.540098 =
idf(docFreq=9, maxDocs=937

Re: Apache Solr.

2014-03-11 Thread Jack Krupansky
Add a copyField to your schema to copy the file name string field to a 
tokenized text field. You can then query both the string field and the text 
field.
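
A minimal sketch, assuming the file name currently lives in a string field
called "filename" (all names here are illustrative - adjust to your schema):

  <field name="filename" type="string" indexed="true" stored="true"/>
  <field name="filename_text" type="text_general" indexed="true" stored="false"/>
  <copyField source="filename" dest="filename_text"/>

A query against filename_text then matches individual words inside the file
name, while the original filename field still gives exact matches on the
whole name.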

-- Jack Krupansky

From: vignesh 
Sent: Tuesday, March 11, 2014 8:44 AM
To: solr-user@lucene.apache.org 
Subject: Apache Solr.

Dear Team,

 

   I am Vignesh, at present developing keyword search using Apache
Solr 3.6. I have indexed PDFs, and with around 1000 keywords and Boolean
operators (AND, OR, NOT) I have passed a query and got the required results
for the keywords searched.

 

Now I am trying to extract the filename containing the keyword and the list of
keywords found in that filename, and not the full content of the file. Kindly
guide me in this; it will be very helpful to carry out my task.

 

 

 

Thanks & Regards.

Vignesh.V

 



Ninestars Information Technologies Limited.,

72, Greams Road, Thousand Lights, Chennai - 600 006. India.

Landline : +91 44 2829 4226 / 36 / 56   X: 144

www.ninestars.in 

 








Apache Solr.

2014-03-11 Thread vignesh
Dear Team,

 

   I am Vignesh, at present developing keyword search using
Apache Solr 3.6. I have indexed XML, and with around 1000 keywords and
Boolean operators (AND, OR, NOT) I have passed a query and got the required
results for the keywords searched.

 

Now I am trying to carry out phonetic search; for example, if I search for
Stephen, I also need results like Stefan, Stephan, etc. Kindly guide me in
this; it will be very helpful to carry out my task.

 

 

 

Thanks & Regards.

Vignesh.V

 


Ninestars Information Technologies Limited.,

72, Greams Road, Thousand Lights, Chennai - 600 006. India.

Landline : +91 44 2829 4226 / 36 / 56   X: 144

www.ninestars.in 

 



use local params in query

2014-03-11 Thread Andreas Owen
Shouldn't the numbers be in the output below (parsed_filter_queries) and not
$r and $org? 

 

This works great but I would like to use local params "r" and "org" instead
of hard-coded

 (*:* -organisations:[* TO *] -roles:[* TO
*]) (+organisations:(150 42) +roles:(174 72))

 

I would like

 (*:* -organisations:[* TO *] -roles:[* TO
*]) (+organisations:($org) +roles:($r))

 

I use this in my requesthandler under invariants because I need it to be
added to the query without being able to be overridden. Oh and I use facets,
so fq has to be combinable. This should work, or am I understanding it wrong?

 

Debug query:

 



  0

  109

  

true

true

267

yh_cug

1394533792473

xml

  

...



{!q.op=OR} (*:* -organisations:["" TO *] -roles:["" TO *])
(+organisations:($org) +roles:($r)) (-organisations:["" TO *] +roles:($r))
(+organisations:($org) -roles:["" TO *])

  

  

(MatchAllDocsQuery(*:*) -organisations:["" TO *] -roles:["" TO *])
(+organisations:$org +roles:$r) (-organisations:["" TO *] +roles:$r)
(+organisations:$org -roles:["" TO *])

  

 

 

 

 

 



Apache Solr.

2014-03-11 Thread vignesh
Dear Team,

 

   I am Vignesh, at present developing keyword search using
Apache Solr 3.6. I have indexed PDFs, and with around 1000 keywords and
Boolean operators (AND, OR, NOT) I have passed a query and got the required
results for the keywords searched.

 

Now I am trying to extract the filename containing the keyword and the list
of keywords found in that filename, and not the full content of the file.
Kindly guide me in this; it will be very helpful to carry out my task.

 

 

 

Thanks & Regards.

Vignesh.V

 


Ninestars Information Technologies Limited.,

72, Greams Road, Thousand Lights, Chennai - 600 006. India.

Landline : +91 44 2829 4226 / 36 / 56   X: 144

www.ninestars.in 

 



Re: Optimizing RAM

2014-03-11 Thread abhishek . netjain
Hi all,
What should be the ideal RAM to index size ratio?

Please reply. I expect the index to be around 60 GB in size and I don't store contents.
Thanks 
Abhishek

  Original Message  
From: abhishek.netj...@gmail.com
Sent: Monday, 10 March 2014 09:25
To: solr-user@lucene.apache.org
Cc: Erick Erickson
Subject: Re: Optimizing RAM

Hi,
If I go with copyField, will it increase I/O load, considering I have RAM
less than one third of the total index size?

Thanks 
Abhishek

  Original Message  
From: Erick Erickson
Sent: Monday, 10 March 2014 01:37
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Subject: Re: Optimizing RAM

I'd go for a copyField, keep the stemmed and unstemmed
version in the same index.

An alternative (and I think there's a JIRA for this if not an
outright patch) is to implement a "special" filter that, say, puts
the original token in with a special character, say $ at the
end, i.e. if indexing "running", you'd index both "running$" and
"run". Then when you want exact match, you search for "running$".

Best,
Erick

On Sun, Mar 9, 2014 at 2:55 PM, abhishek jain
 wrote:
> hi friends,
> I want to index a good amount of data and I want to keep both stemmed and
> unstemmed versions.
> I am confused whether I should keep two separate indexes or one index with
> two versions of the column, i.e. col1_stemmed and col2_unstemmed.
>
> I have multicore with multi shard configuration.
> My server has 32 GB RAM and I calculated the stemmed index size (without
> content) as 60 GB.
> I don't want to put too much load and I/O on a decent server with some 5
> other replicated servers, and I want to use the servers for other purposes too.
>
>
> Also, is it advised to serve queries from the master server or only from slaves?
> --
> Thanks,
> Abhishek


Re: Many PDFs indexed but only one returned in te Solr-UI

2014-03-11 Thread Erick Erickson
Hmmm, that looks OK to me. I'd log out
the id you assign for each document;
it's _possible_ that somehow you're
getting the same ID for all the files,
though this line should be preventing that:
 doc.addField("id", document);

Tail the Solr log while you're doing this and
see the update messages to ensure that there
are more than one. And I'm assuming that
you've got more than one file in your directory.


BTW, doing the commit after every doc is
generally poor practice in production. I know
you're just testing now, but thought I'd
mention it. Let autocommit handle most of it
and (perhaps) commit once at the end.
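
If it helps, a typical autocommit setup in the <updateHandler> section of
solrconfig.xml looks roughly like this (the intervals are example values only):

  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>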

Hmmm, silly question perhaps, but are you
absolutely sure that you're querying the same
core you're indexing to? On the same machine?
Sometimes as a sanity check I'll add, say,
a timestamp to the id field (i.e.
doc.add("id", filename + timestamp) just to
have something that changes every run.

Best
Erick

On Tue, Mar 11, 2014 at 6:00 AM, Croci  Francesco Luigi (ID SWS)
 wrote:
> I followed the example here 
> (http://searchhub.org/2012/02/14/indexing-with-solrj/) for indexing all the 
> pdfs in a directory. The process seems to work well, but at the end, when I 
> go in the Solr-UI and click on "Execute query"(with q=*:*), I get only one 
> entry.
>
> Do I miss something in my code?
>
> ...
>
> String[] files = documentDir.list();
>
>
>
> if (files != null)
>
> {
>
>   for (String document : files)
>
>   {
>
> ContentHandler textHandler = new BodyContentHandler();
>
> Metadata metadata = new Metadata();
>
> ParseContext context = new ParseContext();
>
> AutoDetectParser autoDetectParser = new AutoDetectParser();
>
>
>
> InputStream inputStream = null;
>
>
>
> try
>
> {
>
>   inputStream = new FileInputStream(new File(documentDir, document));
>
>
>
>   autoDetectParser.parse(inputStream, textHandler, metadata, context);
>
>
>
>   SolrInputDocument doc = new SolrInputDocument();
>
>   doc.addField("id", document);
>
>
>
>   String content = textHandler.toString();
>
>
>
>   if (content != null)
>
>   {
>
> doc.addField("fullText", content);
>
>   }
>
>
>
>   UpdateResponse resp = server.add(doc, 1);
>
>
>
>   server.commit(true, true, true);
>
>
>
>   if (resp.getStatus() != 0)
>
>   {
>
> throw new IDSystemException(LOG, "Document could not be indexed. 
> Status returned: " + resp.getStatus());
>
>   }
>
> }
>
> catch (FileNotFoundException fnfe)
>
> {
>
>   throw new IDSystemException(LOG, fnfe.getMessage(), fnfe);
>
> }
>
> catch (IOException ioe)
>
> {
>
>   throw new IDSystemException(LOG, ioe.getMessage(), ioe);
>
> }
>
> catch (SAXException se)
>
> {
>
>   throw new IDSystemException(LOG, se.getMessage(), se);
>
> }
>
> catch (TikaException te)
>
> {
>
>   throw new IDSystemException(LOG, te.getMessage(), te);
>
> }
>
> catch (SolrServerException sse)
>
> {
>
>   throw new IDSystemException(LOG, sse.getMessage(), sse);
>
> }
>
> finally
>
> {
>
>   if (inputStream != null)
>
>   {
>
> try
>
> {
>
>   inputStream.close();
>
> }
>
> catch (IOException ioe)
>
> {
>
>   throw new IDSystemException(LOG, ioe.getMessage(), ioe);
>
> }
>
>   }
>
> }
>
>...
>
> Thank you for any hint.
>
> Francesco


Re: How to customize Solr

2014-03-11 Thread Erick Erickson
You can also google for Solr PostFilters,
which were originally written for ACL control.
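
Very roughly, and only as a sketch (the class name and the isOnline() lookup
are invented for illustration - check the PostFilter javadocs for your exact
version), a post filter boils down to:

  import java.io.IOException;

  import org.apache.lucene.search.IndexSearcher;
  import org.apache.solr.search.DelegatingCollector;
  import org.apache.solr.search.ExtendedQueryBase;
  import org.apache.solr.search.PostFilter;

  // Hypothetical post filter that only lets documents of online users through.
  public class OnlineUsersQuery extends ExtendedQueryBase implements PostFilter {

    @Override
    public boolean getCache() {
      return false;                            // post filters are not cached
    }

    @Override
    public int getCost() {
      return Math.max(super.getCost(), 100);   // cost >= 100 marks this as a post filter
    }

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
      return new DelegatingCollector() {
        @Override
        public void collect(int doc) throws IOException {
          if (isOnline(doc)) {                 // hypothetical lookup against your session store
            super.collect(doc);                // pass the doc down the collector chain
          }
        }
      };
    }

    private boolean isOnline(int doc) {
      return true;                             // placeholder; real code maps doc -> user -> state
    }
  }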

Best,
Erick

On Tue, Mar 11, 2014 at 5:28 AM, Ahmet Arslan  wrote:
> Hi,
>
> The link has two custom classes: AccessControlQParserPlugin and
> AccessControlQuery. They can be used as an example to write an
> OnlineUsersQParserPlugin and OnlineUsersQuery. This Query implementation can
> _only_ be used as an fq. They can be loaded as described here:
> https://wiki.apache.org/solr/SolrPlugins
>
> Ahmet
>
>
>
> On Tuesday, March 11, 2014 7:24 AM, ~$alpha`  wrote:
> the link you provided has no information about customizing
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-customize-Solr-tp4122551p4122760.html
>
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Shard replication

2014-03-11 Thread Erick Erickson
Here's a good explanation of replicationFactor:
http://wiki.apache.org/solr/SolrCloud
You don't want to define this statically, it's about
the number of nodes not _how_ they replicate.

This will explain the indexing process:
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud.
There really isn't one unless a replica goes off-line, all files are
indexed to all replicas at the same time.

Multiple collections work just fine, give them
separate names is all and address them as you
would cores for requests, i.e.
...solr/collection/select?

For your specific question, set up 4 shards
with a replicationFactor of at least 2.
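
For example, something along these lines when creating the collection (the
host, collection name and maxShardsPerNode value are only an illustration):

  http://host:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=4&replicationFactor=2&maxShardsPerNode=2

With 4 nodes that puts two shard copies on each node, so any single node can
go down and every shard still has a live replica.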

Best,
Erick



On Tue, Mar 11, 2014 at 4:40 AM, Gastone Penzo  wrote:
> Hello,
> I'm testing the new Solr 4.7 with SolrCloud and Solr replication.
> I can't find any documentation on replicationFactor parameter.
> It seems it can be passed only by api on the creation of new collection.
> How does this parameter work?
> Is there a way to specify it statically in solrconfig.xml?
>
> Another question:
> How does replication (not the standard master/slave, but the shard one)
> works?
>
> I explain my situation.
>
> i'd like to setup 4 nodes with 1 collection and 4 shards and obtain some
> type of replication such that if one of four nodes goes down, all data are
> still available.
> And if i have multiple collections?
>
> Thank you
>
> *Gastone Penzo*


Re: Result merging takes too long

2014-03-11 Thread Erick Erickson
In SolrCloud there are a couple of round trips
that _may_ be what you're seeing.

First, though, the QTime is the time spent
querying, it does NOT include assembling
the documents from disk for return etc., so
bear that in mind

But here's the sequence as I understand it
from the receiving node's viewpoint.
1> send the query out to one replica for
each shard
2> get the top N doc IDs and scores (
or whatever sorting criteria) from each
shard.
3> Merge the lists and select the top N
to return
4> request the actual documents for
the top N list from each of the shards
5> return the list.

So as you can see, there's an extra
round trip to each shard to get the
full document. Perhaps this is what
you're seeing? <4> seems like it
might be what you're seeing, I don't
think it's counted in QTime.
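
One quick way to see where the time goes is to hit a single core directly
with distrib=false and compare its QTime to the full distributed query, e.g.
(host and core name are placeholders):

  http://host:8983/solr/collection1/select?q=project+development+agile&distrib=false&debugQuery=true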

HTH
Erick

On Tue, Mar 11, 2014 at 3:17 AM, remi tassing  wrote:
> Hi,
>
> I've just setup a SolrCloud with Tomcat. 5 Shards with one replication each
> and total 10million docs (evenly distributed).
>
> I've noticed the query response time is faster than using one single node
> but still not as fast as I expected.
>
> After turning debugQuery on, I noticed the query time is different from the
> value returned in the debug explanation (see the excerpt below). More
> importantly, when making a query to one, and only one, shard, the
> result is consistent. It appears the server spends most of its time doing
> result aggregation (merging).
>
> After searching on Google in vain I didn't find anything concrete except
> that the problem could be in 'SearchComponent'.
>
> Could you point me in the right direction (e.g. configuration...)?
>
> Thanks!
>
> Remi
>
> Solr Cloud result:
>
> 
>
> [The XML element names in this response were stripped by the mail archive;
> the surviving values are: status 0, QTime 3471, q=project development agile,
> maxScore=0.17022902, and debug timing entries of
> 508.0 / 8.0 / 8.0 / 0.0 / 0.0 / 0.0 / 0.0 / 0.0 followed by
> 499.0 / 195.0 / 0.0 / 0.0 / 228.0 / 0.0 / 76.0.]
>
> 
>
> 
>
> 


query with local params

2014-03-11 Thread Andreas Owen
This works great but I would like to use local params "r" and "org" instead of
hard-coded
 (*:* -organisations:[* TO *] -roles:[* TO *]) 
(+organisations:(150 42) +roles:(174 72))

I would like
 (*:* -organisations:[* TO *] -roles:[* TO *]) 
(+organisations:($org) +roles:($r))

Shouldn't the numbers be in the output below (parsed_filter_queries) and not $r
and $org? I use this in my requesthandler and need it to be added as fq or
query params without being able to be overridden; has anybody any ideas? Oh and
I use facets, so fq has to be combinable.

Debug query:


  0
  109
  
true
true
267
yh_cug
1394533792473
xml
  
...

{!q.op=OR} (*:* -organisations:["" TO *] -roles:["" TO *]) 
(+organisations:($org) +roles:($r)) (-organisations:["" TO *] +roles:($r)) 
(+organisations:($org) -roles:["" TO *])
  
  
(MatchAllDocsQuery(*:*) -organisations:["" TO *] -roles:["" TO *]) 
(+organisations:$org +roles:$r) (-organisations:["" TO *] +roles:$r) 
(+organisations:$org -roles:["" TO *])
  







Re: confirm unsubscribe from solr-user@lucene.apache.org

2014-03-11 Thread Daniel Exner
On Tue, 11 Mar 2014 11:10:28 +0100,
"solr-user-h...@lucene.apache.org"
wrote:

> Hi! This is the ezmlm program. I'm managing the
> solr-user@lucene.apache.org mailing list.
> 
> I'm working for my owner, who can be reached
> at solr-user-ow...@lucene.apache.org.
> 
> To confirm that you would like
> 
>daniel.ex...@esemos.de
> 
> removed from the solr-user mailing list, please send a short reply 
> to this address:
> 
>
> solr-user-uc.1394532628.ekahnncjocdebhjoicac-daniel.exner=esemos...@lucene.apache.org
> 
> Usually, this happens when you just hit the "reply" button.
> If this does not work, simply copy the address and paste it into
> the "To:" field of a new message.
> 
> or click here:
>   
> mailto:solr-user-uc.1394532628.ekahnncjocdebhjoicac-daniel.exner=esemos...@lucene.apache.org
> 
> I haven't checked whether your address is currently on the mailing
> list. To see what address you used to subscribe, look at the messages
> you are receiving from the mailing list. Each message has your
> address hidden inside its return path; for example, m...@xdd.ff.com
> receives messages with return path:
> -mary=xdd.ff@lucene.apache.org.
> 
> Some mail programs are broken and cannot handle long addresses. If you
> cannot reply to this request, instead send a message to
>  and put the entire address
> listed above into the "Subject:" line.
> 
> 

FW: Files locked after indexing

2014-03-11 Thread Croci Francesco Luigi (ID SWS)
Hi to all,

I'm pretty new with solr and tika and I have a problem.

I have the following workflow in my (web)application:

  *   download a pdf file from an archive
  *   index the file
  *   delete the file


My problem is that after indexing the file, it remains locked and the 
delete-part throws an exception.

Here is my code-snippet for indexing the file:

try
{
   ContentStreamUpdateRequest req = new 
ContentStreamUpdateRequest("/update/extract");
   req.addFile(file, type);
   req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

   NamedList result = server.request(req);

   Assert.assertEquals(0, ((NamedList) 
result.get("responseHeader")).get("status"));
}

I also tried the "ContentStream" way but without success:
ContentStream contentStream = null;

try
{
  contentStream = new ContentStreamBase.FileStream(document);

  ContentStreamUpdateRequest req = new 
ContentStreamUpdateRequest(UPDATE_EXTRACT_REQUEST);
  req.addContentStream(contentStream);
  req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

  NamedList result = server.request(req);

  if (!((NamedList) 
result.get("responseHeader")).get("status").equals(0))
  {
throw new IDSystemException(LOG, "Document could not be indexed. Status 
returned: " +
 ((NamedList) 
result.get("responseHeader")).get("status"));
  }
}
   catch...
   finally
{
  try
  {
if(contentStream != null && contentStream.getStream() != null)
{
  contentStream.getStream().close();
}
  }
  catch (IOException ioe)
  {
throw new IDSystemException(LOG, ioe.getMessage(), ioe);
  }
}


Do I miss something?

Thank you
Francesco



Many PDFs indexed but only one returned in te Solr-UI

2014-03-11 Thread Croci Francesco Luigi (ID SWS)
I followed the example here 
(http://searchhub.org/2012/02/14/indexing-with-solrj/) for indexing all the 
pdfs in a directory. The process seems to work well, but at the end, when I go 
in the Solr-UI and click on "Execute query"(with q=*:*), I get only one entry.

Do I miss something in my code?

...

String[] files = documentDir.list();



if (files != null)

{

  for (String document : files)

  {

ContentHandler textHandler = new BodyContentHandler();

Metadata metadata = new Metadata();

ParseContext context = new ParseContext();

AutoDetectParser autoDetectParser = new AutoDetectParser();



InputStream inputStream = null;



try

{

  inputStream = new FileInputStream(new File(documentDir, document));



  autoDetectParser.parse(inputStream, textHandler, metadata, context);



  SolrInputDocument doc = new SolrInputDocument();

  doc.addField("id", document);



  String content = textHandler.toString();



  if (content != null)

  {

doc.addField("fullText", content);

  }



  UpdateResponse resp = server.add(doc, 1);



  server.commit(true, true, true);



  if (resp.getStatus() != 0)

  {

throw new IDSystemException(LOG, "Document could not be indexed. 
Status returned: " + resp.getStatus());

  }

}

catch (FileNotFoundException fnfe)

{

  throw new IDSystemException(LOG, fnfe.getMessage(), fnfe);

}

catch (IOException ioe)

{

  throw new IDSystemException(LOG, ioe.getMessage(), ioe);

}

catch (SAXException se)

{

  throw new IDSystemException(LOG, se.getMessage(), se);

}

catch (TikaException te)

{

  throw new IDSystemException(LOG, te.getMessage(), te);

}

catch (SolrServerException sse)

{

  throw new IDSystemException(LOG, sse.getMessage(), sse);

}

finally

{

  if (inputStream != null)

  {

try

{

  inputStream.close();

}

catch (IOException ioe)

{

  throw new IDSystemException(LOG, ioe.getMessage(), ioe);

}

  }

}

   ...

Thank you for any hint.

Francesco


Re: Partial Counts in SOLR

2014-03-11 Thread Salman Akram
It's a long video and I will definitely go through it, but it seems this is
not possible with Solr as it is?

I just thought it would be quite a common issue; I mean, generally for
search engines it's more important to show the first-page results, rather
than using timeAllowed, which might not even return a single result.

Thanks!


-- 
Regards,

Salman Akram


Re: How to customize Solr

2014-03-11 Thread Ahmet Arslan
Hi,

The link has two custom classes: AccessControlQParserPlugin and
AccessControlQuery. They can be used as an example to write an
OnlineUsersQParserPlugin and OnlineUsersQuery. This Query implementation can
_only_ be used as an fq. They can be loaded as described here:
https://wiki.apache.org/solr/SolrPlugins
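
Wiring such a plugin in is just a registration in solrconfig.xml plus an fq
that invokes it; a sketch, with the parser name and class made up to match the
example above:

  <queryParser name="onlineUsers" class="com.example.OnlineUsersQParserPlugin"/>

and then on the request:

  fq={!onlineUsers}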

Ahmet



On Tuesday, March 11, 2014 7:24 AM, ~$alpha`  wrote:
the link you provided has no information about customizing 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-customize-Solr-tp4122551p4122760.html

Sent from the Solr - User mailing list archive at Nabble.com.



RE: NOT SOLVED searches for single char tokens instead of from 3 uppwards

2014-03-11 Thread Andreas Owen
I got it right the first time and here is my requesthandler. The field
"plain_text" is searched correctly and has the same fieldtype as "title" ->
"text_de"


  

  
standard
  
  
shingle
true
true
2
4
  
  
synonym
solr.KeywordTokenizerFactory
synonyms.txt
true
true
  

  



 
   explicit
   10
   synonym_edismax
   true
   plain_text^10 editorschoice^200
title^20 h_*^14 
tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10
contentmanager^5 links^5
last_modified^5 url^5
   

{!q.op=OR} (*:* -organisations:["" TO *] -roles:["" TO *]) 
(+organisations:($org) +roles:($r)) (-organisations:["" TO *] +roles:($r)) 
(+organisations:($org) -roles:["" TO *])
   (expiration:[NOW TO *] OR (*:* 
-expiration:*))^6  
   div(clicks,max(displays,1))^8 

   text
   *,path,score
   json
   AND
   
   
   on
   plain_text,title
   200
   
   
   
 
on
1
{!ex=inhaltstyp_s}inhaltstyp_s
index
{!ex=doctype}doctype
index
{!ex=thema_f}thema_f
index
{!ex=author_s}author_s
index
{!ex=sachverstaendiger_s}sachverstaendiger_s
index
{!ex=veranstaltung_s}veranstaltung_s
index
{!ex=last_modified}last_modified
+1MONTH
NOW/MONTH+1MONTH
NOW/MONTH-36MONTHS
after


   

 


 i have a field with the following type:
 
 
   
 
 


   
   
   
   
 
 
 
 Shouldn't this make tokens from 3 to 15 in length and not from 1? Here is a
query report of 2 results:

>   0   name="QTime">125   name="debugQuery">true name="fl">title,roles,organisations,id name="indent">trueyh_cugtest name="_">1394522589347xml name="fq">organisations:* roles:*name="response" numFound="5" start="0">
>..
> 
> 1.6365329 = (MATCH) sum of:   1.6346203 = (MATCH) max of: 
> 0.14759353 = (MATCH) product of:   0.28596246 = (MATCH) sum of: 
> 0.01528686 = (MATCH) weight(plain_text:cug in 0) [DefaultSimilarity], 
> result of:   0.01528686 = score(doc=0,freq=1.0 = termFreq=1.0 
> ), product of: 0.035319194 = queryWeight, product of: 
>   
> 5.540098 = idf(docFreq=9, maxDocs=937)   0.0063751927 = 
> queryNorm 0.43282017 = fieldWeight in 0, product of:  
>  
> 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0  
>  
> 5.540098 = idf(docFreq=9, maxDocs=937)   0.078125 = 
> fieldNorm(doc=0) 0.0119499 = (MATCH) weight(plain_text:ugt in 
> 0) [DefaultSimilarity], result of:   0.0119499 = 
> score(doc=0,freq=1.0 = termFreq=1.0 ),
product of: 0.031227252 = queryWeight, product of:  
> 4.8982444 = idf(docFreq=18, maxDocs=937)   0.0063751927 = 
> queryNorm 0.38267535 = fieldWeight in 0, product of:  
>  
> 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0  
>  
> 4.8982444 = idf(docFreq=18, maxDocs=937)   0.078125 = 
> fieldNorm(doc=0) 0.019351374 = (MATCH) weight(plain_text:yhc 
> in 0) [DefaultSimilarity], result of:   0.019351374 = 
> score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 
> 0.03973814 = queryWeight, product of:   6.2332454 = 
> idf(docFreq=4, maxDocs=937)   0.0063751927 = queryNorm
>  
> 0.4869723 = fieldWeight in 0, product of:   1.0 = 
> tf(freq=1.0), with freq of: 1.0 = termFreq=1.0   
> 6.2332454 =
idf(docFreq=4, maxDocs=937)   0.078125 = fieldNorm(doc=0) 
0.019351374 = (MATCH)
> weight(plain_text:hcu in 0) [DefaultSimilarity], result of:   
> 0.019351374 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 
> 0.03973814 = queryWeight, product of:   6.2332454 = 
> idf(docFreq=4, maxDocs=937)   0.0063751927 = queryNorm
>  
> 0.4869723 = fieldWeight in 0, product of:   1.0 = 
> tf(freq=1.0), with freq of: 1.0 = termF

PHP Solr Client - spellchecker

2014-03-11 Thread rachun
Dear all gurus,


I'm having a problem trying to use the spell checker for my suggestions, and
I'm using the PHP Solr client. So I tried code like this:
===PHP===

$config = array
(
'hostname'  => 'localhost',
'port'  => '8983',
'path'  => 'solr'
);

$q='macbook';
$client = new SolrClient($config);
$query = new SolrQuery();
$query->setQuery($q);
$query->addParam("shards.qt","/spell");
$query->addParam("fl","product_name_th");

$query_response = $client->query($query);
$result = $query_response->getResponse();

print_r($result);

 Result =

SolrObject Object
(
[responseHeader] => SolrObject Object
(
[status] => 0
[QTime] => 2
[params] => SolrObject Object
(
[fl] => product_name_th
[indent] => on
[shards.qt] => /spell
[start] => 0
[q] => macbook
[wt] => xml
[rows] => 0
[version] => 2.2
)

)

[response] => SolrObject Object
(
[numFound] => 4
[start] => 0
[docs] => 
)

)

=== Solr Log =

INFO  - 2014-03-11 15:23:48.556; org.apache.solr.core.SolrCore;
[collection1] webapp=/solr path=/select/
params={fl=product_name_th&indent=on&shards.qt=/spell&start=0&q=macbook&wt=xml&rows=0&version=2.2}
hits=4 status=0 QTime=2 
==
In this log you can see it didn't go through my requestHandler named spell,
but when I try this:


http://localhost:8983/solr/spell?spellcheck=true&qt=spellchecker&spellcheck.accuracy=0.8&spellcheck.collate=true&fl=product_name_th&extendedResults=true+&q=macbook

I get the result like this (which is the way I would love it to be)



0
1



Macbook Pro
Macbook air
กระเป๋าใส่ macbook air
กระเป๋าใส่ macbook pro



false

==Solr log

INFO  - 2014-03-11 15:34:57.013; org.apache.solr.core.SolrCore;
[collection1] webapp=/solr path=/spell
params={spellcheck=true&extendedResults=true&fl=product_name_th&spellcheck.accuracy=0.8&q=macbook&spellcheck.collate=true&qt=spellchecker}
hits=4 status=0 QTime=2 

At this point I can see that it goes through my requestHandler named spell


Did I do something wrong?
I really need help.

Thank you very much,
Rachun

 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/PHP-Solr-Client-spellchecker-tp4122780.html
Sent from the Solr - User mailing list archive at Nabble.com.


Shard replication

2014-03-11 Thread Gastone Penzo
Hello,
I'm testing the new Solr 4.7 with SolrCloud and Solr replication.
I can't find any documentation on replicationFactor parameter.
It seems it can be passed only by api on the creation of new collection.
How does this parameter work?
Is there a way to specify it statically in solrconfig.xml?

Another question:
How does replication (not the standard master/slave, but the shard one)
works?

I explain my situation.

i'd like to setup 4 nodes with 1 collection and 4 shards and obtain some
type of replication such that if one of four nodes goes down, all data are
still available.
And if i have multiple collections?

Thank you

*Gastone Penzo*


Re: searches for single char tokens instead of from 3 uppwards

2014-03-11 Thread Alexandre Rafalovitch
Have you tried using Analysis section in the admin web interface?

You can just pick the type from drop down and feed your string to it.
It will show you (with debug enabled) exactly what happens at every
stage and which particular step in the chain might be causing
problems.
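
(In 4.x that is the Analysis screen of the admin UI, e.g.
http://localhost:8983/solr/#/collection1/analysis with your own core name in
place of collection1.)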

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Tue, Mar 11, 2014 at 2:45 PM, Andreas Owen  wrote:
> i have a field with the following type:
>
> 
>   
> 
> 
>  words="lang/stopwords_de.txt" format="snowball" 
> enablePositionIncrements="true"/> 
>
>  language="German"/>
>  maxGramSize="15"/>
>  generateWordParts="1" generateNumberParts="1" catenateWords="1" 
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>   
> 
>
>
> Shouldn't this make tokens from 3 to 15 in length and not from 1? Here is a
> query report of 2 results:
>   0   name="QTime">125   name="debugQuery">true name="fl">title,roles,organisations,idtrue 
>yh_cugtest1394522589347
> xmlorganisations:* roles:*  
> 
> 
>..
> 
> 1.6365329 = (MATCH) sum of:   1.6346203 = (MATCH) max of: 0.14759353 = 
> (MATCH) product of:   0.28596246 = (MATCH) sum of: 0.01528686 = 
> (MATCH) weight(plain_text:cug in 0) [DefaultSimilarity], result of:   
> 0.01528686 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 
> 0.035319194 = queryWeight, product of:   5.540098 = 
> idf(docFreq=9, maxDocs=937)   0.0063751927 = queryNorm
>  0.43282017 = fieldWeight in 0, product of:   1.0 = tf(freq=1.0), 
> with freq of: 1.0 = termFreq=1.0   5.540098 = 
> idf(docFreq=9, maxDocs=937)   0.078125 = fieldNorm(doc=0) 
> 0.0119499 = (MATCH) weight(plain_text:ugt in 0) [DefaultSimilarity], result 
> of:   0.0119499 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of:  
>0.031227252 = queryWeight, product of:
> 4.8982444 = idf(docFreq=18, maxDocs=937)   0.0063751927 = 
> queryNorm 0.38267535 = fieldWeight in 0, product of:  
>  1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 
>   4.8982444 = idf(docFreq=18, maxDocs=937)   0.078125 = 
> fieldNorm(doc=0) 0.019351374 = (MATCH) weight(plain_text:yhc in 0) 
> [DefaultSimilarity], result of:   0.019351374 = score(doc=0,freq=1.0 
> = termFreq=1.0 ), product of: 0.03973814 = queryWeight, product 
> of:   6.2332454 = idf(docFreq=4, maxDocs=937)   
> 0.0063751927 = queryNorm 0.4869723 = fieldWeight in 0, product 
> of:   1.0 = tf(freq=1.0), with freq of: 1.0 = 
> termFreq=1.0   6.2332454 = idf(docFreq=4, maxDocs=937)
>0.078125 = fieldNorm(doc=0) 0.019351374 = (MATCH)
> weight(plain_text:hcu in 0) [DefaultSimilarity], result of:   
> 0.019351374 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 
> 0.03973814 = queryWeight, product of:   6.2332454 = 
> idf(docFreq=4, maxDocs=937)   0.0063751927 = queryNorm
>  0.4869723 = fieldWeight in 0, product of:   1.0 = tf(freq=1.0), 
> with freq of: 1.0 = termFreq=1.0   6.2332454 = 
> idf(docFreq=4, maxDocs=937)   0.078125 = fieldNorm(doc=0) 
> 0.01528686 = (MATCH) weight(plain_text:cug in 0) [DefaultSimilarity], result 
> of:   0.01528686 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 
> 0.035319194 = queryWeight, product of:   5.540098 = 
> idf(docFreq=9, maxDocs=937)   0.0063751927 = queryNorm
>  0.43282017 = fieldWeight in 0, product of:   1.0 =
> tf(freq=1.0), with freq of: 1.0 = termFreq=1.0   
> 5.540098 = idf(docFreq=9, maxDocs=937)   0.078125 = 
> fieldNorm(doc=0) 0.019351374 = (MATCH) weight(plain_text:cugt in 0) 
> [DefaultSimilarity], result of:   0.019351374 = score(doc=0,freq=1.0 
> = termFreq=1.0 ), product of: 0.03973814 = queryWeight, product 
> of:   6.2332454 = idf(docFreq=4, maxDocs=937)   
> 0.0063751927 = queryNorm 0.4869723 = fieldWeight in 0, product 
> of:   1.0 = tf(freq=1.0), with freq of: 1.0 = 
> termFreq=1.0   6.2332454 = idf(docFreq=4, maxDocs=937)
>0.078125 = fieldNorm(doc=0) 0.019351374 = (MATCH) 
> weight(plain_text:yhcu in 0) [DefaultSimilarity], result of:   
> 0.019351374 = score(doc=0,freq=1.0 = te

Re: SOLVED searches for single char tokens instead of from 3 uppwards

2014-03-11 Thread Andreas Owen
Sorry, I looked at the wrong fieldtype.

-Original Message- 
> From: "Andreas Owen"  
> To: solr-user@lucene.apache.org 
> Date: 11/03/2014 08:45 
> Subject: searches for single char tokens instead of from 3 uppwards 
> 
> i have a field with the following type:
> 
> 
>        
>         
>         
>    words="lang/stopwords_de.txt" format="snowball" 
> enablePositionIncrements="true"/> 
>                
>    language="German"/> 
>    maxGramSize="15"/>
>    generateWordParts="1" generateNumberParts="1" catenateWords="1" 
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>       
>     
> 
> 
> Shouldn't this make tokens from 3 to 15 in length and not from 1? Here is a 
> query report of 2 results:
>   0   name="QTime">125       name="debugQuery">true     name="fl">title,roles,organisations,id    true 
>    yh_cugtest    1394522589347    
> xml    organisations:* roles:*  
> 
> 
>    ..
> 
> 1.6365329 = (MATCH) sum of:   1.6346203 = (MATCH) max of:     0.14759353 = 
> (MATCH) product of:       0.28596246 = (MATCH) sum of:         0.01528686 = 
> (MATCH) weight(plain_text:cug in 0) [DefaultSimilarity], result of:           
> 0.01528686 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of:             
> 0.035319194 = queryWeight, product of:               5.540098 = 
> idf(docFreq=9, maxDocs=937)               0.0063751927 = queryNorm            
>  0.43282017 = fieldWeight in 0, product of:               1.0 = tf(freq=1.0), 
> with freq of:                 1.0 = termFreq=1.0               5.540098 = 
> idf(docFreq=9, maxDocs=937)               0.078125 = fieldNorm(doc=0)         
> 0.0119499 = (MATCH) weight(plain_text:ugt in 0) [DefaultSimilarity], result 
> of:           0.0119499 = score(doc=0,freq=1.0 = termFreq=1.0 ),
product of:             0.031227252 = queryWeight, product of:              
> 4.8982444 = idf(docFreq=18, maxDocs=937)               0.0063751927 = 
> queryNorm             0.38267535 = fieldWeight in 0, product of:              
>  1.0 = tf(freq=1.0), with freq of:                 1.0 = termFreq=1.0         
>       4.8982444 = idf(docFreq=18, maxDocs=937)               0.078125 = 
> fieldNorm(doc=0)         0.019351374 = (MATCH) weight(plain_text:yhc in 0) 
> [DefaultSimilarity], result of:           0.019351374 = score(doc=0,freq=1.0 
> = termFreq=1.0 ), product of:             0.03973814 = queryWeight, product 
> of:               6.2332454 = idf(docFreq=4, maxDocs=937)               
> 0.0063751927 = queryNorm             0.4869723 = fieldWeight in 0, product 
> of:               1.0 = tf(freq=1.0), with freq of:                 1.0 = 
> termFreq=1.0               6.2332454 =
idf(docFreq=4, maxDocs=937)               0.078125 = fieldNorm(doc=0)         
0.019351374 = (MATCH)
> weight(plain_text:hcu in 0) [DefaultSimilarity], result of:           
> 0.019351374 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of:             
> 0.03973814 = queryWeight, product of:               6.2332454 = 
> idf(docFreq=4, maxDocs=937)               0.0063751927 = queryNorm            
>  0.4869723 = fieldWeight in 0, product of:               1.0 = tf(freq=1.0), 
> with freq of:                 1.0 = termFreq=1.0               6.2332454 = 
> idf(docFreq=4, maxDocs=937)               0.078125 = fieldNorm(doc=0)         
> 0.01528686 = (MATCH) weight(plain_text:cug in 0) [DefaultSimilarity], result 
> of:           0.01528686 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 
>             0.035319194 = queryWeight, product of:               5.540098 = 
> idf(docFreq=9, maxDocs=937)               0.0063751927 =
queryNorm             0.43282017 = fieldWeight in 0, product of:               
1.0 =
> tf(freq=1.0), with freq of:                 1.0 = termFreq=1.0               
> 5.540098 = idf(docFreq=9, maxDocs=937)               0.078125 = 
> fieldNorm(doc=0)         0.019351374 = (MATCH) weight(plain_text:cugt in 0) 
> [DefaultSimilarity], result of:           0.019351374 = score(doc=0,freq=1.0 
> = termFreq=1.0 ), product of:             0.03973814 = queryWeight, product 
> of:               6.2332454 = idf(docFreq=4, maxDocs=937)               
> 0.0063751927 = queryNorm             0.4869723 = fieldWeight in 0, product 
> of:               1.0 = tf(freq=1.0), with freq of:                 1.0 = 
> termFreq=1.0               6.2332454 = idf(docFreq=4, maxDocs=937)            
>    0.078125 = fieldNorm(doc=0)         0.019351374 = (MATCH) 
> weight(plain_text:yhcu in 0) [DefaultSimilarity], result of:          
0.019351374 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of:             
0.03973814 =
> queryWeight, product of:               6.2332454 = idf(docFreq=4, 
> maxDocs=937)               0.0063751927 = queryNorm             0.4869723 = 
> fieldWeight in 0, product of:               1.0 = tf(freq=1.0), with freq of: 
>                 1.0 = termFreq=1.0               6.2332454 = idf(docFreq=4, 
> maxDocs=937)               0

searches for single char tokens instead of from 3 uppwards

2014-03-11 Thread Andreas Owen
i have a field with the following type:


       
        
        
 
               
 


      
    


Shouldn't this make tokens from 3 to 15 in length and not from 1? Here is a
query report of 2 results:
  0  125  truetitle,roles,organisations,idtrue   
 yh_cugtest1394522589347xmlorganisations:* roles:*  


   ..

1.6365329 = (MATCH) sum of:   1.6346203 = (MATCH) max of: 0.14759353 = 
(MATCH) product of:   0.28596246 = (MATCH) sum of: 0.01528686 = 
(MATCH) weight(plain_text:cug in 0) [DefaultSimilarity], result of:   
0.01528686 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 
0.035319194 = queryWeight, product of:   5.540098 = idf(docFreq=9, 
maxDocs=937)   0.0063751927 = queryNorm 0.43282017 = 
fieldWeight in 0, product of:   1.0 = tf(freq=1.0), with freq of:   
  1.0 = termFreq=1.0   5.540098 = idf(docFreq=9, 
maxDocs=937)   0.078125 = fieldNorm(doc=0) 0.0119499 = 
(MATCH) weight(plain_text:ugt in 0) [DefaultSimilarity], result of:   
0.0119499 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 
0.031227252 = queryWeight, product of:  
4.8982444 = idf(docFreq=18, maxDocs=937)   0.0063751927 = queryNorm 
0.38267535 = fieldWeight in 0, product of:   1.0 = 
tf(freq=1.0), with freq of: 1.0 = termFreq=1.0   
4.8982444 = idf(docFreq=18, maxDocs=937)   0.078125 = 
fieldNorm(doc=0) 0.019351374 = (MATCH) weight(plain_text:yhc in 0) 
[DefaultSimilarity], result of:   0.019351374 = score(doc=0,freq=1.0 = 
termFreq=1.0 ), product of: 0.03973814 = queryWeight, product of:   
6.2332454 = idf(docFreq=4, maxDocs=937)   0.0063751927 
= queryNorm 0.4869723 = fieldWeight in 0, product of:   
1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
   6.2332454 = idf(docFreq=4, maxDocs=937)   0.078125 = 
fieldNorm(doc=0) 0.019351374 = (MATCH)
weight(plain_text:hcu in 0) [DefaultSimilarity], result of:   
0.019351374 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 
0.03973814 = queryWeight, product of:   6.2332454 = idf(docFreq=4, 
maxDocs=937)   0.0063751927 = queryNorm 0.4869723 = 
fieldWeight in 0, product of:   1.0 = tf(freq=1.0), with freq of:   
  1.0 = termFreq=1.0   6.2332454 = idf(docFreq=4, 
maxDocs=937)   0.078125 = fieldNorm(doc=0) 0.01528686 = 
(MATCH) weight(plain_text:cug in 0) [DefaultSimilarity], result of:   
0.01528686 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 
0.035319194 = queryWeight, product of:   5.540098 = idf(docFreq=9, 
maxDocs=937)   0.0063751927 = queryNorm 0.43282017 = 
fieldWeight in 0, product of:   1.0 =
tf(freq=1.0), with freq of: 1.0 = termFreq=1.0   
5.540098 = idf(docFreq=9, maxDocs=937)   0.078125 = 
fieldNorm(doc=0) 0.019351374 = (MATCH) weight(plain_text:cugt in 0) 
[DefaultSimilarity], result of:   0.019351374 = score(doc=0,freq=1.0 = 
termFreq=1.0 ), product of: 0.03973814 = queryWeight, product of:   
6.2332454 = idf(docFreq=4, maxDocs=937)   0.0063751927 
= queryNorm 0.4869723 = fieldWeight in 0, product of:   
1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
   6.2332454 = idf(docFreq=4, maxDocs=937)   0.078125 = 
fieldNorm(doc=0) 0.019351374 = (MATCH) weight(plain_text:yhcu in 0) 
[DefaultSimilarity], result of:   0.019351374 = score(doc=0,freq=1.0 = 
termFreq=1.0 ), product of: 0.03973814 =
queryWeight, product of:   6.2332454 = idf(docFreq=4, maxDocs=937)  
 0.0063751927 = queryNorm 0.4869723 = fieldWeight in 0, 
product of:   1.0 = tf(freq=1.0), with freq of: 1.0 
= termFreq=1.0   6.2332454 = idf(docFreq=4, maxDocs=937)
   0.078125 = fieldNorm(doc=0) 0.01528686 = (MATCH) 
weight(plain_text:cug in 0) [DefaultSimilarity], result of:   
0.01528686 = score(doc=0,freq=1.0 = termFreq=1.0 ), product of: 
0.035319194 = queryWeight, product of:   5.540098 = idf(docFreq=9, 
maxDocs=937)   0.0063751927 = queryNorm 0.43282017 = 
fieldWeight in 0, product of:   1.0 = tf(freq=1.0), with freq of:   
  1.0 = termFreq=1.0   5.540098 = idf(docFreq=9, 
maxDocs=937)   0.078125 = fieldNorm(doc=0)
0.019351374 = (MATCH) weight(plain_text:hcug in 0) [DefaultSimila

Result merging takes too long

2014-03-11 Thread remi tassing
Hi,

I've just setup a SolrCloud with Tomcat. 5 Shards with one replication each
and total 10million docs (evenly distributed).

I've noticed the query response time is faster than using one single node
but still not as fast as I expected.

After turning debugQuery on, I noticed the query time is different from the
value returned in the debug explanation (see the excerpt below). More
importantly, when making a query to one, and only one, shard, the
result is consistent. It appears the server spends most of its time doing
result aggregation (merging).

After searching on Google in vain I didn't find anything concrete except
that the problem could be in 'SearchComponent'.

Could you point me in the right direction (e.g. configuration...)?

Thanks!

Remi

Solr Cloud result:



[The XML element names in this response were stripped by the mail archive;
the surviving values are: status 0, QTime 3471, q=project development agile,
and debug timing entries of 508.0 / 8.0 / 8.0 / 0.0 / 0.0 / 0.0 / 0.0 / 0.0
followed by 499.0 / 195.0 / 0.0 / 0.0 / 228.0 / 0.0 / 76.0.]








DocumentCache Out Of Memory

2014-03-11 Thread david . davila
Hello,

in our project we need to execute some big queries against Solr once a
day, with maybe more than 1000 results, in order to trigger a batch
process with the results. In the fl parameter we are only putting the ID
field, because we don't need the large text fields.

This is our scenary:

- Our documents are generally very big, but as I have said we only request
the ID field.
- We have the enableLazyFieldLoading parameter set to true in
solrconfig.xml, so the DocumentCache should load only the ID field that we
are requesting.
- Our DocumentCache is set to 8192 objects.
- This test has been executed on Solr 4.2.1, 4.6.1 and 4.7, both in
non-SolrCloud and in SolrCloud mode.


The issue we have got is this:

- When we request more than 1000 docs or so, the JVM takes a lot of
memory and ends with an OOM.
- Watching in "real time" as the DocumentCache inserts documents, we have
seen that memory grows when the documents are bigger (and the time needed to
load those documents into the cache is also longer), but we don't understand
why, because with enableLazyFieldLoading only the ID should be loaded, so why
does memory grow in that way?

We know that one solution is to increase RAM and another is to decrease the
size of the DocumentCache (we have already done this), but we'd like to know
the reason for this memory behaviour.

On the other hand, one good solution for us would be to make the queries
without caching. Is there any way to tell Solr not to cache some specific
queries? I don't think so, but maybe I am wrong.

Thank very much,



David Dávila Atienza
AEAT - Departamento de Informática Tributaria
Subdirección de Tecnologías de Análisis de la Información e Investigación 
del Fraude
Teléfono: 917681160
Extensión: 30160