Re: MergerFacor effect on indexes

2011-06-30 Thread Romi
To see the changes i am deleting my old indexes and recreating them but still
getting the same result :(

-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/MergerFacor-effect-on-indexes-tp3125146p3128432.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to optimize solr indexes

2011-06-30 Thread Romi
when i run as : deltaimport?command=delta-import&optimize=false 

But i am still getting optimize=true when i look at the admin console, which
is shown in my original post

-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-optimize-solr-indexes-tp3125293p3128424.html
Sent from the Solr - User mailing list archive at Nabble.com.


[ANNOUNCE] Apache Solr 3.3

2011-06-30 Thread Robert Muir
July 2011, Apache Solr™ 3.3 available
The Lucene PMC is pleased to announce the release of Apache Solr 3.3.

Solr is the popular, blazing fast open source enterprise search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search, dynamic clustering, database
integration, rich document (e.g., Word, PDF) handling, and geospatial search.
Solr is highly scalable, providing distributed search and index replication,
and it powers the search and navigation features of many of the world's
largest internet sites.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below.  The release
is available for immediate download at:
   http://www.apache.org/dyn/closer.cgi/lucene/solr (see note below).

See the CHANGES.txt file included with the release for a full list of
details as well as instructions on upgrading.

Solr 3.3 Release Highlights

 * Grouping / Field Collapsing

 * A new, automaton-based suggest/autocomplete implementation offering an
   order of magnitude smaller RAM consumption.

 * KStemFilterFactory, an optimized implementation of a less aggressive
   stemmer for English.

 * Solr defaults to a new, more efficient merge policy (TieredMergePolicy).
   See http://s.apache.org/merging for more information.

 * Important bugfixes, including a fix for extremely high RAM usage in spellchecking.

 * Bugfixes and improvements from Apache Lucene 3.3

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases.  It is possible that the mirror you are using may not
have replicated the release yet.  If that is the case, please try another
mirror.  This also goes for Maven access.

Thanks,
Apache Solr Developers


Re: Uninstall Solr

2011-06-30 Thread Erik Hatcher
How'd you install it?

Generally you just delete the directory where you "installed" it.  But you 
might be deploying solr.war in a container somewhere besides Solr's example 
Jetty setup, in which case you need to undeploy it from those other containers 
and remove the remnants.

Curious though... why uninstall it?  Solr makes a mighty fine hammer to have 
around :)

Erik

On Jun 30, 2011, at 19:49 , GAURAV PAREEK wrote:

> Hi All,
> 
> How to *uninstall* Solr completely ?
> 
> Any help will be appreciated.
> 
> Regards,
> Gaurav



Uninstall Solr

2011-06-30 Thread GAURAV PAREEK
Hi All,

How to *uninstall* Solr completely ?

Any help will be appreciated.

Regards,
Gaurav


Re: Taxonomy faceting

2011-06-30 Thread Chris Hostetter

: Lucid Imagination did a webcast on this, as far as I remember?

that was me ... the webcast was a pre-run of my apachecon talk...

http://www.lucidimagination.com/why-lucid/webinars/mastering-power-faceted-search
http://people.apache.org/~hossman/apachecon2010/facets/

...taxonomy stuff comes up ~slide 30

: The '1/topics/computing'-solution works at a single level, so if you are
: interested in a multi-level result like

if you want to show the whole tree when faceting you can just leave the 
"depth" number prefix out of terms, that should work fine (but i haven't 
thought about it hard)

: > Are there better ways to achieve this?
: 
: Taxonomy faceting is a bit of a mess right now, but it is also an area
: where a lot is happening. For SOLR, there is

right, some of which i haven't been able to keep up on and can't comment 
on -- but in my experience if you are serious about organizing your data in 
a taxonomy then you probably already have some data structure in your 
application layer that models the whole thing in memory, and maps nodeIds 
to nodeLabels and what not.  What usually works fine is to just index the 
nodeIds for the entire ancestry of the category each Document is in for 
the filtering (ie: fq=cat:1234), and to generate the facet 
presentation you do a simple facet.field=ancestorCategories&facet.limit=-1 
to get all the counts in a big hashmap and then use that to annotate your 
own category tree data structure that you use to generate the 
presentation.
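A minimal sketch of the counts-to-tree annotation step described above (plain Python; the node ids, labels and counts are made up, standing in for a real facet.field=ancestorCategories response):

```python
# Each document indexes the nodeIds of its entire ancestry, so a filter
# like fq=cat:1234 matches docs in 1234 or any of its descendants.  The
# facet response then gives one count per node, which annotates an
# in-memory category tree.
tree = {
    "id": 1, "label": "topics", "children": [
        {"id": 2, "label": "computing", "children": [
            {"id": 4, "label": "search", "children": []},
        ]},
        {"id": 3, "label": "cooking", "children": []},
    ],
}

# Stand-in for the "big hashmap" of counts returned by
# facet.field=ancestorCategories&facet.limit=-1
facet_counts = {1: 10, 2: 7, 3: 3, 4: 5}

def annotate(node, counts):
    """Attach the facet count to every node in the tree."""
    node["count"] = counts.get(node["id"], 0)
    for child in node["children"]:
        annotate(child, counts)

annotate(tree, facet_counts)
```

The annotated tree is then what the application renders as the taxonomy facet display.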



-Hoss


JOIN, query on the parent?

2011-06-30 Thread Ryan McKinley
Hello-

I'm looking for a way to find all the links from a set of results.  Consider:


 id:1
 type:X
 link:a
 link:b



 id:2
 type:X
 link:a
 link:c



 id:3
 type:Y
 link:a


Is there a way to search for all the links from stuff of type X -- in
this case (a,b,c)

If I'm understanding the {!join stuff, it lets you search on the
children, but i don't really see how to limit the parent values.

Am I missing something, or is this a further extension to the JoinQParser?
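For what it's worth, if the goal is only the set of distinct link values from type:X docs (rather than a true join), faceting gets there; a toy in-memory sketch of what a q=type:X&facet.field=link request would compute:

```python
# The three example docs from the message.
docs = [
    {"id": 1, "type": "X", "link": ["a", "b"]},
    {"id": 2, "type": "X", "link": ["a", "c"]},
    {"id": 3, "type": "Y", "link": ["a"]},
]

# Faceting on "link" over the docs matching type:X counts each link value
# once per matching document -- the distinct values are the facet keys.
def facet_links(docs, doc_type):
    counts = {}
    for doc in docs:
        if doc["type"] == doc_type:
            for link in doc["link"]:
                counts[link] = counts.get(link, 0) + 1
    return counts

print(sorted(facet_links(docs, "X")))  # ['a', 'b', 'c'], as in the question
```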


thanks
ryan


Re: After the query component has the results, can I do more filtering on them?

2011-06-30 Thread arian487
Sorry for the double post but in this case, is it possible for me to access
the queryResultCache in my component and play with it?  Ideally what I want
is this:

1) I have 1 (just a random large number) total results. 
2) In my component I access all of these results, score them, and take the
top 3500 (a random smaller number) and drop the rest.  
3) The 3500 I have now should end up going into the queryResultCache and
essentially replacing the other one.
4) The number returned to the user should then be rows, and subsequent
queries which are the same just get them from my new result cache.

I'm pretty noob about all of this so I'm hoping someone can help.
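A rough sketch of the re-rank-and-truncate idea in plain Python (the custom scoring function and the cache wiring are the parts that would actually live in the SearchComponent -- names here are illustrative):

```python
import heapq

def rerank_top_k(doc_ids, score_fn, k=3500):
    """Score every candidate doc and keep only the k best, dropping the rest."""
    # heapq.nlargest keeps memory at O(k) even for a large candidate list.
    return heapq.nlargest(k, doc_ids, key=score_fn)

# Toy example: 10000 candidate ids, scored by some stand-in custom function.
candidates = range(10000)
top = rerank_top_k(candidates, score_fn=lambda d: d % 997, k=3500)
print(len(top))  # 3500 -- this truncated list is what would replace the
                 # original entry in the queryResultCache
```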

--
View this message in context: 
http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3127581.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: After the query component has the results, can I do more filtering on them?

2011-06-30 Thread arian487
unfortunately the userIdsToScore updates very often.  I'd get more Ids almost
every single query (hence why I made the new component).  But I see the
problem of not being able to score the whole resultSet.  I'd actually need
to do this now that I think about it.  I want to get a whole whack of users
(let's say 10,000), score them using my system, and then 'remember' the top
3500 of these users in the result cache or something.  

How would I go about operating on the whole resultSet rather than just the
'rows' I set.  I wonder if I can set rows to be really large, score them in
the component, and then remember all of these results in the result cache
and then dynamically change rows in my component so not all 3500 (or w/e
number I choose) are returned.  

--
View this message in context: 
http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3127560.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: TermVectors and custom queries

2011-06-30 Thread Jamie Johnson
Perhaps a better question, is this possible?

On Mon, Jun 27, 2011 at 5:15 PM, Jamie Johnson  wrote:
> I have a field named content with the following definition
>
>     multiValued="true" termVectors="true" termPositions="true"
> termOffsets="true"/>
>
> I'm now trying to execute a query against content and get back the term
> vectors for the pieces that matched my query, but I must be messing
> something up.  My query is as follows:
>
> http://localhost:8983/solr/select/?qt=tvrh&q=content:test&fl=content&tv.all=true
>
> where the word test is in my content field.  When I get information back
> though I am getting the term vectors for all of the tokens in that field.
> How do I get back just the ones that match my search?
>
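One client-side workaround, since the tvrh handler returns vectors for every token in the field, is to filter the returned entries down to the query terms yourself -- a toy sketch with a dict standing in for the term-vector section of the response:

```python
# Stand-in for the term-vector data of one doc from a tvrh response:
# term -> term info (frequency, positions, offsets, ...).
term_vectors = {
    "this": {"tf": 1}, "is": {"tf": 1}, "a": {"tf": 2}, "test": {"tf": 3},
}

def matching_vectors(term_vectors, query_terms):
    """Keep only the term-vector entries for terms that occur in the query."""
    wanted = {t.lower() for t in query_terms}
    return {t: info for t, info in term_vectors.items() if t in wanted}

print(matching_vectors(term_vectors, ["test"]))  # {'test': {'tf': 3}}
```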


Re: Core Administration

2011-06-30 Thread zarni aung
Thank you very much Stefan.  This helps.

Zarni

On Thu, Jun 30, 2011 at 4:10 PM, Stefan Matheis <
matheis.ste...@googlemail.com> wrote:

> Zarni,
>
> Am 30.06.2011 20:32, schrieb zarni aung:
>
>  But I need to know if Solr already handles that case.  I wouldn't want to
>> have to write the tool if Solr already supports creating cores with new
>> configs on the fly.
>>
>
> there isn't. you have to create the directory structure & the related files
> yourself. solr (the AdminCoreHandler) only "activates" the core for
> usage.
>
> Few Weeks ago, there was a Question about modifying Configuration Files
> from the Browser:
> http://search.lucidimagination.com/search/document/ec79172e7613d1a/modifying_configuration_from_a_browser
>
> Regards
> Stefan
>


Re: Core Administration

2011-06-30 Thread Stefan Matheis

Zarni,

Am 30.06.2011 20:32, schrieb zarni aung:

But I need to know if Solr already handles that case.  I wouldn't want to
have to write the tool if Solr already supports creating cores with new
configs on the fly.


there isn't. you have to create the directory structure & the related 
files yourself. solr (the AdminCoreHandler) only "activates" the 
core for usage.


Few Weeks ago, there was a Question about modifying Configuration Files 
from the Browser: 
http://search.lucidimagination.com/search/document/ec79172e7613d1a/modifying_configuration_from_a_browser


Regards
Stefan


Re: Multicore clustering setup problem

2011-06-30 Thread Walter Closenfleight
Staszek,

That makes sense, but this has always been a multi-core setup, so the paths
have not changed, and the clustering component worked fine for core0. The
only thing new is I have fine tuned core1 (to begin implementing it).
Previously the solrconfig.xml file was very basic. I replaced it with
core0's solrconfig.xml and made very minor changes to it (unrelated to
clustering) - it's a nearly identical solrconfig.xml file so I'm surprised
it doesn't work for core1.

In other words, the paths here are the same for core0 and core1:
  
  
  
  
Again, I'm wondering whether, since both cores have the clustering
component, it should have a shared configuration in a different file used
by both cores(?). Perhaps the duplicate clusteringComponent configuration
for both cores is the problem?

Thanks for looking at this!

On Thu, Jun 30, 2011 at 1:29 PM, Stanislaw Osinski <
stanislaw.osin...@carrotsearch.com> wrote:

> It looks like the whole clustering component JAR is not in the classpath. I
> remember that I once dealt with a similar issue in Solr 1.4 and the cause
> was the relative path of the  tag being resolved against the core's
> instanceDir, which made the path incorrect when directly copying and
> pasting
> from the single core configuration. Try correcting the relative  paths
> or replacing them with absolute ones, it should solve the problem.
>
> Cheers,
>
> Staszek
>


Re: Core Administration

2011-06-30 Thread zarni aung
I have an idea.  I believe I can discover the Properties of an object (C#
reflection) and then code-gen a schema.xml file based on the field type and
other metadata of that type (possibly from the database).  After that, I
should be able to ftp the files over to the solr machine.  Then I can invoke
core admin to create the new index on the fly.  My original question would
be: is there a tool that already does what I'm describing?
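A toy version of the code-gen idea -- emitting schema.xml field declarations from a name-to-type mapping (the field attributes here are illustrative, not a complete schema):

```python
def gen_fields(field_types):
    """Generate <field/> declarations for a schema.xml from a dict of
    field name -> solr type (toy stand-in for reflection-based codegen)."""
    lines = []
    for name, ftype in sorted(field_types.items()):
        lines.append(
            '<field name="%s" type="%s" indexed="true" stored="true"/>'
            % (name, ftype)
        )
    return "\n".join(lines)

print(gen_fields({"id": "string", "price": "float", "title": "text"}))
```

In the real tool the dict would come from reflecting over the C# object's properties; the generated file would then be shipped to the Solr host before the CREATE core-admin call.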

Z

On Thu, Jun 30, 2011 at 2:32 PM, zarni aung  wrote:

> Hi,
>
> I am researching about core administration using Solr.  My requirement is
> to be able to provision/create/delete indexes dynamically.  I have tried it
> and it works.  Apparently core admin handler will create a new core by
> specifying the instance Directory (required), along with data directory, and
> so on.  The issue I'm having is that a separate app that lives on a
> different machine needs to create these new cores on demand along with
> creating new schema.xml and data directories.  The required instance
> directory, data directory and others need to be separate from each core.
>
> My first approach is to write a tool that would take additional params that
> can code gen the schema config files and so on based on different type of
> documents.  ie: Homes, People, etc...
>
> But I need to know if Solr already handles that case.  I wouldn't want to
> have to write the tool if Solr already supports creating cores with new
> configs on the fly.
>
> Thanks,
>
> Z
>


Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-30 Thread Shawn Heisey

On 6/29/2011 10:16 PM, Shawn Heisey wrote:
I was thinking perhaps I might actually decrease the termIndexInterval 
value below the default of 128.  I know from reading the Hathi Trust 
blog that memory usage for the tii file is much more than the size of 
the file would indicate, but if I increase it from 13MB to 26MB, it 
probably would still be OK.


Decreasing the termIndexInterval to 64 almost doubled the tii file size, 
as expected.  It made the filterCache warming much faster, but made the 
queryResultCache warming very very slow.  Regular queries also seem like 
they're slower.


I am trying again with 256.  I may go back to the default before I'm 
done.  I'm guessing that a lot of trial and error was put into choosing 
the default value.
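For anyone following along, a sketch of where this knob lives in solrconfig.xml (placement assumed from a stock 1.4/3.x config; the value is the one being tried above, and 128 is the default):

```xml
<indexDefaults>
  <!-- Default is 128. Smaller values grow the tii file (and its in-RAM
       footprint) but speed up term lookups, hence filterCache warming;
       larger values do the reverse. -->
  <termIndexInterval>256</termIndexInterval>
</indexDefaults>
```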


It's been fun having a newer index available on my backup servers.  I've 
been able to do a lot of trials, learned a lot of things that don't work 
and a few that do.  I might do some experiments with trunk once I've 
moved off 1.4.1.


Thanks,
Shawn



Problems with SolrCloud

2011-06-30 Thread Andrey Sapegin
Dear ladies and gentlemen.

Can I ask you to help me with SolrCloud?

1) I try to set up a SolrCloud on 2 computers with 3 Zookeepers, but it
fails :(

I need to set the Zookeeper port to 8001, so I change clientPort=8001 in
solr/zoo.cfg.

When I try the command from the example C, to run shard1, it works:
java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf
-DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900  -jar
start.jar

But if I change it to and try to run shard1:
java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf
-DzkRun -DzkHost=localhost:8001,localhost:8004 -jar start.jar

it fails with the following message:
SEVERE: java.lang.IllegalArgumentException: solr/zoo_data/myid file is
missing

2) to solve it I tried to set
*-Dsolr.solr.home=/data/a.sapegin/SolrCloud/shard1*
(without any slashes in the end)

But then I receive another exception:
"Caused by:
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException:
Error processing /data/a.sapegin/SolrCloud/shard1//zoo.cfg"

I think this "//" is a bug.
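The double slash looks like naive path concatenation of the solr home and the config file name; a tiny illustration (hypothetical -- just mirroring what the exception message suggests):

```python
import posixpath

solr_home = "/data/a.sapegin/SolrCloud/shard1"

# Naive concatenation with a stray separator reproduces the path from
# the exception text:
naive = solr_home + "/" + "/zoo.cfg"
print(naive)  # /data/a.sapegin/SolrCloud/shard1//zoo.cfg

# Joining and normalizing gives the intended path:
clean = posixpath.normpath(posixpath.join(solr_home, "zoo.cfg"))
print(clean)  # /data/a.sapegin/SolrCloud/shard1/zoo.cfg
```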


Could you please help?
Thank You in advance,
Kind Regards,

-- 

Andrey Sapegin,
Software Developer,

Unister GmbH
Dittrichring 18-20 | 04109 Leipzig

+49 (0)341 492885069,
+4915778339304,
andrey.sape...@unister-gmbh.de

www.unister.de



Solr Importing database field issues . how to I use postgres pgpool connection?

2011-06-30 Thread rsaravanakumar
I am using the postgres database and pgpool. The Postgres database port : 5432 is
working fine. But
the Pgpool port :  is not working.

MY importing xml file (*myproduct.xml*)
*Working *


*Not Working *


Is it a pgpool problem or a solr problem? Please can anyone let me know the
issue, and
how do I solve this pgpool problem?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Importing-database-field-issues-how-to-I-use-postgres-pgpool-connection-tp3126212p3126212.html
Sent from the Solr - User mailing list archive at Nabble.com.


Core Administration

2011-06-30 Thread zarni aung
Hi,

I am researching about core administration using Solr.  My requirement is to
be able to provision/create/delete indexes dynamically.  I have tried it and
it works.  Apparently core admin handler will create a new core by
specifying the instance Directory (required), along with data directory, and
so on.  The issue I'm having is that a separate app that lives on a
different machine needs to create these new cores on demand along with
creating new schema.xml and data directories.  The required instance
directory, data directory and others need to be separate from each core.

My first approach is to write a tool that would take additional params that
can code gen the schema config files and so on based on different type of
documents.  ie: Homes, People, etc...

But I need to know if Solr already handles that case.  I wouldn't want to
have to write the tool if Solr already supports creating cores with new
configs on the fly.

Thanks,

Z


Re: Wildcard search not working if full word is queried

2011-06-30 Thread François Schiettecatte
I would run that word through the analyzer, I suspect that the word 'teste' is 
being stemmed to 'test' in the index, at least that is the first place I would 
check.
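A quick illustration of that suspicion: if the analyzer stems "teste" to "test" at index time, the unanalyzed wildcard pattern teste* has nothing to match (toy stemmer and index below, not Lucene internals):

```python
import fnmatch

def stem(word):
    # Toy stand-in for the stemmer: strips a trailing 'e'.
    return word[:-1] if word.endswith("e") else word

# Index-time analysis stems terms before they are stored in the index.
indexed_terms = {stem(w) for w in ["teste", "testing"]}  # {'test', 'testing'}

# Wildcard queries are NOT analyzed, so the raw pattern is matched
# directly against the stemmed terms in the index.
def wildcard_hits(pattern):
    return sorted(t for t in indexed_terms if fnmatch.fnmatch(t, pattern))

print(wildcard_hits("test*"))   # ['test', 'testing'] -- matches stemmed terms
print(wildcard_hits("teste*"))  # [] -- no indexed term starts with 'teste'
```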

François

On Jun 30, 2011, at 2:21 PM, Celso Pinto wrote:

> Hi everyone,
> 
> I'm having some trouble figuring out why a query with an exact word
> followed by the * wildcard, eg. teste*, returns no results while a
> query for test* returns results that have the word "teste" in them.
> 
> I've created a couple of pasties:
> 
> Exact word with wildcard : http://pastebin.com/n9SMNsH0
> Similar word: http://pastebin.com/jQ56Ww6b
> 
> Parameters other than title, description and content have no effect
> other than filtering out unwanted results. In two of the four
> results, the title has the complete word "teste". On the other two,
> the word appears in the other fields.
> 
> Does anyone have any insights about what I'm doing wrong?
> 
> Thanks in advance.
> 
> Regards,
> Celso



Re: Multicore clustering setup problem

2011-06-30 Thread Stanislaw Osinski
It looks like the whole clustering component JAR is not in the classpath. I
remember that I once dealt with a similar issue in Solr 1.4 and the cause
was the relative path of the  tag being resolved against the core's
instanceDir, which made the path incorrect when directly copying and pasting
from the single core configuration. Try correcting the relative  paths
or replacing them with absolute ones, it should solve the problem.

Cheers,

Staszek


Wildcard search not working if full word is queried

2011-06-30 Thread Celso Pinto
Hi everyone,

I'm having some trouble figuring out why a query with an exact word
followed by the * wildcard, eg. teste*, returns no results while a
query for test* returns results that have the word "teste" in them.

I've created a couple of pasties:

Exact word with wildcard : http://pastebin.com/n9SMNsH0
Similar word: http://pastebin.com/jQ56Ww6b

Parameters other than title, description and content have no effect
other than filtering out unwanted results. In two of the four
results, the title has the complete word "teste". On the other two,
the word appears in the other fields.

Does anyone have any insights about what I'm doing wrong?

Thanks in advance.

Regards,
Celso


Re: Text field case sensitivity problem

2011-06-30 Thread Mike Sokolov

Yes, and this too: https://issues.apache.org/jira/browse/SOLR-219

On 06/30/2011 12:46 PM, Erik Hatcher wrote:

Jamie - there is a JIRA about this, at least 
one:

Erik

On Jun 15, 2011, at 10:12 , Jamie Johnson wrote:

   

So simply lower casing the works but can get complex.  The query that I'm
executing may have things like ranges which require some words to be upper
case (i.e. TO).  I think this would be much better solved on Solrs end, is
there a JIRA about this?

On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov  wrote:

 

opps, please s/Highlight/Wildcard/


On 06/14/2011 05:31 PM, Mike Sokolov wrote:

   

Wildcard queries aren't analyzed, I think?  I'm not completely sure what
the best workaround is here: perhaps simply lowercasing the query terms
yourself in the application.  Also - I hope someone more knowledgeable will
say that the new HighlightQuery in trunk doesn't have this restriction, but
I'm not sure about that.

-Mike

On 06/14/2011 05:13 PM, Jamie Johnson wrote:

 

Also of interest to me is this returns results
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine


On Tue, Jun 14, 2011 at 5:08 PM, Jamie Johnson
wrote:

I am using the following for my text field:
   























I have a field defined as


when I execute a go to the following url I get results
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris*
but if I do
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris*
I get nothing.  I thought the LowerCaseFilterFactory would have handled
lowercasing both the query and what is being indexed, am I missing
something?


 
   


Re: Strip Punctuation From Field

2011-06-30 Thread Tomás Fernández Löbbe
Not that I'm aware of. This is probably something you want to do at the
application layer. If you want to do it in Solr, a good place would be an
UpdateRequestProcessor, but I guess you'll have to implement your own.
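A minimal sketch of the application-layer approach: strip the characters before the document is ever sent to Solr, so the indexed tokens and the stored field value agree (plain Python, field name illustrative):

```python
import re

def clean_status(text):
    """Remove '#' markers before indexing, so the stored field value
    matches what gets tokenized and returned from queries."""
    return re.sub(r"#", "", text)

doc = {"status": clean_status("Hi, this is a #hashtag.")}
print(doc["status"])  # Hi, this is a hashtag.
```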

On Wed, Jun 29, 2011 at 4:12 PM, Curtis Wilde  wrote:

> From all I've read, using something like PatternReplaceFilterFactory allows
> you to replace / remove text in an index, but is there anything similar
> that
> allows manipulation of the text in the associated field? For example, if I
> pulled a status from Twitter like, "Hi, this is a #hashtag." I would like
> to
> remove the "#" from that string and use it for both the index, and also the
> field value that is returned from a query, i.e., "Hi, this is a hashtag".
>


Re: Returning total matched document count with SolrJ

2011-06-30 Thread Kissue Kissue
Thanks Michael. Quite helpful.

On Thu, Jun 30, 2011 at 4:06 PM, Michael Ryan  wrote:

> SolrDocumentList docs = queryResponse.getResults();
> long totalMatches = docs.getNumFound();
>
> -Michael
>


Re: Text field case sensitivity problem

2011-06-30 Thread Erik Hatcher
Jamie - there is a JIRA about this, at least one: 


Erik
 
On Jun 15, 2011, at 10:12 , Jamie Johnson wrote:

> So simply lower casing the works but can get complex.  The query that I'm
> executing may have things like ranges which require some words to be upper
> case (i.e. TO).  I think this would be much better solved on Solrs end, is
> there a JIRA about this?
> 
> On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov  wrote:
> 
>> opps, please s/Highlight/Wildcard/
>> 
>> 
>> On 06/14/2011 05:31 PM, Mike Sokolov wrote:
>> 
>>> Wildcard queries aren't analyzed, I think?  I'm not completely sure what
>>> the best workaround is here: perhaps simply lowercasing the query terms
>>> yourself in the application.  Also - I hope someone more knowledgeable will
>>> say that the new HighlightQuery in trunk doesn't have this restriction, but
>>> I'm not sure about that.
>>> 
>>> -Mike
>>> 
>>> On 06/14/2011 05:13 PM, Jamie Johnson wrote:
>>> 
 Also of interest to me is this returns results
 http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine
 
 
 On Tue, Jun 14, 2011 at 5:08 PM, Jamie Johnson
 wrote:
 
 I am using the following for my text field:
> 
>  positionIncrementGap="100" autoGeneratePhraseQueries="true">
> 
> 
> 
> 
> ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> 
>  protected="protwords.txt"/>
> 
> 
> 
> 
>  ignoreCase="true" expand="true"/>
> ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
>  generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> 
>  protected="protwords.txt"/>
> 
> 
> 
> 
> I have a field defined as
> 
> 
> when I execute a go to the following url I get results
> http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris*
> but if I do
> http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris*
> I get nothing.  I thought the LowerCaseFilterFactory would have handled
> lowercasing both the query and what is being indexed, am I missing
> something?
> 
> 



multiple webapps vs multi-core vs distributed

2011-06-30 Thread Tod
Currently I'm working with a group implementing Solr on an enterprise 
level.  Their initial toe dipping into Solr consists of running multiple 
(two) webapps on Tomcat using identical schemas.


Content is dispersed among a variety of repositories from CMS, DMS, WCMS 
to file systems and RDBMSs.  The expectation is that this implementation 
is going to get very popular very quick.  With that in mind there is 
also a very large, very diverse set of business groups spanning the 
entire organization all of which want to participate.


This participation is based mostly on marketing their wares, not making 
sure a unified enterprise taxonomy exists that can ultimately facilitate 
search relevancy at an enterprise level.  Therefore accomplishing a 
unified taxonomy most likely can't be completed within the time frame 
the customer wants to have the search up and running.


So its up to us to figure out how to satisfy the immediate needs of each 
individual business entity, without the benefit of a unified enterprise 
wide taxonomy, and with advance knowledge there is a likelihood that 
each unit's search index may be based on a different schema dependent on 
their individual business drivers.


At an enterprise level users should be able to search the entire set of 
individual indexes returning a merged result with a desire to provide a 
high level of relevancy to individual business groups along with the 
enterprise audience both internal and external.


From what I've been reading I think the current configuration may not 
stand up to the long term demand both from a usability and 
administrative standpoint, but I'm not completely sure.  That leaves 
multi-core and distributed search as possibilities.


I'm leaning towards multi-core.  Part of this decision is based on my 
perceived performance and administrative gains over the current 
configuration.  Distributed search is a possibility but in the short to 
medium term I don't see the number of indexed documents increasing to a 
size that would require it.  Plus I think the lack of a unified schema 
might throw a monkey wrench into the mix limiting the available solutions.


Does anyone have a similar experience that would be willing to share? 
Its early enough in the project life cycle that alternative ideas can be 
considered.  I'd be interested to hear other's opinions.



TIA - Tod


token exceeding provided text size error since Solr 3.2

2011-06-30 Thread getagrip

A bug was introduced between Solr 3.1 and 3.2.

With Solr 3.2 we are now getting the following error when querying 
several pdf and word documents:


SEVERE: org.apache.solr.common.SolrException: 
org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token 
17 exceeds length of provided text sized 168
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:474)
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:378)
at 
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at 
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)

at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)

at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: 
org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token 
17 exceeds length of provided text sized 168
at 
org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:467)

... 24 more




Re: MergerFacor effect on indexes

2011-06-30 Thread Tomás Fernández Löbbe
Hi Romi, after doing the changes, to see the impact you'll have to index some
documents, Solr won't change your index unless you add more documents and
commit them.
It looks like your maxMergeDocs parameter is too small, I would use a greater
value here.
You can see a good explanation of how the merge policy works in Solr here:

http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/

The default Merge policy has changed in 3_x and trunk, you can probably also
take a look at

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
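To make the mergeFactor behaviour concrete, a toy simulation of the (pre-Tiered) logarithmic merge policy: segments flush at maxBufferedDocs, and whenever mergeFactor same-sized segments accumulate they merge into one (simplified; the real policy has more subtleties, and maxMergeDocs caps merging):

```python
def simulate(num_docs, max_buffered_docs=1000, merge_factor=10):
    """Return segment sizes after indexing num_docs, logarithmic-merge style."""
    segments = []
    for _ in range(num_docs // max_buffered_docs):
        segments.append(max_buffered_docs)  # flush a new small segment
        # Merge whenever merge_factor segments of the same size pile up,
        # repeating in case one merge triggers the next level's merge.
        merged = True
        while merged:
            merged = False
            for size in set(segments):
                same = [s for s in segments if s == size]
                if len(same) >= merge_factor:
                    for s in same:
                        segments.remove(s)
                    segments.append(size * merge_factor)
                    merged = True
    return sorted(segments, reverse=True)

print(simulate(25000))  # two 10000-doc segments plus five 1000-doc segments
```

This also shows why a small index may not visibly change when mergeFactor changes: until enough segments of one size accumulate, no merge happens at all.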

Regards,

Tomás

On Thu, Jun 30, 2011 at 6:47 AM, Romi  wrote:

> my solrconfig.xml configuration is as :
> 
>   false
>32
>5
>10
>1
>false
>  
>
>
> and the index size is 12mb. but when i change my mergeFactor i am not finding
> any effect on my indexes, i.e. the no. of segments is exactly the same. i am
> not
> getting which configuration will affect the no. of segments, as i suppose it
> is mergeFactor. and my next problem is which configuration defines the
> number of docs per segment and what will be the size of this segment so
> that the next segment will be created
>
> please make me clear about these points
>
>
> -
> Thanks & Regards
> Romi
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/MergerFacor-effect-on-indexes-tp3125146p3125146.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Text field case sensitivity problem

2011-06-30 Thread Mike Sokolov
Yes, after posting that response, I read some more and came to the same 
conclusion... there seems to be some interest on the dev list in 
building a capability to specify an analysis chain for use with wildcard 
and related queries, but it doesn't exist now.
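In the meantime, a sketch of the application-side workaround discussed in this thread: lowercase the user's query terms before sending them, while leaving range/boolean operators (the complication Jamie raised) and field names intact. The helper below is hypothetical, not Solr API:

```python
# Lowercase query terms so wildcard/fuzzy terms match a lowercased index,
# but preserve Lucene operators, which must stay uppercase, and field
# names, which are case-sensitive.
OPERATORS = {"TO", "AND", "OR", "NOT"}

def lowercase_terms(query):
    out = []
    for token in query.split():
        if token in OPERATORS:
            out.append(token)
        elif ":" in token:
            field, _, value = token.partition(":")
            out.append(field + ":" + value.lower())
        else:
            out.append(token.lower())
    return " ".join(out)

print(lowercase_terms("Person_Name:Kris* AND date:[2011-01-01 TO 2011-06-30]"))
# Person_Name:kris* AND date:[2011-01-01 TO 2011-06-30]
```

This naive tokenizer ignores quoted phrases and escaped characters; a real implementation would need a proper query parse.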


-Mike

On 06/30/2011 10:34 AM, Jamie Johnson wrote:

I think my answer is here...

"On wildcard and fuzzy searches, no text analysis is performed on the
search word. "

taken from http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers


On Thu, Jun 30, 2011 at 10:23 AM, Jamie Johnson  wrote:
   

I'm not familiar with the CharFilters, I'll look into those now.

Is the solr.LowerCaseFilterFactory not handling wildcards the expected
result or is this a bug?

On Wed, Jun 15, 2011 at 4:34 PM, Mike Sokolov  wrote:
 

I wonder whether CharFilters are applied to wildcard terms?  I suspect they
might be.  If that's the case, you could use the MappingCharFilter to
perform lowercasing (and strip diacritics too if you want that)

-Mike

On 06/15/2011 10:12 AM, Jamie Johnson wrote:

So simply lower casing the terms works but can get complex.  The query that I'm
executing may have things like ranges which require some words to be upper
case (i.e. TO).  I think this would be much better solved on Solr's end; is
there a JIRA about this?

On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov  wrote:
   

opps, please s/Highlight/Wildcard/

On 06/14/2011 05:31 PM, Mike Sokolov wrote:
 

Wildcard queries aren't analyzed, I think?  I'm not completely sure what
the best workaround is here: perhaps simply lowercasing the query terms
yourself in the application.  Also - I hope someone more knowledgeable will
say that the new HighlightQuery in trunk doesn't have this restriction, but
I'm not sure about that.

-Mike

On 06/14/2011 05:13 PM, Jamie Johnson wrote:
   

Also of interest to me is this returns results
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine


On Tue, Jun 14, 2011 at 5:08 PM, Jamie Johnson
  wrote:

 

I am using the following for my text field:























I have a field defined as


when I go to the following URL I get results
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris*
but if I do
http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris*
I get nothing.  I thought the LowerCaseFilterFactory would have handled
lowercasing both the query and what is being indexed, am I missing
something?
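
The client-side workaround Mike suggests — lowercasing the query terms in the
application — can be sketched roughly as below. This is a hypothetical helper,
not Solr API: it naively splits on whitespace (so it would mishandle quoted
phrases), keeps query operators such as TO upper-case, and leaves field names
like Person_Name untouched.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class WildcardLowercaser {
    // Lucene/Solr query operators that must stay upper-case.
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("AND", "OR", "NOT", "TO"));

    /** Lower-case query terms, preserving operators and field names. */
    static String lowercaseTerms(String query) {
        StringBuilder out = new StringBuilder();
        for (String token : query.trim().split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            out.append(OPERATORS.contains(token) ? token : lowercaseTerm(token));
        }
        return out.toString();
    }

    // Lower-case only the value part of a "field:value" token.
    private static String lowercaseTerm(String token) {
        int colon = token.indexOf(':');
        return colon < 0
                ? token.toLowerCase(Locale.ROOT)
                : token.substring(0, colon + 1)
                  + token.substring(colon + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(lowercaseTerms("Person_Name:Kris* AND date:[A TO B]"));
        // -> Person_Name:kris* AND date:[a TO b]
    }
}
```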

   


   
 


Problems with SolrCloud

2011-06-30 Thread Andrey Sapegin
Dear ladies and gentlemen.

Can I ask you to help me with SolrCloud?

1) I try to set up SolrCloud on 2 computers with 3 ZooKeepers, but it
fails :(

I need to set the ZooKeeper port to 8001, so I change clientPort=8001 in
solr/zoo.cfg.

When I try the command from the example C, to run shard1, it works:
java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf
-DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900  -jar
start.jar

But if I change it as follows and try to run shard1:
java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf
-DzkRun -DzkHost=localhost:8001,localhost:8004 -jar start.jar

it fails with the following message:
SEVERE: java.lang.IllegalArgumentException: solr/zoo_data/myid file is
missing

2) to solve it I tried to set
*-Dsolr.solr.home=/data/a.sapegin/SolrCloud/shard1*
(without any slashes in the end)

But then I receive another exception:
"Caused by:
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException:
Error processing /data/a.sapegin/SolrCloud/shard1//zoo.cfg"

I think this "//" is a bug.


Could you please help?
Thank You in advance,
Kind Regards,

-- 

Andrey Sapegin,
Software Developer,

Unister GmbH
Dittrichring 18-20 | 04109 Leipzig

+49 (0)341 492885069,
+4915778339304,
andrey.sape...@unister-gmbh.de

www.unister.de



RE: Returning total matched document count with SolrJ

2011-06-30 Thread Michael Ryan
SolrDocumentList docs = queryResponse.getResults();
long totalMatches = docs.getNumFound();

-Michael


Returning total matched document count with SolrJ

2011-06-30 Thread Kissue Kissue
Hi,

I am using Solr 3.1 and using the SolrJ client. Does anyone know how i can
get the *TOTAL* number of matched documents returned with the QueryResponse?
I am interested in the total documents matched not just the result returned
with the limit applied. Any help will be appreciated.

Thanks.


Re: Text field case sensitivity problem

2011-06-30 Thread Jamie Johnson
I think my answer is here...

"On wildcard and fuzzy searches, no text analysis is performed on the
search word. "

taken from http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers


On Thu, Jun 30, 2011 at 10:23 AM, Jamie Johnson  wrote:
> I'm not familiar with the CharFilters, I'll look into those now.
>
> Is the solr.LowerCaseFilterFactory not handling wildcards the expected
> result or is this a bug?
>
> On Wed, Jun 15, 2011 at 4:34 PM, Mike Sokolov  wrote:
>> I wonder whether CharFilters are applied to wildcard terms?  I suspect they
>> might be.  If that's the case, you could use the MappingCharFilter to
>> perform lowercasing (and strip diacritics too if you want that)
>>
>> -Mike
>>
>> On 06/15/2011 10:12 AM, Jamie Johnson wrote:
>>
>> So simply lower casing the terms works but can get complex.  The query that I'm
>> executing may have things like ranges which require some words to be upper
>> case (i.e. TO).  I think this would be much better solved on Solr's end, is
>> there a JIRA about this?
>>
>> On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov  wrote:
>>>
>>> opps, please s/Highlight/Wildcard/
>>>
>>> On 06/14/2011 05:31 PM, Mike Sokolov wrote:

 Wildcard queries aren't analyzed, I think?  I'm not completely sure what
 the best workaround is here: perhaps simply lowercasing the query terms
 yourself in the application.  Also - I hope someone more knowledgeable will
 say that the new HighlightQuery in trunk doesn't have this restriction, but
 I'm not sure about that.

 -Mike

 On 06/14/2011 05:13 PM, Jamie Johnson wrote:
>
> Also of interest to me is this returns results
> http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine
>
>
> On Tue, Jun 14, 2011 at 5:08 PM, Jamie Johnson
>  wrote:
>
>> I am using the following for my text field:
>>
>> > positionIncrementGap="100" autoGeneratePhraseQueries="true">
>> 
>> 
>> 
>> 
>> >                 ignoreCase="true"
>>                 words="stopwords.txt"
>>                 enablePositionIncrements="true"
>>                 />
>> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>> 
>> > protected="protwords.txt"/>
>> 
>> 
>> 
>> 
>> > ignoreCase="true" expand="true"/>
>> >                 ignoreCase="true"
>>                 words="stopwords.txt"
>>                 enablePositionIncrements="true"
>>                 />
>> > generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>> 
>> > protected="protwords.txt"/>
>> 
>> 
>> 
>>
>> I have a field defined as
>> 
>>
>> when I execute a go to the following url I get results
>> http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris*
>> but if I do
>> http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris*
>> I get nothing.  I thought the LowerCaseFilterFactory would have handled
>> lowercasing both the query and what is being indexed, am I missing
>> something?
>>
>>
>>
>


Re: Text field case sensitivity problem

2011-06-30 Thread Jamie Johnson
I'm not familiar with the CharFilters, I'll look into those now.

Is solr.LowerCaseFilterFactory not handling wildcards the expected
behavior, or is this a bug?

On Wed, Jun 15, 2011 at 4:34 PM, Mike Sokolov  wrote:
> I wonder whether CharFilters are applied to wildcard terms?  I suspect they
> might be.  If that's the case, you could use the MappingCharFilter to
> perform lowercasing (and strip diacritics too if you want that)
>
> -Mike
>
> On 06/15/2011 10:12 AM, Jamie Johnson wrote:
>
> So simply lower casing the terms works but can get complex.  The query that I'm
> executing may have things like ranges which require some words to be upper
> case (i.e. TO).  I think this would be much better solved on Solr's end, is
> there a JIRA about this?
>
> On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov  wrote:
>>
>> opps, please s/Highlight/Wildcard/
>>
>> On 06/14/2011 05:31 PM, Mike Sokolov wrote:
>>>
>>> Wildcard queries aren't analyzed, I think?  I'm not completely sure what
>>> the best workaround is here: perhaps simply lowercasing the query terms
>>> yourself in the application.  Also - I hope someone more knowledgeable will
>>> say that the new HighlightQuery in trunk doesn't have this restriction, but
>>> I'm not sure about that.
>>>
>>> -Mike
>>>
>>> On 06/14/2011 05:13 PM, Jamie Johnson wrote:

 Also of interest to me is this returns results
 http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine


 On Tue, Jun 14, 2011 at 5:08 PM, Jamie Johnson
  wrote:

> I am using the following for my text field:
>
>  positionIncrementGap="100" autoGeneratePhraseQueries="true">
> 
> 
> 
> 
>                  ignoreCase="true"
>                 words="stopwords.txt"
>                 enablePositionIncrements="true"
>                 />
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> 
>  protected="protwords.txt"/>
> 
> 
> 
> 
>  ignoreCase="true" expand="true"/>
>                  ignoreCase="true"
>                 words="stopwords.txt"
>                 enablePositionIncrements="true"
>                 />
>  generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> 
>  protected="protwords.txt"/>
> 
> 
> 
>
> I have a field defined as
> 
>
> when I execute a go to the following url I get results
> http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris*
> but if I do
> http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris*
> I get nothing.  I thought the LowerCaseFilterFactory would have handled
> lowercasing both the query and what is being indexed, am I missing
> something?
>
>
>


Re: Taxonomy faceting

2011-06-30 Thread Toke Eskildsen
On Thu, 2011-06-30 at 11:38 +0200, Russell B wrote:
> a multivalued field labelled category which for each document defines
> where in the tree it should appear.  For example: doc1 has the
> category field set to "0/topics", "1/topics/computing",
> "2/topic/computing/systems".
> 
> I then facet on the 'category' field, filter the results with fq={!raw
> f=category}1/topics/computing to get everything below that point on the
> tree, and use f.category.facet.prefix to restrict the facet fields to the
> current level.

Lucid Imagination did a webcast on this, as far as I remember?

> Playing around with the results, it seems to work ok but despite reading
> lots about faceting I can't help feel there might be a better solution.

The '1/topics/computing'-solution works at a single level, so if you are
interested in a multi-level result like
- topic
 - computing
  - hardware
  - software
 - biology
  - plants
  - animals
you have to do more requests.

> Are there better ways to achieve this?

Taxonomy faceting is a bit of a mess right now, but it is also an area
where a lot is happening. For SOLR, there is

https://issues.apache.org/jira/browse/SOLR-64
(single path/document hierarchical faceting)

https://issues.apache.org/jira/browse/SOLR-792
(pivot faceting, now part of trunk AFAIR)

https://issues.apache.org/jira/browse/SOLR-2412
(multi path/document hierarchical faceting, very experimental)

Just yesterday, another multi path/document hierarchical faceting
solution was added to the Lucene 3.x branch and Lucene trunk. It has
been used by IBM for some time and appears to be mature and stable.
https://issues.apache.org/jira/browse/LUCENE-3079
However, this solution requires a sidecar index for the taxonomy and I
am a bit worried about how this fits into the Solr index workflow.



Re: Multicore clustering setup problem

2011-06-30 Thread Walter Closenfleight
Sure, thanks for having a look!

By the way, if I attempt to hit a solr URL, I get this error, followed by
the stacktrace. If I set abortOnConfigurationError to false (I've found you
must put the setting in both solr.xml and solrconfig.xml for both cores
otherwise you keep getting the error), then the main URL to solr (
http://localhost/solr) lists just the first core.

HTTP Status 500 - Severe errors in solr configuration. Check your log files
for more detailed information on what may be wrong. If you want solr to
continue after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml
-
org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.handler.clustering.ClusteringComponent' at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
at

*Tomcat Log:*

INFO: [core1] Added SolrEventListener:
org.apache.solr.core.QuerySenderListener{queries=[{q=solr
rocks,start=0,rows=10}, {q=static firstSearcher warming query from
solrconfig.xml}]}
Jun 30, 2011 8:51:23 AM org.apache.solr.request.XSLTResponseWriter init
INFO: xsltCacheLifetimeSeconds=5
Jun 30, 2011 8:51:23 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.handler.clustering.ClusteringComponent'
 at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
 at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
 at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833)
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:551)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
 at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
 at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
 at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
 at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
 at
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
 at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
 at
org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
 at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
 at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
 at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
 at
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:630)
 at
org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:556)
 at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:491)
 at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
 at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
 at
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
 at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
 at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
 at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
 at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
 at org.apache.catalina.core.StandardService.start(StandardService.java:516)
 at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
 at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
 at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)
Caused by: java.lang.ClassNotFoundException:
org.apache.solr.handler.clustering.ClusteringComponent
 at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:592)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
 at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:247)
 at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)
 ...

Re: How to optimize solr indexes

2011-06-30 Thread Ahmet Arslan


--- On Thu, 6/30/11, Romi  wrote:

> From: Romi 
> Subject: Re: How to optimize solr indexes
> To: solr-user@lucene.apache.org
> Date: Thursday, June 30, 2011, 3:01 PM
> and if i want to set it as
> optimize=false then what i need to do ??

When calling import, use dataimport?command=delta-import&optimize=false

See the other commands available, like clean, commit, entity, etc.:
http://wiki.apache.org/solr/DataImportHandler#Commands
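
For clients that assemble the request programmatically, the command plus its
boolean options combine as ordinary query-string parameters. A minimal sketch
(assuming the DataImportHandler is registered at /dataimport; values here are
already URL-safe, so no encoding step is shown):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DihUrl {
    /** Build a DataImportHandler request URL from a command and its options. */
    static String buildUrl(String base, String command, Map<String, String> opts) {
        StringBuilder url = new StringBuilder(base).append("?command=").append(command);
        for (Map.Entry<String, String> e : opts.entrySet()) {
            url.append('&').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> opts = new LinkedHashMap<>(); // preserve parameter order
        opts.put("optimize", "false");
        opts.put("commit", "true");
        System.out.println(buildUrl("http://localhost:8983/solr/dataimport",
                                    "delta-import", opts));
        // -> http://localhost:8983/solr/dataimport?command=delta-import&optimize=false&commit=true
    }
}
```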


Re: AW: Adding german phonetic to solr

2011-06-30 Thread Paul Libbrecht
Jürgen,

Clearly, Cologne phonetic is not yet supported; please read:

http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/analysis/PhoneticFilterFactory.java

one would need to add the line for Cologne phonetic and recompile.

It'd make sense to open a jira issue for this.

paul

Le 30 juin 2011 à 14:24, Jürgen Tiedemann a écrit :

> Hi Paul,
> 
> thanks for the quick reply. I replaced commons-codec-1.4.jar with 
> commons-codec-1.5.jar to get the ColognePhonetic. In schema.xml I added
> 
> <filter class="solr.PhoneticFilterFactory" encoder="ColognePhonetic" inject="true"/>
> 
> but then I get
> 
> org.apache.solr.common.SolrException: Unknown encoder: ColognePhonetic 
> [[CAVERPHONE, SOUNDEX, METAPHONE, DOUBLEMETAPHONE, REFINEDSOUNDEX]].
> 
> How do I get PhoneticFilterFactory to know ColognePhonetic? Or is my approach 
> completely wrong?
> 
> Jürgen
> 
> 
> 
> 
> 
> 
> 
> Von: Paul Libbrecht 
> An: solr-user@lucene.apache.org
> Gesendet: Donnerstag, den 30. Juni 2011, 12:09:18 Uhr
> Betreff: Re: Adding german phonetic to solr
> 
> Jürgen,
> 
> I haven't had the time to deploy it but i heard about "Kölner Phonetik" that 
> was 
> to be contributed as part of apache-commons-codec.
> It probably still is just a patch in a jira issue.
>https://issues.apache.org/jira/browse/CODEC-106
> The contribution was posted to commons-dev on september 15th 2010.
> 
> Bringing this reachable into Solr would be interesting but it's a bit of work.
> 
> We have used the Double-Metaphone indexer with Lucene with reasonable success 
> in 
> ActiveMath but it was not as fine as the Kölner analyzer and fine-graininess 
> is 
> really a desirable feature of a phonetic environment.
> You might want to also care for all the "proper nouns" around for which 
> tradition phonetics is doomed to fail if, at least, your texts are a bit with 
> international names!
> 
> paul
> 
> 
> Le 30 juin 2011 à 11:58, Jürgen Tiedemann a écrit :
> 
>> Hi all,
>> 
>> does solar support german phonetic? Searching for "how to add german 
>> phonetic 
>> to 
>> 
>> solr" on google does not deliver good results, just lots of JIRA stuff. I 
>> searched for "cologne phonetic" too. The wikis 
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28phonetic%29#solr.PhoneticFilterFactory
>> y
>> and http://wiki.apache.org/solr/LanguageAnalysis#German haven't also 
>> answered 
>> my question. Please, can someone tell me how to do it or where to look for 
>> appropriate information.
>> 
>> Nice regards
>> 
>> Jürgen



Re: Looking for Custom Highlighting guidance

2011-06-30 Thread Mike Sokolov
It's going to be a bit complicated, but I would start by looking at 
providing a facility for merging an array of FieldTermStacks. The 
constructor for FieldTermStack() takes a fieldName and builds up a list 
of TermInfos (terms with positions and offsets): I *think* that if you 
make two of these, merge them, and hand that to the FieldPhraseList 
constructor (this is done in the main FVH class), you should get what 
you want.  This is a bit speculative; I haven't tried it.


-Mike

On 06/30/2011 08:26 AM, Jamie Johnson wrote:

Thanks for the suggestion Mike, I will give that a shot.  Having no
familiarity with FastVectorHighlighter is there somewhere specific I
should be looking?

On Wed, Jun 29, 2011 at 3:20 PM, Mike Sokolov  wrote:
   

Does the phonetic analysis preserve the offsets of the original text field?

If so, you should probably be able to hack up FastVectorHighlighter to do what 
you want.

-Mike

On 06/29/2011 02:22 PM, Jamie Johnson wrote:
 

I have a schema with a text field and a text_phonetic field and would like
to perform highlighting on them in such a way that the tokens that match are
combined.  What would be a reasonable way to accomplish this?


   


Re: Looking for Custom Highlighting guidance

2011-06-30 Thread Jamie Johnson
Thanks for the suggestion Mike, I will give that a shot.  Having no
familiarity with FastVectorHighlighter is there somewhere specific I
should be looking?

On Wed, Jun 29, 2011 at 3:20 PM, Mike Sokolov  wrote:
>
> Does the phonetic analysis preserve the offsets of the original text field?
>
> If so, you should probably be able to hack up FastVectorHighlighter to do 
> what you want.
>
> -Mike
>
> On 06/29/2011 02:22 PM, Jamie Johnson wrote:
>>
>> I have a schema with a text field and a text_phonetic field and would like
>> to perform highlighting on them in such a way that the tokens that match are
>> combined.  What would be a reasonable way to accomplish this?
>>
>>


AW: Adding german phonetic to solr

2011-06-30 Thread Jürgen Tiedemann
Hi Paul,

thanks for the quick reply. I replaced commons-codec-1.4.jar with 
commons-codec-1.5.jar to get the ColognePhonetic. In schema.xml I added
<filter class="solr.PhoneticFilterFactory" encoder="ColognePhonetic" inject="true"/>

but then I get

org.apache.solr.common.SolrException: Unknown encoder: ColognePhonetic 
[[CAVERPHONE, SOUNDEX, METAPHONE, DOUBLEMETAPHONE, REFINEDSOUNDEX]].

How do I get PhoneticFilterFactory to know ColognePhonetic? Or is my approach 
completely wrong?

Jürgen







Von: Paul Libbrecht 
An: solr-user@lucene.apache.org
Gesendet: Donnerstag, den 30. Juni 2011, 12:09:18 Uhr
Betreff: Re: Adding german phonetic to solr

Jürgen,

I haven't had the time to deploy it but i heard about "Kölner Phonetik" that 
was 
to be contributed as part of apache-commons-codec.
It probably still is just a patch in a jira issue.
https://issues.apache.org/jira/browse/CODEC-106
The contribution was posted to commons-dev on september 15th 2010.

Bringing this reachable into Solr would be interesting but it's a bit of work.

We have used the Double-Metaphone indexer with Lucene with reasonable success 
in 
ActiveMath but it was not as fine as the Kölner analyzer and fine-graininess is 
really a desirable feature of a phonetic environment.
You might want to also care for all the "proper nouns" around for which 
tradition phonetics is doomed to fail if, at least, your texts are a bit with 
international names!

paul


Le 30 juin 2011 à 11:58, Jürgen Tiedemann a écrit :

> Hi all,
> 
> does solar support german phonetic? Searching for "how to add german phonetic 
>to 
>
> solr" on google does not deliver good results, just lots of JIRA stuff. I 
> searched for "cologne phonetic" too. The wikis 
>http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28phonetic%29#solr.PhoneticFilterFactory
>y
> and http://wiki.apache.org/solr/LanguageAnalysis#German haven't also answered 
> my question. Please, can someone tell me how to do it or where to look for 
> appropriate information.
> 
> Nice regards
> 
> Jürgen

Re: How to optimize solr indexes

2011-06-30 Thread Romi
And if I want to set it to optimize=false, what do I need to do?

-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-optimize-solr-indexes-tp3125293p3125474.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Fuzzy Query Param

2011-06-30 Thread Michael McCandless
Good question... I think in Lucene 4.0, the edit distance is (will be)
in Unicode code points, but in past releases, it's UTF16 code units.

Mike McCandless

http://blog.mikemccandless.com

2011/6/30 Floyd Wu :
> if this is edit distance implementation, what is the result apply to CJK
> query? For example, "您好"~3
>
> Floyd
>
>
> 2011/6/30 entdeveloper 
>
>> I'm using Solr trunk.
>>
>> If it's levenstein/edit distance, that's great, that's what I want. It just
>> didn't seem to be officially documented anywhere so I wanted to find out
>> for
>> sure. Thanks for confirming.
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3122418.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
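
The code point vs. UTF-16 code unit distinction only matters for characters
outside the Basic Multilingual Plane ("您好" is BMP, so both measures agree).
A self-contained sketch, not Lucene's implementation, showing where the two
diverge:

```java
public class CodePointDistance {
    /** Levenshtein distance over arbitrary int sequences. */
    static int levenshtein(int[] a, int[] b) {
        int[] prev = new int[b.length + 1];
        int[] cur = new int[b.length + 1];
        for (int j = 0; j <= b.length; j++) prev[j] = j;
        for (int i = 1; i <= a.length; i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length; j++) {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length];
    }

    /** Treat the string as a sequence of UTF-16 code units (chars). */
    static int[] utf16Units(String s) {
        int[] units = new int[s.length()];
        for (int i = 0; i < s.length(); i++) units[i] = s.charAt(i);
        return units;
    }

    public static void main(String[] args) {
        // BMP characters: code points and UTF-16 units agree.
        System.out.println(levenshtein("您好".codePoints().toArray(),
                                       "你好".codePoints().toArray())); // 1
        // U+1D11E (musical G clef) is a surrogate pair in UTF-16, so the
        // unit-based distance counts two edits where the code-point-based
        // distance counts one.
        System.out.println(levenshtein("𝄞a".codePoints().toArray(),
                                       "a".codePoints().toArray()));    // 1
        System.out.println(levenshtein(utf16Units("𝄞a"), utf16Units("a"))); // 2
    }
}
```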


Re: How to optimize solr indexes

2011-06-30 Thread Ahmet Arslan
> when i run solr/admin page i got this
> information, it shows optimize=true,
> but i have not set optimize=true in configuration file than
> how it is
> optimizing the indexes. and how can i set it to false then
> .
> 
> 
> /Schema Information
> 
>     Unique Key: UID_PK
> 
>     Default Search Field: text
> 
>     numDocs: 2881
> 
>     maxDoc: 2881
> 
>     numTerms: 41960
> 
>     version: 1309429290159
> 
>     optimized: true
> 
>     current: true
> 
>     hasDeletions: false
> 
>     directory:
> org.apache.lucene.store.SimpleFSDirectory:org.apache.lucene.store.SimpleFSDirectory@
> C:\apache-solr-1.4.0\example\example-DIH\solr\db\data\index
> 
>     lastModified: 2011-06-30T10:25:04.89Z/
> 

It seems that you are using DIH. By default, both delta and full imports issue
an optimize at the end.


Re: Taxonomy faceting

2011-06-30 Thread darren
That's a good way. How does it perform?

Another way would be to store the "parent" topics in a field.
Whenever a parent node is drilled-into, simply search for all documents
with that parent. Perhaps not as elegant as your approach though.

I'd be interested in the performance comparison between the two approaches.

> I have a hierarchical taxonomy of documents that I would like users to be
> able to search either through search or "drill-down" faceting.  The
> documents may appear at multiple points in the hierarchy.  I've got a
> solution working as follows: a multivalued field labelled category which
> for
> each document defines where in the tree it should appear.  For example:
> doc1
> has the category field set to "0/topics", "1/topics/computing",
> "2/topic/computing/systems".
>
> I then facet on the 'category' field, filter the results with fq={!raw
> f=category}1/topics/computing to get everything below that point on the
> tree, and use f.category.facet.prefix to restrict the facet fields to the
> current level.
>
> Full query something like:
>
> http://localhost:8080/solr/select/?q=something&facet=true&facet.field=category&fq={!rawf=category}1/topics/computing&f.category.facet.prefix=2/topic/computing
>
>
> Playing around with the results, it seems to work ok but despite reading
> lots about faceting I can't help feel there might be a better solution.
> Are
> there better ways to achieve this?  Any comments/suggestions are welcome.
>
> (Any suggestions as to what interface I can put on top of this are also
> gratefully received!).
>
>
> Thanks,
>
> Russell
>



How to optimize solr indexes

2011-06-30 Thread Romi
When I open the solr/admin page I get the information below; it shows
optimized=true, but I have not set optimize=true in the configuration file.
How are the indexes being optimized, and how can I set it to false?


/Schema Information

Unique Key: UID_PK

Default Search Field: text

numDocs: 2881

maxDoc: 2881

numTerms: 41960

version: 1309429290159

optimized: true

current: true

hasDeletions: false

directory:
org.apache.lucene.store.SimpleFSDirectory:org.apache.lucene.store.SimpleFSDirectory@
C:\apache-solr-1.4.0\example\example-DIH\solr\db\data\index

lastModified: 2011-06-30T10:25:04.89Z/


-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-optimize-solr-indexes-tp3125293p3125293.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Adding german phonetic to solr

2011-06-30 Thread Paul Libbrecht
Jürgen,

I haven't had the time to deploy it but i heard about "Kölner Phonetik" that 
was to be contributed as part of apache-commons-codec.
It probably still is just a patch in a jira issue.
https://issues.apache.org/jira/browse/CODEC-106
The contribution was posted to commons-dev on september 15th 2010.

Making this reachable from Solr would be interesting, but it's a bit of work.

We have used the Double-Metaphone indexer with Lucene with reasonable success 
in ActiveMath, but it was not as fine-grained as the Kölner analyzer, and 
fine-graininess is really a desirable feature of a phonetic environment.
You might also want to take care of all the "proper nouns" around, for which 
traditional phonetics is doomed to fail, at least if your texts contain 
international names!

paul


Le 30 juin 2011 à 11:58, Jürgen Tiedemann a écrit :

> Hi all,
> 
> does solar support german phonetic? Searching for "how to add german phonetic 
> to 
> solr" on google does not deliver good results, just lots of JIRA stuff. I 
> searched for "cologne phonetic" too. The wikis 
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28phonetic%29#solr.PhoneticFilterFactory
> and http://wiki.apache.org/solr/LanguageAnalysis#German haven't also answered 
> my question. Please, can someone tell me how to do it or where to look for 
> appropriate information.
> 
> Nice regards
> 
> Jürgen



Adding german phonetic to solr

2011-06-30 Thread Jürgen Tiedemann
Hi all,

Does Solr support German phonetics? Searching for "how to add german phonetic to 
solr" on Google does not deliver good results, just lots of JIRA stuff. I 
searched for "cologne phonetic" too. The wikis 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28phonetic%29#solr.PhoneticFilterFactory
 and http://wiki.apache.org/solr/LanguageAnalysis#German haven't answered 
my question either. Please, can someone tell me how to do it, or where to look for 
appropriate information?

Nice regards

Jürgen


How to use solr clustering to show in search results

2011-06-30 Thread Romi
I wanted to use clustering in my search results. I configured Solr for
clustering and got the following JSON for the clusters, but I am not sure how
to use it in the search results. For each doc I have a number of fields, and up
till now I have been showing name, description and id; in the clusters I only
have labels and doc ids. How do I use my docs with the clusters? I am really
confused about what to do, please reply.

"clusters": [
  {
    "labels": ["Complement any Business Casual or Semi-formal Attire"],
    "docs": ["7799", "7801"]
  },
  {
    "labels": ["Design"],
    "docs": ["8252", "7885"]
  },
  {
    "labels": ["Elegant Ring has an Akoya Cultured Pearl"],
    "docs": ["8142", "8139"]
  },
  {
    "labels": ["Feel Amazing in these Scintillating Earrings Perfect"],
    "docs": ["12250", "12254"]
  },
  {
    "labels": ["Formal Evening Attire"],
    "docs": ["8151", "8004"]
  },
  {
    "labels": ["Pave Set"],
    "docs": ["7788", "8169"]
  },
  {
    "labels": ["Subtle Look or Layer it or Attach"],
    "docs": ["8014", "8012"]
  },
  {
    "labels": ["Three-stone Setting is Elegant and Fun"],
    "docs": ["8335", "8337"]
  },
  {
    "labels": ["Other Topics"],
    "docs": ["8038", "7850", "7795", "7989", "7797"]
  }
]


-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-use-solr-clustering-to-show-in-search-results-tp3125149p3125149.html
Sent from the Solr - User mailing list archive at Nabble.com.
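
One way to use the clusters block: the "docs" arrays contain only the ids of
documents already present in the main response, so join them back to the
fetched documents by id and render one group per label. A hedged sketch with
plain maps standing in for the parsed JSON (the JSON parsing itself is assumed
to happen upstream; field names here are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClusterJoin {
    /** Group full documents under cluster labels by joining on doc id. */
    static Map<String, List<Map<String, String>>> join(
            List<Map<String, String>> docs,       // the main response docs
            Map<String, List<String>> clusters) { // cluster label -> doc ids
        Map<String, Map<String, String>> byId = new HashMap<>();
        for (Map<String, String> d : docs) byId.put(d.get("id"), d);
        Map<String, List<Map<String, String>>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> c : clusters.entrySet()) {
            List<Map<String, String>> members = new ArrayList<>();
            for (String id : c.getValue()) {
                Map<String, String> d = byId.get(id);
                if (d != null) members.add(d); // id may fall outside the current page
            }
            out.put(c.getKey(), members);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("id", "7799", "name", "Pearl Ring"),
                Map.of("id", "7801", "name", "Casual Pendant"));
        Map<String, List<String>> clusters =
                Map.of("Pave Set", List.of("7799", "7801"));
        System.out.println(join(docs, clusters));
    }
}
```

The UI can then show each label as a group heading with the usual name,
description and id fields underneath.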


MergerFacor effect on indexes

2011-06-30 Thread Romi
My solrconfig.xml configuration is as follows:

   false
32
5
10
1
false
  




and the index size is 12 MB. But when I change my mergeFactor I see no effect
on my indexes, i.e. the number of segments stays exactly the same. I am not
sure which configuration setting affects the number of segments; I assumed it
was mergeFactor. My next question is: which setting defines the number of docs
per segment, and how large does a segment grow before the next segment is
created?

Please clarify these points for me.
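For reference, these settings live in the indexDefaults (and mainIndex)
section of solrconfig.xml. A hedged sketch of what such a block typically
looks like — tag names are the standard Solr ones, but the values here are
illustrative, not a copy of the config above:

```xml
<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
  <maxFieldLength>10000</maxFieldLength>
</indexDefaults>
```

mergeFactor controls how many roughly equal-sized segments accumulate before
they are merged into a larger one, which is one likely reason a change shows
no immediate effect: it only influences segments written after the change, so
an existing index keeps its segment count until new documents are indexed and
merges actually run.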


-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/MergerFacor-effect-on-indexes-tp3125146p3125146.html
Sent from the Solr - User mailing list archive at Nabble.com.


Taxonomy faceting

2011-06-30 Thread Russell B
I have a hierarchical taxonomy of documents that I would like users to be
able to explore either through search or through "drill-down" faceting.  The
documents may appear at multiple points in the hierarchy.  I've got a
solution working as follows: a multivalued field labelled "category" which,
for each document, defines where in the tree it should appear.  For example:
doc1 has the category field set to "0/topics", "1/topics/computing",
"2/topic/computing/systems".

I then facet on the 'category' field, filter the results with fq={!raw
f=category}1/topics/computing to get everything below that point on the
tree, and use f.category.facet.prefix to restrict the facet fields to the
current level.

Full query something like:

http://localhost:8080/solr/select/?q=something&facet=true&facet.field=category&fq={!raw f=category}1/topics/computing&f.category.facet.prefix=2/topic/computing


Playing around with the results, it seems to work ok but despite reading
lots about faceting I can't help feel there might be a better solution.  Are
there better ways to achieve this?  Any comments/suggestions are welcome.

(Any suggestions as to what interface I can put on top of this are also
gratefully received!).
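For what it's worth, the depth-prefixed encoding described above can be
generated mechanically at index time. A small sketch (pure illustration, not
part of any Solr API) that turns a category path into the multivalued tokens:

```python
def category_tokens(path):
    """Turn a path like ["topics", "computing", "systems"] into
    depth-prefixed facet tokens: "0/topics", "1/topics/computing", ..."""
    tokens = []
    for depth in range(len(path)):
        tokens.append("%d/%s" % (depth, "/".join(path[:depth + 1])))
    return tokens

# Each document gets one token per ancestor, so fq={!raw f=category}1/topics/computing
# matches every doc at or below that node, while facet.prefix restricts the
# facet counts to the next level down.
print(category_tokens(["topics", "computing", "systems"]))
```

This is essentially the same scheme as the one in the question; keeping the
token generation in one helper makes it easy to stay consistent between the
fq value and the facet.prefix value.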


Thanks,

Russell


Re: conditionally update document on unique id

2011-06-30 Thread Shalin Shekhar Mangar
On Thu, Jun 30, 2011 at 2:06 AM, Yonik Seeley wrote:

> On Wed, Jun 29, 2011 at 4:32 PM, eks dev  wrote:
> > req.getSearcher().getFirstMatch(t) != -1;
>
> Yep, this is currently the fastest option we have.
>
>
Just for my understanding, this method won't use any caches but still may be
faster across repeated runs for the same token? I'm asking because Eks said
that they have 50%-55% duplicate documents.
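The use case under discussion — skipping documents whose unique key is already
indexed — can be sketched on the client side as well. A hypothetical
illustration only; the getFirstMatch() call quoted above is a server-side
Lucene term lookup, not this:

```python
def dedup_batch(incoming_docs, existing_ids):
    """Given incoming docs and the set of unique keys already indexed,
    return only the docs that still need to be added."""
    fresh = []
    for doc in incoming_docs:
        if doc["id"] not in existing_ids:
            fresh.append(doc)
            existing_ids.add(doc["id"])  # also dedup within the batch itself
    return fresh

batch = [{"id": "a"}, {"id": "b"}, {"id": "a"}]
print(dedup_batch(batch, {"b"}))  # → [{'id': 'a'}]
```

With a duplicate rate of 50%-55%, filtering before sending saves roughly half
the indexing work regardless of how fast the server-side existence check is.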

-- 
Regards,
Shalin Shekhar Mangar.


Re: Fuzzy Query Param

2011-06-30 Thread Floyd Wu
If this is an edit-distance implementation, what is the result when applied to
a CJK query? For example: "您好"~3

Floyd
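Since edit distance is computed over Unicode characters, it applies to CJK
terms character-by-character, the same as for Latin text. A minimal
Levenshtein sketch to illustrate what a distance of 1 means for "您好" (note
that the ~N syntax and the maximum supported distance depend on the
Solr/Lucene version in use):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance over characters."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# "您好" vs "你好" is one substitution, so a fuzzy query allowing
# one edit would match it.
print(levenshtein("您好", "你好"))  # → 1
```

So for a two-character CJK term, even a distance of 2 already admits any term
sharing no characters at all, which is why large fuzzy distances tend to be
much noisier for CJK than for longer alphabetic words.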


2011/6/30 entdeveloper 

> I'm using Solr trunk.
>
> If it's levenstein/edit distance, that's great, that's what I want. It just
> didn't seem to be officially documented anywhere so I wanted to find out
> for
> sure. Thanks for confirming.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3122418.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>