Re: Multicore clustering setup problem

2011-06-29 Thread Stanislaw Osinski
Hi,

Can you post the full stack trace? I'd need to know if it's
really org.apache.solr.handler.clustering.ClusteringComponent that's missing
or some other class ClusteringComponent depends on.

Cheers,

Staszek

On Thu, Jun 30, 2011 at 04:19, Walter Closenfleight <
walter.p.closenflei...@gmail.com> wrote:

> I had set up the clusteringComponent in solrconfig.xml for my first core.
> It
> has been working fine and now I want to get my next core working. I set up
> the second core with the clustering component so that I could use it, use
> solritas properly, etc. but Solr did not like the solrconfig.xml changes
> for
> the second core. I'm getting this error when Solr is started or when I hit
> a
> Solr related URL:
>
> SEVERE: org.apache.solr.common.SolrException: Error loading class
> 'org.apache.solr.handler.clustering.ClusteringComponent'
>
> Should the clusteringComponent be set up in a shared configuration file
> somehow or is there something else I am doing wrong?
>
> Thanks in advance!
>


Re: what is solr clustering component

2011-06-29 Thread Stanislaw Osinski
>
> and my second question is: does clustering affect indexes?
>

No, it doesn't. Clustering is performed only on the search results produced
by Solr; it doesn't change anything in the index.

Cheers,

Staszek


Re: what is solr clustering component

2011-06-29 Thread Romi
Thanks iorixxx, I changed my configuration to include clustering in search
results. In my XML-format search results I got a tag; to show
these clusters in the search results, do I need to parse this XML?
And my second question is: does clustering affect indexes?

-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-is-solr-clustering-component-tp3121484p3124627.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Shawn Heisey

On 6/29/2011 7:50 PM, Yonik Seeley wrote:

OK, your filter queries have hundreds of terms in them (and that means
hundreds of term lookups, which use the term index).
Thus, your termIndexInterval change is the leading suspect for the
slowdown.  A termIndexInterval of 1024 means that a term lookup will
seek to the closest 1024th term and then call next() until the desired
term is found.  Hence, instead of calling next() an average of 64 times
internally, it's now 512 times.

Of course there is still a mystery about why your tii (which is the
term index) would be so much bigger instead of smaller...


It turns out I got the two indexes backwards, the smaller one was the 
new index.  I may have mixed up the indexes on some of the other files 
too, but they weren't much different, so I'm not going to try and figure 
out where any mistakes might be.


Earlier in the afternoon I figured this out, removed termIndexInterval 
from my config, and rebuilt the index.  I had originally put this in to 
speed up indexing.  The evidence I had available at the time told me 
that this goal was accomplished, but the rebuild actually went faster 
without the statement.  Warming times are now averaging under 10 seconds 
even with the warmup count back up to 8.  This is still slower than I 
would like, but it is a major improvement.  Even more important, I 
understand what happened.


I was thinking perhaps I might actually decrease the termIndexInterval 
value below the default of 128.  I know from reading the Hathi Trust 
blog that memory usage for the tii file is much more than the size of 
the file would indicate, but if the tii grows from 13MB to 26MB, it 
probably would still be OK.


Are any index intervals for the other Lucene files configurable in a 
similar manner?  I know that screwing too much with the defaults can 
make things much worse, so I would be very careful with any adjustments, 
and try to fully understand why any performance gain or loss occurred.


Thanks,
Shawn



Multicore clustering setup problem

2011-06-29 Thread Walter Closenfleight
I had set up the clusteringComponent in solrconfig.xml for my first core. It
has been working fine and now I want to get my next core working. I set up
the second core with the clustering component so that I could use it, use
solritas properly, etc. but Solr did not like the solrconfig.xml changes for
the second core. I'm getting this error when Solr is started or when I hit a
Solr related URL:

SEVERE: org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.handler.clustering.ClusteringComponent'

Should the clusteringComponent be set up in a shared configuration file
somehow or is there something else I am doing wrong?

Thanks in advance!


Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Yonik Seeley
On Wed, Jun 29, 2011 at 3:28 PM, Yonik Seeley
 wrote:
>
> On Wed, Jun 29, 2011 at 1:43 PM, Shawn Heisey  wrote:
> > Just now, three of the six shards had documents deleted, and they took
> > 29.07, 27.57, and 28.66 seconds to warm.  The 1.4.1 counterpart to the 29.07
> > second one only took 4.78 seconds, and it did twice as many autowarm
> > queries.
>
> Can you post the logs at the INFO level that covers the warming period?

OK, your filter queries have hundreds of terms in them (and that means
hundreds of term lookups, which use the term index).
Thus, your termIndexInterval change is the leading suspect for the
slowdown.  A termIndexInterval of 1024 means that a term lookup will
seek to the closest 1024th term and then call next() until the desired
term is found.  Hence, instead of calling next() an average of 64 times
internally, it's now 512 times.

Of course there is still a mystery about why your tii (which is the
term index) would be so much bigger instead of smaller...
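The cost model described here (seek to the nearest indexed term, then a linear next() scan) can be sketched with a small stand-alone simulation; the ~interval/2 average reproduces the 64-vs-512 figures above:

```java
public class TermIndexSim {
    // The .tii index stores every 'interval'-th term. A lookup seeks to the
    // nearest indexed term at or before the target, then calls next() until
    // the target term is reached.
    static int nextCalls(int targetTerm, int interval) {
        int indexedPos = (targetTerm / interval) * interval;
        return targetTerm - indexedPos; // linear scan distance
    }

    // Average next() calls over all terms: roughly interval / 2.
    static double avgNextCalls(int numTerms, int interval) {
        long total = 0;
        for (int t = 0; t < numTerms; t++) total += nextCalls(t, interval);
        return (double) total / numTerms;
    }

    public static void main(String[] args) {
        System.out.println(avgNextCalls(1_000_000, 128));  // ~63.5
        System.out.println(avgNextCalls(1_000_000, 1024)); // ~511.4
    }
}
```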

-Yonik
http://www.lucidimagination.com


Re: Building a facet search filter frontend in XSLT

2011-06-29 Thread lee carroll
Hi Filype,

in the response you should have a list of fq arguments, something like:

field:facetValue
field:FacetValue

Use these to set your inputs to be selected / checked.

On 29 June 2011 23:54, Filype Pereira  wrote:
> Hi all,
> I am looking for some help in building a front end facet filter using XSLT.
> The code I use is: http://pastebin.com/xVv9La9j
> On the image attached, the checkbox should be selected. (You clicked and
> submitted the facet form. The URL changed.)
> I can use xsl:if, but there's nothing that I can use on the XML that will
> let me test before outputting the input checkbox.
> Has anyone done any similar thing?
> I haven't seen any examples building a facet search filter frontend in XSLT,
> the example.xsl that comes with solr is pretty basic, are there any other
> examples in XSLT implementing the facet filters around?
> Thanks,
> Filype
>


Re: How to Create a weighted function (dismax or otherwise)

2011-06-29 Thread Ahmet Arslan
> Are there any best practices or
> preferred ways to accomplish what I am
> trying? 

People usually prefer multiplicative boosting. But in your case you want 
additive boosting. Dismax's bf is additive. 

There is also _val_ hook. http://wiki.apache.org/solr/SolrQuerySyntax

> Do the params for defType, qf and bf belong in a solr
> request handler? 

They can be defined in defaults section in request handler as well as via query 
parameters. q=test&pf=...&qf=...

> Is it possible to have the weights as variables so they can
> be tweaked till
> we find the optimum balance in showing our results?

Yes you can try different settings on the fly using query parameters.




Building a facet search filter frontend in XSLT

2011-06-29 Thread Filype Pereira
Hi all,

I am looking for some help in building a front end facet filter using XSLT.

The code I use is: http://pastebin.com/xVv9La9j

On the image attached, the checkbox should be selected. (You clicked and
submitted the facet form. The URL changed.)

I can use xsl:if, but there's nothing that I can use on the XML that will
let me test before outputting the input checkbox.

Has anyone done any similar thing?

I haven't seen any examples building a facet search filter frontend in XSLT,
the example.xsl that comes with solr is pretty basic, are there any other
examples in XSLT implementing the facet filters around?

Thanks,

Filype


Re: After the query component has the results, can I do more filtering on them?

2011-06-29 Thread Ahmet Arslan
> So I made a custom search component which runs right after the query
> component, and this custom component will update the score of each based
> on some things (and no, I definitely can't use existing components).  I
> didn't see any easy way to just update the score, so what I currently do
> is something like this:
>
>     DocList docList = rb.getResults().docList;
>     float[] scores = new float[docList.size()];
>     int[] docs = new int[docList.size()];
>     int docCounter = 0;
>     int maxScore = 0;
>
>     while (docList.iterator().hasNext()) {
>         int userId = docList.iterator().nextDoc();
>         int score = userIdsToScore.get(userId);
>
>         scores[docCounter] = score;
>         docs[docCounter] = userId;
>         docCounter++;
>
>         if (maxScore < score) {
>             maxScore = score;
>         }
>     }
>     docList = new DocSlice(0, docCounter, docs, scores, 0, maxScore);
>
> my userIdsToScore hashtable is how I'm determining the new score.  There
> are a few other things I'm doing but this is the gist.  I'm also not sure
> how to go about sorting this... but basically my question is: is this how
> I should be updating the score of the documents?

This way you are just updating the scores cosmetically, i.e. they are no 
longer sorted by score. Plus, with this approach you can process at most 
start + rows documents. Obtaining the whole result set is not an option.

If you have some mapping like userIdsToScore, maybe you can use 
ExternalFileField combined with FunctionQueries to influence the score.


Re: Sorting by value of field

2011-06-29 Thread Judioo
Thanks,
Yes, this is the workaround I am currently doing.
Still wondering if the sort method can be used alone.




On 29 June 2011 18:34, Michael Ryan  wrote:

> You could try adding a new int field (like "typeSort") that has the desired
> sort values. So when adding a document with type:car, also add typeSort:1;
> when adding type:van, also add typeSort:2; etc. Then you could do
> "sort=typeSort asc" to get them in your desired order.
>
> I think this is also possible with custom function queries, but I've never
> done that.
>
> -Michael
>


Re: After the query component has the results, can I do more filtering on them?

2011-06-29 Thread arian487
bump

--
View this message in context: 
http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3123502.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: conditionally update document on unique id

2011-06-29 Thread eks dev
Hi Yonik,
as this recommendation comes from you, I am not going to test it, you
are well known as a speed junkie ;)

When we are there (in SignatureUpdateProcessor), why is this code not
moved to the constructor, but remains in processAdd

...
Signature sig = (Signature)
req.getCore().getResourceLoader().newInstance(signatureClass);
sig.init(params);
...
Should we be expecting on-the-fly signatureClass / params changes? I
am still not all that familiar with Solr life cycles... might be a
stupid question.

Thanks,
eks


On Wed, Jun 29, 2011 at 10:36 PM, Yonik Seeley
 wrote:
> On Wed, Jun 29, 2011 at 4:32 PM, eks dev  wrote:
>> req.getSearcher().getFirstMatch(t) != -1;
>
> Yep, this is currently the fastest option we have.
>
> -Yonik
> http://www.lucidimagination.com
>


Re: conditionally update document on unique id

2011-06-29 Thread Yonik Seeley
On Wed, Jun 29, 2011 at 4:32 PM, eks dev  wrote:
> req.getSearcher().getFirstMatch(t) != -1;

Yep, this is currently the fastest option we have.

-Yonik
http://www.lucidimagination.com


Re: conditionally update document on unique id

2011-06-29 Thread eks dev
Thanks Shalin!

would you not expect

req.getSearcher().docFreq(t);

to be slightly faster? Or maybe even

req.getSearcher().getFirstMatch(t) != -1;

which one should be faster, any known side effects?




On Wed, Jun 29, 2011 at 1:45 PM, Shalin Shekhar Mangar
 wrote:
> On Wed, Jun 29, 2011 at 2:01 AM, eks dev  wrote:
>
>> Quick question,
>> Is there a way with solr to conditionally update document on unique
>> id? Meaning, default, add behavior if id is not already in index and
>> *not to touch index" if already there.
>>
>> Deletes are not important (no sync issues).
>>
>> I am asking because I noticed with deduplication turned on,
>> index-files get modified even if I update the same documents again
>> (same signatures).
>> I am facing a very high dupe rate (40-50%), and the setup is going to be
>> master-slave with a high commit rate (the requirement is to reduce
>> propagation latency for updates). Unnecessary index modifications are
>> going to waste "effort" shipping the same information again and again.
>>
>> if there is no standard way, what would be the fastest way to check if
>> Term exists in index from UpdateRequestProcessor?
>>
>>
> I'd suggest that you use the searcher's getDocSet with a TermQuery.
>
> Use the SolrQueryRequest#getSearcher so you don't need to worry about ref
> counting.
>
> e.g. req.getSearcher().getDocSet(new TermQuery(new Term(signatureField,
> sigString))).size();
>
>
>
>> I intend to extend SignatureUpdateProcessor to prevent a document from
>> propagating down the chain if this happens?
>> Would that be a way to deal with it? I repeat, there are no deletes to
>> make headaches with synchronization
>>
>>
> Yes, that should be fine.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Writing SolrPlugin example

2011-06-29 Thread Ravi Prakash
Is there a Solr plugin example similar to Nutch's (
http://wiki.apache.org/nutch/WritingPluginExample)? I found the
SolrPlugins wiki page (http://wiki.apache.org/solr/SolrPlugins), but it didn't
have any example code. It would be helpful if there were a concrete example
explaining how to write, compile and build a custom plugin.

Thanks,
Ravi


Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Yonik Seeley
On Wed, Jun 29, 2011 at 1:43 PM, Shawn Heisey  wrote:
> Just now, three of the six shards had documents deleted, and they took
> 29.07, 27.57, and 28.66 seconds to warm.  The 1.4.1 counterpart to the 29.07
> second one only took 4.78 seconds, and it did twice as many autowarm
> queries.

Can you post the logs at the INFO level that covers the warming period?

-Yonik
http://www.lucidimagination.com


Re: Looking for Custom Highlighting guidance

2011-06-29 Thread Mike Sokolov

Does the phonetic analysis preserve the offsets of the original text field?

If so, you should probably be able to hack up FastVectorHighlighter to 
do what you want.


-Mike

On 06/29/2011 02:22 PM, Jamie Johnson wrote:

I have a schema with a text field and a text_phonetic field and would like
to perform highlighting on them in such a way that the tokens that match are
combined.  What would be a reasonable way to accomplish this?

   


Strip Punctuation From Field

2011-06-29 Thread Curtis Wilde
From all I've read, using something like PatternReplaceFilterFactory allows
you to replace / remove text in an index, but is there anything similar that
allows manipulation of the text in the associated stored field? For example, if
I pulled a status from Twitter like "Hi, this is a #hashtag.", I would like to
remove the "#" from that string and use it for both the index and also the
field value that is returned from a query, i.e., "Hi, this is a hashtag".
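Analysis filters like PatternReplaceFilterFactory only rewrite the indexed tokens; the stored value returned in results is untouched. A common workaround is to strip the character before the document reaches Solr (client-side, or in an update processor). A minimal sketch, with the helper name being hypothetical:

```java
public class HashtagStrip {
    // Normalize the text before indexing so both the indexed tokens and the
    // stored value lack the "#" character.
    public static String stripHash(String s) {
        return s.replace("#", "");
    }

    public static void main(String[] args) {
        System.out.println(stripHash("Hi, this is a #hashtag."));
        // Hi, this is a hashtag.
    }
}
```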


Re: Default schema - 'keywords' not multivalued

2011-06-29 Thread Chris Hostetter

: The problem with TikaEntityProcessor is this installation is still running
: v1.4.1 so I'll need to upgrade.
: 
: Any short and sweet instructions for upgrading to 3.2?  I have a pretty
: straight forward Tomcat install, would just dropping in the new war suffice?

It should be fairly straightforward; check the instructions in 
CHANGES.txt for any potential gotchas.

I posted a write-up a while back on upgrading from 1.4 to 3.1 from a user 
perspective...

http://www.lucidimagination.com/blog/2011/04/01/solr-powered-isfdb-part-8/



-Hoss


Looking for Custom Highlighting guidance

2011-06-29 Thread Jamie Johnson
I have a schema with a text field and a text_phonetic field and would like
to perform highlighting on them in such a way that the tokens that match are
combined.  What would be a reasonable way to accomplish this?


Re: Custom Query Processing

2011-06-29 Thread Jamie Johnson
Anything is an option, but I think I found another way.  I am going to add a
new SearchComponent which reads some additional query parameters and builds
the appropriate filter.


On Tue, Jun 28, 2011 at 2:07 PM, Dmitry Kan  wrote:

> You should modify the SolrCore for this, if I'm not mistaken.
>
> Would extending LuceneQParserPlugin (solr 1.4) be an option for you?
>
> On Tue, Jun 28, 2011 at 12:25 AM, Jamie Johnson  wrote:
>
> > I have a need to take an incoming solr query and apply some additional
> > constraints to it on the Solr end.  Our previous implementation used a
> > QueryWrapperFilter along with some custom code to build a new Filter from
> > the query provided.  How can we plug this filter into Solr?
> >
>
>
>
> --
> Regards,
>
> Dmitry Kan
>


RE: Solr just 'hangs' under load test - ideas?

2011-06-29 Thread Bob Sandiford
OK - I figured it out.  It's not solr at all (and I'm not really surprised).

In the prototype benchmarks, we used a different instance of tomcat than we're 
using for production load tests.  Our prototype tomcat instance had no 
maxThreads value set, so was using the default value of 200.  The production 
tomcat environment has a maxThreads value of 15 - we were just running out of 
threads and getting connection refused exceptions thrown when we ramped up the 
Solr hits past a certain level.

Thanks for considering, Yonik (and any others waiting to see any reply I 
made)...

(As others have said - this listserv is great!)

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


> -Original Message-
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
> Seeley
> Sent: Wednesday, June 29, 2011 12:18 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr just 'hangs' under load test - ideas?
> 
> Can you get a thread dump to see what is hanging?
> 
> -Yonik
> http://www.lucidimagination.com
> 
> On Wed, Jun 29, 2011 at 11:45 AM, Bob Sandiford
>  wrote:
> > Hi, all.
> >
> > I'm hoping someone has some thoughts here.
> >
> > We're running Solr 3.1 (with the patch for SolrQueryParser.java to
> not do the getLuceneVersion() calls, but use luceneMatchVersion
> directly).
> >
> > We're running in a Tomcat instance, 64 bit Java.  CATALINA_OPTS are:
> -Xmx7168m -Xms7168m -XX:MaxPermSize=256M
> >
> > We're running 2 Solr cores, with the same schema.
> >
> > We use SolrJ to run our searches from a Java app running in JBoss.
> >
> > JBoss, Tomcat, and the Solr Index folders are all on the same server.
> >
> > In case it's relevant, we're using JMeter as a load test harness.
> >
> > We're running on Solaris, a 16 processor box with 48GB physical
> memory.
> >
> > I've run a successful load test at a 100 user load (at that rate
> there are about 5-10 solr searches / second), and solr search responses
> were coming in under 100ms.
> >
> > When I tried to ramp up, as far as I can tell, Solr is just hanging.
>  (We have some logging statements around the SolrJ calls - just before,
> we log how long our query construction takes, then we run the SolrJ
> query and log the search times.  We're getting a number of the query
> construction logs, but no corresponding search time logs).
> >
> > Symptoms:
> > The Tomcat and JBoss processes show as well under 1% CPU, and they
> are still the top processes.  CPU states show around 99% idle.   RES
> usage for the two Java processes around 3GB each.  LWP under 120 for
> each.  STATE just shows as sleep.  JBoss is still 'alive', as I can get
> into a piece of software that talks to our JBoss app to get data.
> >
> > We set things up to use log4j logging for Solr - the log isn't
> showing any errors or exceptions.
> >
> > We're not indexing - just searching.
> >
> > Back in January, we did load testing on a prototype, and had no
> problems (though that was Solr 1.4 at the time).  It ramped up
> beautifully - bottle necks were our apps, not Solr.  What I'm
> benchmarking now is a descendent of that prototyping - a bit more
> complex on searches and more fields in the schema, but same basic
> search logic as far as SolrJ usage.
> >
> > Any ideas?  What else to look at?  Ringing any bells?
> >
> > I can send more details if anyone wants specifics...
> >
> > Bob Sandiford | Lead Software Engineer | SirsiDynix
> > P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
> > www.sirsidynix.com
> >
> >




Re: How to Create a weighted function (dismax or otherwise)

2011-06-29 Thread aster
Are there any best practices or preferred ways to accomplish what I am
trying? 

Do the params for defType, qf and bf belong in a solr request handler? 

Is it possible to have the weights as variables so they can be tweaked till
we find the optimum balance in showing our results?

Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-Create-a-weighted-function-dismax-or-otherwise-tp3119977p3122630.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Methods for preserving text entities?

2011-06-29 Thread Walter Closenfleight
Ah, I think I suddenly answered my own question, but I'd appreciate further
insight if you have it. I converted the & in &myword; to an &amp; so it
looks like this:

 Solr is a really &amp;myword; search engine!
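That conversion - escaping the ampersand so the entity is not resolved - as a minimal Java sketch (the helper name is hypothetical; &myword; is from the thread):

```java
public class EntityEscape {
    // Escape the ampersand so "&myword;" survives XML parsing as a literal
    // string instead of being resolved as an entity reference.
    public static String protect(String s) {
        return s.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        System.out.println(protect("Solr is a really &myword; search engine!"));
        // Solr is a really &amp;myword; search engine!
    }
}
```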



On Wed, Jun 29, 2011 at 12:40 PM, Walter Closenfleight <
walter.p.closenflei...@gmail.com> wrote:

> We have some text entities in fields to index (and search) like so:
>
> Solr is a really &myword; search engine!
>
> I would like to preserve/protect &myword; and not resolve it in the
> indexing or search results.
>
> What sort of methods have people used? I realize the results are returned
> in XML format, so preserving these text entities may be hard. Are people
> replacing the "&" character or doing something else?
>
> Thanks in advance!
>


Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Shawn Heisey

On 6/29/2011 11:27 AM, Shawn Heisey wrote:

On 6/29/2011 9:17 AM, Yonik Seeley wrote:
Hmmm, you could comment out the query and filter caches on both 1.4.1 
and 3.2
and then run some of the queries to see if you can figure out which 
are slower?


Do any of the queries have stopwords in fields where you now index
those?  If so, that could entirely account for the difference.


The query cache warms very quickly, it's the filter cache that's 
taking forever.  I'm not intimately familiar with what is being put in 
our filter queries by our webapp, but I'd be a little surprised if 
there are stopwords there.  A quick grep through solr logs (when I've 
turned it up to INFO) for the really common ones didn't reveal any.  
People do type them in fairly frequently, but they go into q= ... fq 
values are constructed internally, not from what a user types, and as 
far as I know, they involve fields that have never had stopwords removed.


I should add that this happens only after the index has had at least a 
few hundred queries, when deletes are committed.  The delete process 
runs every ten minutes, and checks for document presence before issuing 
the delete, which avoids unnecessary commits.


Just now, three of the six shards had documents deleted, and they took 
29.07, 27.57, and 28.66 seconds to warm.  The 1.4.1 counterpart to the 
29.07 second one only took 4.78 seconds, and it did twice as many 
autowarm queries.  I know it's not my single *:* sorted warming query 
(firstSearcher and newSearcher), because on solr startup with either 
version, warm time is 0.01 seconds.  I have useColdSearcher set to false.


Thanks,
Shawn



Re: Default schema - 'keywords' not multivalued

2011-06-29 Thread Tod

On 06/28/2011 12:04 PM, Chris Hostetter wrote:


: I'm streaming over the document content (presumably via tika) and its
: gathering the document's metadata which includes the keywords metadata field.
: Since I'm also passing that field from the DB to the REST call as a list (as
: you suggested) there is a collision because the keywords field is single
: valued.
:
: I can change this behavior using a copy field.  What I wanted to know is if
: there was a specific reason the default schema defined a field like keywords
: single valued so I could make sure I wasn't missing something before I changed
: things.

That file is just an example, you're absolutely free to change it to meet
your use case.

I'm not very familiar with Tika, but based on the comment in the example
config...



...i suspect it was intentional that that field is *not* multiValued (i
guess Tika always returns a single delimited value?) but if you have
multiple discrete values you want to send for your DB-backed data there is
no downside to changing that.

: While I'm at it, I'd REALLY like to know how to use DIH to index the metadata
: from the database while simultaneously streaming over the document content and
: indexing it.  I've never quite figured it out yet but I have to believe it is
: a possibility.

There's a TikaEntityProcessor that can be used to have Tika crunch the
data that comes from an "entity" and extract out specific fields, and it
can be used in combination with a JdbcDataSource and a BinFileDataSource
so that a field in your db data specifies the name of a file on disk to
use as the TikaEntity -- but i've personally never tried it

Here's a simple example someone posted last year that they got working...

http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-td856965.html



-Hoss



Thanks Hoss, I'll just change the schema then.

The problem with TikaEntityProcessor is this installation is still 
running v1.4.1 so I'll need to upgrade.


Any short and sweet instructions for upgrading to 3.2?  I have a pretty 
straight forward Tomcat install, would just dropping in the new war suffice?



- Tod


Methods for preserving text entities?

2011-06-29 Thread Walter Closenfleight
We have some text entities in fields to index (and search) like so:

Solr is a really &myword; search engine!

I would like to preserve/protect &myword; and not resolve it in the indexing
or search results.

What sort of methods have people used? I realize the results are returned in
XML format, so preserving these text entities may be hard. Are people
replacing the "&" character or doing something else?

Thanks in advance!


RE: Sorting by value of field

2011-06-29 Thread Michael Ryan
You could try adding a new int field (like "typeSort") that has the desired 
sort values. So when adding a document with type:car, also add typeSort:1; 
when adding type:van, also add typeSort:2; etc. Then you could do 
"sort=typeSort asc" to get them in your desired order.

I think this is also possible with custom function queries, but I've never 
done that.
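A sketch of the index-time mapping in plain Java (class and field names are hypothetical); sorting by the mapped integers reproduces what "sort=typeSort asc" would return:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TypeSortDemo {
    // Hypothetical type -> typeSort mapping, applied when building each
    // document before it is sent to Solr.
    static final Map<String, Integer> TYPE_SORT =
            Map.of("car", 1, "van", 2, "boat", 3, "bike", 4);

    // Sorting by the mapped value mimics "sort=typeSort asc".
    public static List<String> sortByType(List<String> types) {
        return types.stream()
                .sorted(Comparator.comparingInt(
                        t -> TYPE_SORT.getOrDefault(t, Integer.MAX_VALUE)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sortByType(List.of("bike", "boat", "car", "van")));
        // [car, van, boat, bike]
    }
}
```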

-Michael


Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Shawn Heisey

On 6/29/2011 9:17 AM, Yonik Seeley wrote:

Hmmm, you could comment out the query and filter caches on both 1.4.1 and 3.2
and then run some of the queries to see if you can figure out which are slower?

Do any of the queries have stopwords in fields where you now index
those?  If so, that could entirely account for the difference.


The query cache warms very quickly, it's the filter cache that's taking 
forever.  I'm not intimately familiar with what is being put in our 
filter queries by our webapp, but I'd be a little surprised if there are 
stopwords there.  A quick grep through solr logs (when I've turned it up 
to INFO) for the really common ones didn't reveal any.  People do type 
them in fairly frequently, but they go into q= ... fq values are 
constructed internally, not from what a user types, and as far as I 
know, they involve fields that have never had stopwords removed.


I will do some experimentation with your suggestions.

Thanks,
Shawn



Sorting by value of field

2011-06-29 Thread Judioo
Hi

Say I have a field "type" in multiple documents which can be either
type:bike
type:boat
type:car
type:van


and I want to order a search to give me documents in the following order

type:car
type:van
type:boat
type:bike

Is there a way I can do this just using the &sort method?

Thanks


Re: Fuzzy Query Param

2011-06-29 Thread entdeveloper
I'm using Solr trunk. 

If it's Levenshtein / edit distance, that's great - that's what I want. It just
didn't seem to be officially documented anywhere, so I wanted to find out for
sure. Thanks for confirming.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3122418.html
Sent from the Solr - User mailing list archive at Nabble.com.


CopyField into another CopyField?

2011-06-29 Thread entdeveloper
In solr, is it possible to 'chain' copyfields so that you can copy the value
of one into another?

Example:

<copyField source="field1" dest="autocomplete"/>
<copyField source="autocomplete" dest="ac_spellcheck"/>


Point being, every time I add a new field to the autocomplete, I want it to
automatically also be added to ac_spellcheck without having to do it twice.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/CopyField-into-another-CopyField-tp3122408p3122408.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr just 'hangs' under load test - ideas?

2011-06-29 Thread Yonik Seeley
Can you get a thread dump to see what is hanging?

-Yonik
http://www.lucidimagination.com

On Wed, Jun 29, 2011 at 11:45 AM, Bob Sandiford
 wrote:
> Hi, all.
>
> I'm hoping someone has some thoughts here.
>
> We're running Solr 3.1 (with the patch for SolrQueryParser.java to not do the 
> getLuceneVersion() calls, but use luceneMatchVersion directly).
>
> We're running in a Tomcat instance, 64 bit Java.  CATALINA_OPTS are: 
> -Xmx7168m -Xms7168m -XX:MaxPermSize=256M
>
> We're running 2 Solr cores, with the same schema.
>
> We use SolrJ to run our searches from a Java app running in JBoss.
>
> JBoss, Tomcat, and the Solr Index folders are all on the same server.
>
> In case it's relevant, we're using JMeter as a load test harness.
>
> We're running on Solaris, a 16 processor box with 48GB physical memory.
>
> I've run a successful load test at a 100 user load (at that rate there are 
> about 5-10 solr searches / second), and solr search responses were coming in 
> under 100ms.
>
> When I tried to ramp up, as far as I can tell, Solr is just hanging.  (We 
> have some logging statements around the SolrJ calls - just before, we log how 
> long our query construction takes, then we run the SolrJ query and log the 
> search times.  We're getting a number of the query construction logs, but no 
> corresponding search time logs).
>
> Symptoms:
> The Tomcat and JBoss processes show as well under 1% CPU, and they are still 
> the top processes.  CPU states show around 99% idle.   RES usage for the two 
> Java processes around 3GB each.  LWP under 120 for each.  STATE just shows as 
> sleep.  JBoss is still 'alive', as I can get into a piece of software that 
> talks to our JBoss app to get data.
>
> We set things up to use log4j logging for Solr - the log isn't showing any 
> errors or exceptions.
>
> We're not indexing - just searching.
>
> Back in January, we did load testing on a prototype, and had no problems 
> (though that was Solr 1.4 at the time).  It ramped up beautifully - bottle 
> necks were our apps, not Solr.  What I'm benchmarking now is a descendent of 
> that prototyping - a bit more complex on searches and more fields in the 
> schema, but same basic search logic as far as SolrJ usage.
>
> Any ideas?  What else to look at?  Ringing any bells?
>
> I can send more details if anyone wants specifics...
>
> Bob Sandiford | Lead Software Engineer | SirsiDynix
> P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
> www.sirsidynix.com
>
>


Field Value Highlighting

2011-06-29 Thread zarni aung
Hi,

I need help in figuring out the right configuration to perform highlighting
in Solr.  I can retrieve the matching documents plus the highlighted
matches.

I've done another tool called DTSearch where it would return the offset
positions of the field value to highlight.  I've tried a few different
configurations but it appears that Solr returns the actual matched documents
+ a section called highlighting with snippets (which can be configured to
have length of 'X').  I was wondering if there is a way to retrieve just the
actual documents with highlighted values or a way to retrieve the offset
position of the field values so that I can perform highlighting.

I am using SolrNet client to integrate to Solr.  I've also tweaked the
configs and used the web admin interface to test highlighting but not yet
successful.

Thank you in advance.

Z


[Announce] Solr 3.2 with RankingAlgorithm NRT capability, very high performance 1428 tps

2011-06-29 Thread Nagendra Nagarajayya

Hi!

I would like to announce that Solr 3.2 with RankingAlgorithm now has Near 
Real Time (NRT) capability. The NRT performance is very high: 1428 
documents/sec [MBArtists 390k index]. The NRT functionality allows you 
to add documents without the IndexSearchers being closed or caches being 
cleared. A commit is not needed with the document update. Searches can 
run concurrently with document updates. No changes are needed except for 
enabling the NRT through solrconfig.xml.


A new visible attribute has been introduced that allows one to tune the 
visibility of a document added to the index. The default is 150ms. This 
can be set to 0 enabling documents to become visible for searches as 
soon as they are added. The visibility attribute is added as below:


true

With the visible attribute at 200ms, the performance is about 1428 TPS 
(document adds) on a dual-core Intel system with a 2GB heap, with searches 
running in parallel.


I have a wiki page that describes NRT performance in detail and can be 
accessed from here:


http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver3.2

You can download Solr 3.2 with RankingAlgorithm (NRT version) from here:

http://solr-ra.tgels.com


I would like to invite you to give this version a try as the performance 
is very high, comparable to the default load.


Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.com
http://rankingalgorithm.tgels.com



Solr just 'hangs' under load test - ideas?

2011-06-29 Thread Bob Sandiford
Hi, all.

I'm hoping someone has some thoughts here.

We're running Solr 3.1 (with the patch for SolrQueryParser.java to not do the 
getLuceneVersion() calls, but use luceneMatchVersion directly).

We're running in a Tomcat instance, 64 bit Java.  CATALINA_OPTS are: -Xmx7168m 
-Xms7168m -XX:MaxPermSize=256M

We're running 2 Solr cores, with the same schema.

We use SolrJ to run our searches from a Java app running in JBoss.

JBoss, Tomcat, and the Solr Index folders are all on the same server.

In case it's relevant, we're using JMeter as a load test harness.

We're running on Solaris, a 16 processor box with 48GB physical memory.

I've run a successful load test at a 100 user load (at that rate there are 
about 5-10 solr searches / second), and solr search responses were coming in 
under 100ms.

When I tried to ramp up, as far as I can tell, Solr is just hanging.  (We have 
some logging statements around the SolrJ calls - just before, we log how long 
our query construction takes, then we run the SolrJ query and log the search 
times.  We're getting a number of the query construction logs, but no 
corresponding search time logs).

Symptoms:
The Tomcat and JBoss processes show as well under 1% CPU, and they are still 
the top processes.  CPU states show around 99% idle.   RES usage for the two 
Java processes around 3GB each.  LWP under 120 for each.  STATE just shows as 
sleep.  JBoss is still 'alive', as I can get into a piece of software that 
talks to our JBoss app to get data.

We set things up to use log4j logging for Solr - the log isn't showing any 
errors or exceptions.

We're not indexing - just searching.

Back in January, we did load testing on a prototype, and had no problems 
(though that was Solr 1.4 at the time).  It ramped up beautifully - bottle 
necks were our apps, not Solr.  What I'm benchmarking now is a descendent of 
that prototyping - a bit more complex on searches and more fields in the 
schema, but same basic search logic as far as SolrJ usage.

Any ideas?  What else to look at?  Ringing any bells?

I can send more details if anyone wants specifics...

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com



Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Yonik Seeley
Hmmm, you could comment out the query and filter caches on both 1.4.1 and 3.2
and then run some of the queries to see if you can figure out which are slower?

Do any of the queries have stopwords in fields where you now index
those?  If so, that could entirely account for the difference.

-Yonik
http://www.lucidimagination.com

On Wed, Jun 29, 2011 at 10:59 AM, Shawn Heisey  wrote:
> I have noticed a significant difference in filter cache warming times on my
> shards between 3.2 and 1.4.1.  What can I do to troubleshoot this?  Please
> let me know what additional information you might need to look deeper.  I
> know this isn't enough.
>
> It takes about 3 seconds to do an autowarm count of 8 on 1.4.1 and 10-15
> seconds to do an autowarm count of 4 on 3.2.  The only explicit warming
> query is *:*, sorted descending by post_date, a tlong field containing a
> UNIX timestamp, precisionStep 16.  The indexes are not entirely identical,
> but the new one did evolve from the old one.  Perhaps one of the experts
> might spot something that makes for much slower filter cache warming, or
> some way to look deeper if this seems wrong?  Is there a way to see the
> search URL bits that populated the cache?
>
> Index differences: The new index has four extra small fields, is no longer
> removing stopwords, and has omitTermFreqAndPositions enabled on a
> significant number of fields.  Most of the fields are tokenized text, and
> now more than half of those don't have tf and tp enabled.  Naturally the
> largest text field where most of the matches happen still does have them
> enabled.
>
> To increase reindex speed, the new index has a termIndexInterval of 1024,
> the old one is at the default of 128.  In terms of raw size, the new index
> is less than one percent larger than the old one.  The old shards average
> out to 17.22GB, the new ones to 17.41GB.  Here's an overview of the
> differences of each type of file (comparing the huge optimized segment only,
> not the handful of tiny ones since) on one the index with the largest size
> gap, old value listed first:
>
> fdt: 6317180127/6055634923 (4.1% decrease)
> fdx: 76447972/75647412 (1% decrease)
> fnm: 382, 338 (44 bytes!  woohoo!)
> frq: 2828400926/2873249038 (1.5% increase)
> nrm: 28367782/38223988 (35% increase)
> prx: 2449154203/2684249069 (9.5% increase)
> tii: 1686298/13329832 (690% increase, ~7.9x)
> tis: 923045932/999294109 (8% increase)
> tvd: 18910972/19111840 (1% increase)
> tvf: 5867309063/5640332282 (3.9% decrease)
> tvx: 151294820/152895940 (1% increase)
>
> The tii and nrm files are the only ones that saw a significant size
> increase, but the tii file is MUCH bigger.
>
> Thanks,
> Shawn
>
>


Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Shawn Heisey
I have noticed a significant difference in filter cache warming times on 
my shards between 3.2 and 1.4.1.  What can I do to troubleshoot this?  
Please let me know what additional information you might need to look 
deeper.  I know this isn't enough.


It takes about 3 seconds to do an autowarm count of 8 on 1.4.1 and 10-15 
seconds to do an autowarm count of 4 on 3.2.  The only explicit warming 
query is *:*, sorted descending by post_date, a tlong field containing a 
UNIX timestamp, precisionStep 16.  The indexes are not entirely 
identical, but the new one did evolve from the old one.  Perhaps one of 
the experts might spot something that makes for much slower filter cache 
warming, or some way to look deeper if this seems wrong?  Is there a way 
to see the search URL bits that populated the cache?


Index differences: The new index has four extra small fields, is no 
longer removing stopwords, and has omitTermFreqAndPositions enabled on a 
significant number of fields.  Most of the fields are tokenized text, 
and now more than half of those don't have tf and tp enabled.  Naturally 
the largest text field where most of the matches happen still does have 
them enabled.


To increase reindex speed, the new index has a termIndexInterval of 
1024, the old one is at the default of 128.  In terms of raw size, the 
new index is less than one percent larger than the old one.  The old 
shards average out to 17.22GB, the new ones to 17.41GB.  Here's an 
overview of the differences of each type of file (comparing the huge 
optimized segment only, not the handful of tiny ones since) on one the 
index with the largest size gap, old value listed first:


fdt: 6317180127/6055634923 (4.1% decrease)
fdx: 76447972/75647412 (1% decrease)
fnm: 382, 338 (44 bytes!  woohoo!)
frq: 2828400926/2873249038 (1.5% increase)
nrm: 28367782/38223988 (35% increase)
prx: 2449154203/2684249069 (9.5% increase)
tii: 1686298/13329832 (690% increase, ~7.9x)
tis: 923045932/999294109 (8% increase)
tvd: 18910972/19111840 (1% increase)
tvf: 5867309063/5640332282 (3.9% decrease)
tvx: 151294820/152895940 (1% increase)

The tii and nrm files are the only ones that saw a significant size 
increase, but the tii file is MUCH bigger.


Thanks,
Shawn
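For anyone comparing setups: the two knobs discussed above (the filterCache autowarm count and the explicit warming query) live in solrconfig.xml. A minimal sketch with illustrative values, not the actual settings from this index:

```xml
<!-- solrconfig.xml: values are illustrative, not the poster's actual config -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="4"/>

<!-- the one explicit warming query described above -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">post_date desc</str>
    </lst>
  </arr>
</listener>
```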



Re: Regex replacement not working!

2011-06-29 Thread Ahmet Arslan
> too bad it is still in todo, that's
> why i was asking some for some tips on
> writing, compiling, registration, calling...

Here is general information about how to customize solr via plugins.
http://wiki.apache.org/solr/SolrPlugins

Here is the registration and code example.
http://wiki.apache.org/solr/UpdateRequestProcessor


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
too bad it is still in todo, that's why i was asking some for some tips on
writing, compiling, registration, calling...


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121856.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Regex replacement not working!

2011-06-29 Thread Adam Estrada
I have had the same problems with regex and I went with the regular pattern
replace filter rather than the charfilter. When I added it to the very end
of the chain, only then would it work...I am on Solr 3.2. I have also
noticed that the HTML filter factory is not working either. When I dump the
field that it's supposed to be working on, all the hyperlinks and everything
that you would expect to be stripped are still present.

Adam

On Wed, Jun 29, 2011 at 10:04 AM, samuele.mattiuzzo wrote:

> ok, last question on the UpdateProcessor: can you please give me the steps
> to
> implement my own?
> i mean, i can push my custom processor in solr's code, and then what?
> i don't understand how i have to change the solrconfig.xml and how can i bind
> that to the updater i just wrote,
> and also i don't understand how i have to change the schema.xml
>
> i'm sorry for this question, but i started working on solr 5 days ago and
> for some things i really need a lot of documentation, and this isn't fully
> covered anywhere
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121743.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Regex replacement not working!

2011-06-29 Thread Ahmet Arslan
> ok, last question on the
> UpdateProcessor: can you please give me the steps to
> implement my own?
> i mean, i can push my custom processor in solr's code, and
> then what?
> i don't understand how i have to change the solrconfig.xml
> and how can i bind
> that to the updater i just wrote,
> and also i don't understand how i have to change the
> schema.xml
> 
> i'm sorry for this question, but i started working on solr
> 5 days ago and
> for some things i really need a lot of documentation, and
> this isn't fully
> covered anywhere

"Implementing a conditional copyField" example is a good place to start. You can 
use it as a template. 

You don't need to modify the solr source code for this. You can write your 
class, compile it, and put the resulting jar into the solrHome/lib directory. How 
to register your new update processor in solrconfig.xml is explained here:

http://wiki.apache.org/solr/SolrPlugins#UpdateRequestProcessorFactory  
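For reference, the registration boils down to a processor chain in solrconfig.xml. A sketch with a hypothetical factory class name (note that RunUpdateProcessorFactory must stay last or nothing gets indexed, and that the defaults parameter was named update.processor in this era, later renamed update.chain):

```xml
<!-- solrconfig.xml: com.example.MyUpdateProcessorFactory is hypothetical -->
<updateRequestProcessorChain name="mychain">
  <processor class="com.example.MyUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">mychain</str>
  </lst>
</requestHandler>
```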


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
ok, last question on the UpdateProcessor: can you please give me the steps to
implement my own?
i mean, i can push my custom processor in solr's code, and then what?
i don't understand how i have to change the solrconfig.xml and how can i bind
that to the updater i just wrote,
and also i don't understand how i have to change the schema.xml

i'm sorry for this question, but i started working on solr 5 days ago and
for some things i really need a lot of documentation, and this isn't fully
covered anywhere

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121743.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: what is solr clustering component

2011-06-29 Thread Ahmet Arslan

> I just went through solr wiki page
> for clustering. But i am not getting what
> is the benefit of using clustering. Can anyone tell me what
> is actually
> clustering and what its use in indexing and searching.
> does it affect search results??
> Please reply

It is for search result clustering. Try the demo with the query word jaguar.  
http://search.carrot2.org/stable/search

It generates clusters and labels. (on the left)
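To enable it in Solr itself, the component is wired into solrconfig.xml roughly as below. This is a sketch following the wiki example; the carrot.title/carrot.snippet field names are assumptions about your schema, and the clustering contrib jars must be on the classpath:

```xml
<searchComponent name="clustering"
                 class="org.apache.solr.handler.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
  </lst>
</searchComponent>

<requestHandler name="/clustering" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="carrot.title">title</str>
    <str name="carrot.snippet">description</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
```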



Re: Regex replacement not working!

2011-06-29 Thread Ahmet Arslan
> my goal is/was storing the value into
> the field, and i get i have to create
> my Update handler.
> 
> i was trying to use query with salary_min:[100 TO 200] and
> it's actually
> working... since i just need it to search, i'll stay with
> this solution
> 
> is the [100 TO 200] a performance killer? i remember
> reading something
> around, but cannot find it again...

Please be aware that the range query is working on strings, so it will return 
unwanted results: string ordering and integer ordering are different.

If you are after range queries, you need to define salary_min and salary_max 
fields as trie-based types (tint, tdouble, etc.) and populate them with the 
update processor or at the client side.
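A sketch of what that schema.xml change could look like (the field and type names are assumed from this thread; precisionStep is a tuning knob):

```xml
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>

<field name="salary_min" type="tint" indexed="true" stored="true"/>
<field name="salary_max" type="tint" indexed="true" stored="true"/>
```

With these, salary_min:[100 TO 200] compares numerically instead of lexicographically.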


Re: Solr - search queries not returning results

2011-06-29 Thread Walter Closenfleight
Thanks to both of you, I understand now and am now getting the expected
results.

Cheers!

On Wed, Jun 29, 2011 at 2:21 AM, Ahmet Arslan  wrote:

>
> > I believe I am missing something very elementary. The
> > following query
> > returns zero hits:
> >
> > http://localhost:8983/solr/core0/select/?q=testabc
>
> With this URL, you are hitting the default RequestHandler defined in your
> core0/conf/solrconfig.xml.
>
> > However, using solritas, it finds many results:
> >
> > http://localhost:8983/solr/core0/itas?q=testabc
>
> With this one, you are hitting the one registered as <requestHandler name="/itas">
>
> > Do you have any idea what the issue may be?
>
> Probably they have different default parameters configured.
>
> For example (e)dismax versus lucene query parser. lucene query parser
> searches testabc in your default field. dismax searches it in all of the
> fields defined in qf parameter.
>
> You can see the full parameter list by appending &echoParams=all to your
> search URL.
>


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
my goal is/was storing the value into the field, and i get that i have to create
my own update processor.

i was trying to use query with salary_min:[100 TO 200] and it's actually
working... since i just need it to search, i'll stay with this solution

is the [100 TO 200] a performance killer? i remember reading something
around, but cannot find it again...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121625.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Regex replacement not working!

2011-06-29 Thread Michael Kuhlmann
On 29.06.2011 12:30, samuele.mattiuzzo wrote:
> 
>   
...

> this is the "final" version of my schema part, but what i get is this:
> 
> 
> 
> 1.0
> Negotiable
> Negotiable
> Negotiable
> 
...


The mistake is that you assume that the filter is applied to the stored
result. This is not true. Index-time filters only affect the index (as the
name says), not the stored contents.

Therefore, if you have copyFields that are stored, they'll always return
the same value as the original field.

Try inspecting your index data with Luke or the admin console. Then
you'll see whether your regex applies.

Greetings,
Kuli


Re: Regex replacement not working!

2011-06-29 Thread Juan Grande
Hi Samuele,

It's not clear for me if your goal is to search on that field (for example,
"salary_min:[100 TO 200]") or if you want to show the transformed field to
the user (so you want the result of the regex replacement to be included in
the search results).

If your goal is to show the results to the user, then (as Ahmet said in a
previous mail) it won't work, because the content of the documents is stored
verbatim. The analysis only affects the way that documents are searched.

If your goal is to search, could you please show us the query that you're
using to test the use case?

Thanks!

*Juan*



On Wed, Jun 29, 2011 at 10:02 AM, samuele.mattiuzzo wrote:

> ok, but i'm not applying the filtering on the copyfields.
> this is how my schema looks:
>
>
>
> 
>  stored="true"
> />
>  stored="true"
> />
>
>
> 
> 
>
> and the two datatypes defined before. that's why i thought i could first use
> "copyField" to copy the value then index them with my two datatypes
> filtering...
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121497.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to disable Phonetic search

2011-06-29 Thread Mohammad Shariq
I was using SnowballPorterFilterFactory for stemming, and that stemmer was
stemming the words.
I added the keyword "ansys" to the file "protwords.txt".
Now the stemming is not happening for "ansys" and it's OK now.
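For others hitting this: the exemption works because SnowballPorterFilterFactory takes a protected attribute. A sketch of the relevant analyzer line (protwords.txt is one word per line, e.g. ansys):

```xml
<!-- schema.xml, inside the field type's analyzer chain -->
<filter class="solr.SnowballPorterFilterFactory"
        language="English"
        protected="protwords.txt"/>
```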

On 29 June 2011 17:12, Ahmet Arslan  wrote:

> > I am using solr1.4
> > When I search for keyword "ansys" I get lot of posts.
> > but when I search for "ansys NOT ansi" I get nothing.
> > I guess its because of Phonetic search, "ansys" is
> > converted into "ansi" (
> > that is NOT keyword) and nothing returns.
> >
> > How to handle this kind of problem.
>
> Find and remove occurrences of "solr.PhoneticFilterFactory" from your
> schema.xml file.
>



-- 
Thanks and Regards
Mohammad Shariq


Re: filters effect on search results

2011-06-29 Thread Romi
admin/analysis.jsp page shows RemoveDuplicatesTokenFilterFactory,
ReversedWildcardFilterFactory, EnglishPorterFilterFactory 

-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/filters-effect-on-search-results-tp3120968p3121506.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
ok, but i'm not applying the filtering on the copyfields.
this is how my schema looks:






 




and the two datatypes defined before. that's why i thought i could first use
"copyField" to copy the value then index them with my two datatypes
filtering...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121497.html
Sent from the Solr - User mailing list archive at Nabble.com.


what is solr clustering component

2011-06-29 Thread Romi
I just went through the solr wiki page for clustering. But i am not getting what
is the benefit of using clustering. Can anyone tell me what clustering actually
is and what its use is in indexing and searching.
does it affect search results??
Please reply


-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-is-solr-clustering-component-tp3121484p3121484.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Regex replacement not working!

2011-06-29 Thread Ahmet Arslan
> i have the string "You may earn 25k
> dollars per week" stored in the field
> "salary"
> 
> i'm using 2 copyfields "salary_min" and "salary_max" with
> source in "salary"
> with those 2 datatypes 
> 
> salary is "text"
> salary_min is "salary_min_text"
> salary_max is "salary_max_text"
> 
> so, i was expecting this:
> 
> solr updates its index
> solr copies the value from salary to salary_min and applies
> the value with
> the regex
> solr copies the value from salary to salary_max and applies
> the value with
> the regex
> 
> 
> but it's not working, it copies the value from one field to
> another, but the
> filter isn't applied, even if it's working as you could
> see

Okay, that makes sense. copyField just copies the content; it has nothing to do 
with analyzers. Two solutions come to mind.

1-) If you are using the data import handler, I think (I am not good with regex) 
you can use the RegexTransformer to populate these two fields.

http://wiki.apache.org/solr/DataImportHandler#RegexTransformer

2-) If not, you can populate these two fields in a custom 
UpdateRequestProcessor. There is an example to modify and start from here:

http://wiki.apache.org/solr/UpdateRequestProcessor
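A sketch of option 1-) above, assuming a hypothetical DIH entity named job pulling a salary column from a database (the regex is the one from this thread):

```xml
<!-- data-config.xml fragment: the entity name and query are hypothetical -->
<entity name="job" transformer="RegexTransformer"
        query="select id, salary from jobs">
  <field column="salary_min" sourceColName="salary"
         regex="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
         replaceWith="$1"/>
  <field column="salary_max" sourceColName="salary"
         regex="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
         replaceWith="$2"/>
</entity>
```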


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
i have the string "You may earn 25k dollars per week" stored in the field
"salary"

i'm using 2 copyfields "salary_min" and "salary_max" with source in "salary"
with those 2 datatypes 

salary is "text"
salary_min is "salary_min_text"
salary_max is "salary_max_text"

so, i was expecting this:

solr updates its index
solr copies the value from salary to salary_min and applies the regex to
the value
solr copies the value from salary to salary_max and applies the regex to
the value


but it's not working: it copies the value from one field to another, but the
filter isn't applied, even though the filter itself works, as you can see


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121386.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Regex replacement not working!

2011-06-29 Thread Ahmet Arslan
> Index Analyzer
> org.apache.solr.analysis.KeywordTokenizerFactory
> {luceneMatchVersion=LUCENE_31}
> position    1
> term text    £22000 - £25000 per annum +
> benefits
> startOffset    0
> endOffset    36
> 
> 
> org.apache.solr.analysis.PatternReplaceFilterFactory
> {replacement=$2,
> pattern=[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*,
> luceneMatchVersion=LUCENE_31}
> position    1
> term text    25000
> startOffset    0
> endOffset    36
> 
> 
> this is my output for the field salary_max, it seems to be
> working from the
> admin jsp interface

That's good to know. If you explain your final goal in detail, users can give 
better pointers.


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
Index Analyzer
org.apache.solr.analysis.KeywordTokenizerFactory
{luceneMatchVersion=LUCENE_31}
position1
term text   £22000 - £25000 per annum + benefits
startOffset 0
endOffset   36


org.apache.solr.analysis.PatternReplaceFilterFactory {replacement=$2,
pattern=[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*,
luceneMatchVersion=LUCENE_31}
position1
term text   25000
startOffset 0
endOffset   36


this is my output for the field salary_max, it seems to be working from the
admin jsp interface

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121353.html
Sent from the Solr - User mailing list archive at Nabble.com.
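The same replacement can be checked outside Solr, since PatternReplaceFilterFactory uses java.util.regex. A standalone sketch (the sample strings are invented, the pattern is copied from this thread):

```java
public class SalaryRegex {
    // The pattern from this thread; group 1 captures the minimum, group 2 the maximum.
    static final String SALARY_PATTERN =
        "[^\\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*";

    // Mirrors PatternReplaceFilterFactory's replace-all semantics.
    static String extract(String input, String replacement) {
        return input.replaceAll(SALARY_PATTERN, replacement);
    }

    public static void main(String[] args) {
        String salary = "$22000 - $25000 per annum + benefits";
        System.out.println(extract(salary, "$1")); // 22000
        System.out.println(extract(salary, "$2")); // 25000
        // No digits means no match, so the input comes back untouched,
        // which is why "Negotiable" shows up unchanged in the results.
        System.out.println(extract("Negotiable", "$2")); // Negotiable
    }
}
```

Note this only shows what the analyzer emits at index time; as pointed out elsewhere in the thread, the stored value returned in search results stays verbatim.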


Encoding problem while indexing

2011-06-29 Thread Engy Morsy
I am working on indexing arabic documents containing arabic diacritics and 
dotless characters (old arabic characters). I am using Apache Tomcat server, 
and I am using my modified version of the aramorph analyzer as the arabic 
analyzer. I managed on the development environment to normalize the arabic 
diacritics and dotless characters (same concept as in 
solr.ArabicNormalizationFilterFactory), and i can verify that the analyzer is 
working fine, and i get the correct stem for arabic words. the input text file 
for testing has utf-8 encoding.

When i build the aramorph jar file and place it under solr lib, the diacritics 
and the dotless characters split the word. I made sure that the server.xml 
contains URIEncoding="utf-8".

I also made sure that the text being sent to solr using solrj is utf-8 encoded
example : solr.addBean(new Doc("4",new String("حِباًَ".getBytes("UTF8";

but nothing is working.

I tried to use the analyze link on solr admin for both indexing and querying 
and both show that the arabic word is split if a diacritic or dotless 
character is found.

Do you have any idea what might be the problem?


schema snippet:






I also added the following parameter to the JVM: -Dfile.encoding=UTF-8

Thanks,
engy
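One thing worth ruling out, given the truncated addBean line above: decoding bytes without an explicit charset uses the platform default, which mangles Arabic on non-UTF-8 systems. A minimal standalone check (class and method names are mine, not from the original code):

```java
import java.nio.charset.Charset;

public class Utf8RoundTrip {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Encode and decode with the same explicit charset;
    // new String(bytes) with no charset argument is the common bug.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(UTF8);
        return new String(bytes, UTF8);
    }

    public static void main(String[] args) {
        // \u062D\u0650\u0628\u0627\u064B is Arabic text with diacritics
        String word = "\u062D\u0650\u0628\u0627\u064B";
        System.out.println(word.equals(roundTrip(word))); // true
    }
}
```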


Re: filters effect on search results

2011-06-29 Thread François Schiettecatte
Indeed, I find the Porter stemmer to be too 'aggressive' for my taste, I prefer 
the EnglishMinimalStemFilterFactory, with the caveat that it depends on your 
data set.

Cheers

François

On Jun 29, 2011, at 6:21 AM, Ahmet Arslan wrote:

>> Hi, when i query for "elegant" in
>> solr i get results for "elegance" too. 
>> 
>> *I used these filters for index analyze*
>> WhitespaceTokenizerFactory 
>> StopFilterFactory 
>> WordDelimiterFilterFactory
>> LowerCaseFilterFactory 
>> SynonymFilterFactory
>> EnglishPorterFilterFactory
>> RemoveDuplicatesTokenFilterFactory
>> ReversedWildcardFilterFactory 
>> 
>> *
>> and for query analyze:*
>> 
>> .WhitespaceTokenizerFactory
>> SynonymFilterFactory
>> StopFilterFactory
>> WordDelimiterFilterFactory 
>> LowerCaseFilterFactory 
>> EnglishPorterFilterFactory 
>> RemoveDuplicatesTokenFilterFactory 
>> 
>> I want to know which filter affecting my search result.
>> 
> 
> It is EnglishPorterFilterFactory, you can verify it from admin/analysis.jsp 
> page.



Re: conditionally update document on unique id

2011-06-29 Thread Shalin Shekhar Mangar
On Wed, Jun 29, 2011 at 2:01 AM, eks dev  wrote:

> Quick question,
> Is there a way with solr to conditionally update document on unique
> id? Meaning, default add behavior if the id is not already in the index and
> *not to touch the index* if already there.
>
> Deletes are not important (no sync issues).
>
> I am asking because I noticed with deduplication turned on,
> index-files get modified even if I update the same documents again
> (same signatures).
> I am facing very high dupes rate (40-50%), and setup is going to be
> master-slave with high commit rate (requirement is to reduce
> propagation latency for updates). Having unnecessary index
> modifications is going to waste  "effort" to ship the same information
> again and again.
>
> if there is no standard way, what would be the fastest way to check if
> Term exists in index from UpdateRequestProcessor?
>
>
I'd suggest that you use the searcher's getDocSet with a TermQuery.

Use the SolrQueryRequest#getSearcher so you don't need to worry about ref
counting.

e.g. req.getSearcher().getDocSet(new TermQuery(new Term(signatureField,
sigString))).size();



> I intend to extend SignatureUpdateProcessor to prevent a document from
> propagating down the chain if this happens?
> Would that be a way to deal with it? I repeat, there are no deletes to
> make headaches with synchronization
>
>
Yes, that should be fine.

-- 
Regards,
Shalin Shekhar Mangar.


Re: How to disable Phonetic search

2011-06-29 Thread Ahmet Arslan
> I am using solr1.4
> When I search for keyword "ansys" I get lot of posts.
> but when I search for "ansys NOT ansi" I get nothing.
> I guess its because of Phonetic search, "ansys" is
> converted into "ansi" (
> that is NOT keyword) and nothing returns.
> 
> How to handle this kind of problem.

Find and remove occurrences of "solr.PhoneticFilterFactory" from your 
schema.xml file.


Re: Regex replacement not working!

2011-06-29 Thread Ahmet Arslan
>      name="salary_min_text" class="solr.TextField" >
>       
>          class="solr.PatternReplaceCharFilterFactory"
> pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
> replacement="$1"/>
>          class="solr.KeywordTokenizerFactory"/>
>          class="solr.LowerCaseFilterFactory" />
>          class="solr.TrimFilterFactory" />
>       
>       
>          class="solr.PatternReplaceCharFilterFactory"
> pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
> replacement="$1"/>
>          class="solr.KeywordTokenizerFactory"/>
>          class="solr.LowerCaseFilterFactory" />
>          class="solr.TrimFilterFactory" />
>       
>     
> 
>      class="solr.TextField" >
>       
>          class="solr.PatternReplaceCharFilterFactory"
> pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
> replacement="$2"/>
>          class="solr.KeywordTokenizerFactory"/>
>          class="solr.LowerCaseFilterFactory" />
>          class="solr.TrimFilterFactory" />
>       
>       
>          class="solr.PatternReplaceCharFilterFactory"
> pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
> replacement="$2"/>
>          class="solr.KeywordTokenizerFactory"/>
>          class="solr.LowerCaseFilterFactory" />
>          class="solr.TrimFilterFactory" />
>       
>     
> 
> this is the "final" version of my schema part, but what i
> get is this:
> 
> 
> 
> 1.0
> Negotiable
> Negotiable
> Negotiable
> 
> 
> 1.0
> £7 to £8 per hour
> £7 to £8 per
> hour
> £7 to £8 per
> hour
> 
> 
> 1.0
> £125 to £150 per
> day
> £125 to £150 per
> day
> £125 to £150 per
> day
> 
> 
> which is not what i'm expecting... the regular expression
> works in
> http://www.fileformat.info/tool/regex.htm
> without any problem

I am not good with regular expressions, but the response always contains the 
untouched/un-analyzed version of stored fields. You can visually test your 
fieldType/regex on the admin/analysis.jsp page. It shows indexed terms step by step.


How to disable Phonetic search

2011-06-29 Thread Mohammad Shariq
I am using solr1.4
When I search for keyword "ansys" I get lot of posts.
but when I search for "ansys NOT ansi" I get nothing.
I guess its because of Phonetic search, "ansys" is converted into "ansi" (
that is NOT keyword) and nothing returns.

How to handle this kind of problem.

-- 
Thanks and Regards
Mohammad Shariq


Re: Fuzzy Query Param

2011-06-29 Thread Michael McCandless
Which version of Solr (Lucene) are you using?

Recent versions of Lucene now accept ~N > 1 to be an edit distance, i.e.
foobar~2 matches any term that's <= 2 edits away from foobar.

Mike McCandless

http://blog.mikemccandless.com
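For intuition, here is a minimal (unoptimized) Levenshtein edit-distance sketch in Python. It is not Lucene's automaton-based implementation, just an illustration of what "within 2 edits" means:

```python
def edit_distance(a, b):
    # classic dynamic-programming Levenshtein distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# foobar~2 would match any term within 2 edits of "foobar":
print(edit_distance("foobar", "fobar"))   # 1 -> matches
print(edit_distance("foobar", "foxbaz"))  # 2 -> matches
print(edit_distance("foobar", "bazaar"))  # 4 -> does not match
```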

On Tue, Jun 28, 2011 at 11:00 PM, entdeveloper
 wrote:
> According to the docs on lucene query syntax:
>
> "Starting with Lucene 1.9 an additional (optional) parameter can specify the
> required similarity. The value is between 0 and 1, with a value closer to 1
> only terms with a higher similarity will be matched."
>
> I was messing around with this and started doing queries with values greater
> than 1 and it seemed to be doing something. However I haven't been able to
> find any documentation on this.
>
> What happens when specifying a fuzzy query with a value > 1?
>
> "tiger"~2
> "animal"~3
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3120235.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo

<fieldType name="..." class="solr.TextField">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
        replacement="$1"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="..." class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
        replacement="$2"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="[^\d]?([0-9]+[k,.]?[0-9]*)+.*?([0-9]+[k,.]?[0-9]*)+.*"
        replacement="$2"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

This is the "final" version of my schema part, but what I get is this:



1.0
Negotiable
Negotiable
Negotiable


1.0
£7 to £8 per hour
£7 to £8 per hour
£7 to £8 per hour


1.0
£125 to £150 per day
£125 to £150 per day
£125 to £150 per day


which is not what I'm expecting... the regular expression works in
http://www.fileformat.info/tool/regex.htm without any problem

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3121055.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: filters effect on search results

2011-06-29 Thread Ahmet Arslan
> Hi, when i query for "elegant" in
> solr i get results for "elegance" too. 
> 
> *I used these filters for index analyze*
> WhitespaceTokenizerFactory 
> StopFilterFactory 
> WordDelimiterFilterFactory
> LowerCaseFilterFactory 
> SynonymFilterFactory
> EnglishPorterFilterFactory
> RemoveDuplicatesTokenFilterFactory
> ReversedWildcardFilterFactory 
> 
> *
> and for query analyze:*
> 
> .WhitespaceTokenizerFactory
> SynonymFilterFactory
> StopFilterFactory
> WordDelimiterFilterFactory 
> LowerCaseFilterFactory 
> EnglishPorterFilterFactory 
> RemoveDuplicatesTokenFilterFactory 
> 
> I want to know which filter affecting my search result.
> 

It is EnglishPorterFilterFactory, you can verify it from admin/analysis.jsp 
page.
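If the stemmed match is unwanted for some queries, one option (a sketch; the type name is hypothetical, not from the thread) is a parallel field type identical to the one above but without EnglishPorterFilterFactory, so "elegant" no longer matches "elegance":

```xml
<fieldType name="text_unstemmed" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no EnglishPorterFilterFactory here, so no stem-based matches -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```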


Re: Regex replacement not working!

2011-06-29 Thread Ahmet Arslan

> Hi, i have this bunch of lines in my
> schema.xml that should do a replacement
> but it doesn't work!
> 
>     <fieldType name="..." class="solr.TextField"
>         omitNorms="true">
>       <analyzer>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <charFilter class="solr.PatternReplaceCharFilterFactory"
>             pattern="([0-9]+k?[.,]?[0-9]*).*?([0-9]+k?[.,]?[0-9]*)"
>             replacement="$2"/>
>       </analyzer>
>     </fieldType>
> 



filters effect on search results

2011-06-29 Thread Romi
Hi, when I query for "elegant" in Solr I get results for "elegance" too. 

*I used these filters for index analyze*
WhitespaceTokenizerFactory 
StopFilterFactory 
WordDelimiterFilterFactory
LowerCaseFilterFactory 
SynonymFilterFactory
EnglishPorterFilterFactory
RemoveDuplicatesTokenFilterFactory
ReversedWildcardFilterFactory 

*and for query analyze:*

WhitespaceTokenizerFactory
SynonymFilterFactory
StopFilterFactory
WordDelimiterFilterFactory 
LowerCaseFilterFactory 
EnglishPorterFilterFactory 
RemoveDuplicatesTokenFilterFactory 

I want to know which filter is affecting my search results.

-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/filters-effect-on-search-results-tp3120968p3120968.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using RAMDirectoryFactory in Master/Slave setup

2011-06-29 Thread eks dev
sure,  SSD or RAM disks fix these problems with IO.

Anyhow, I can really see no alternative for some in memory index for
slaves, especially for low latency master-slave apps (high commit rate
is a problem).

having the possibility to run slaves in memory that are slurping updates
from the master seems to me like the preferred method (you need no
twiddling with the OS; just CPU and RAM is what you need for your slaves,
run a slave and point it to the master). I assume that update propagation
times could be better by having
some sexy ReadOnlySlaveRAMDirectorySlurpingUpdatesFromTheMaster that
does reload() directly from the Master (maybe even uncommitted,
somehow NRT-likish).

Point being, lower latency update than current 1-5 Minutes (wiki
recommended values) is not going to be possible with current
master-slave solution, due to the nature of it (commit to disk on
master, copy delta to slave disk, reload...) This is a lot of ping
pong... ES and Solandra are by nature better suited if you need update
propagation in the seconds range.

It is just thinking aloud, and slightly off-topic... solr/lucene as it
is today, rocks  anyhow.

On Wed, Jun 29, 2011 at 10:55 AM, Toke Eskildsen  
wrote:
> On Wed, 2011-06-29 at 09:35 +0200, eks dev wrote:
>> In MMAP, you need to have really smart warm up (MMAP) to beat IO
>> quirks, for RAMDir  you need to tune gc(), choose your poison :)
>
> Other alternatives are operating system RAM disks (avoids the GC
> problem) and using SSDs (nearly the same performance as RAM).
>
>


Re: Using RAMDirectoryFactory in Master/Slave setup

2011-06-29 Thread Toke Eskildsen
On Wed, 2011-06-29 at 09:35 +0200, eks dev wrote:
> In MMAP, you need to have really smart warm up (MMAP) to beat IO
> quirks, for RAMDir  you need to tune gc(), choose your poison :)

Other alternatives are operating system RAM disks (avoids the GC
problem) and using SSDs (nearly the same performance as RAM).



Regex replacement not working!

2011-06-29 Thread samuele.mattiuzzo
Hi, I have this bunch of lines in my schema.xml that should do a replacement
but it doesn't work!

<fieldType name="..." class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="([0-9]+k?[.,]?[0-9]*).*?([0-9]+k?[.,]?[0-9]*)"
        replacement="$2"/>
  </analyzer>
</fieldType>

I need it to extract only the numbers from some other string. The strings
can be anything: only letters (so it should replace it with an empty
string), letters + numbers. The numbers can be in one of those formats

17000 --> ok
17,000 --> should be replaced with 17000
17.000 --> should be replaced with 17000
17k --> should be replaced with 17000

how can i accomplish this? 
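A sketch of the intended normalization in Python (the regex here is my own, not from the thread); it can be useful for checking the rules before porting them into a PatternReplaceCharFilterFactory pattern:

```python
import re

def normalize_amount(text):
    """Extract the first number from a free-form salary string and
    normalize it to a plain digit string.
    Assumed rules (from the post): ',' and '.' act as thousands
    separators, and a trailing 'k' multiplies by 1000."""
    m = re.search(r'(\d+(?:[.,]\d{3})*)(k?)', text, re.IGNORECASE)
    if not m:
        return ''                # letters only -> empty string
    number = m.group(1).replace(',', '').replace('.', '')
    if m.group(2):
        number += '000'          # '17k' -> '17000'
    return number

for s in ['17000', '17,000', '17.000', '17k', 'Negotiable']:
    print(s, '->', normalize_amount(s))
```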

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-replacement-not-working-tp3120748p3120748.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using RAMDirectoryFactory in Master/Slave setup

2011-06-29 Thread eks dev
...Using RAMDirectory really does not help performance...

I kind of agree,  but in my experience with lucene,  there are cases
where RAMDirectory helps a lot, with all its drawbacks (huge heap and
gc() tuning).

We had very good experience with MMAP on average, but moving to
RAMDirectory with properly tuned gc() reduced 95% of "slow performers"
in upper range of response times (e.g. slowest 5% queries). On average
it made practically no difference.
Maybe this is mitigated by better warm-up in Solr than our hand-tuned
warmup, maybe not; I do not really know.

In MMAP, you need to have really smart warm up (MMAP) to beat IO
quirks, for RAMDir  you need to tune gc(), choose your poison :)

I argue, in some cases it is very hard to tame IO quirks (e.g. this is
shared resource, you never know what going really on in shared app
setup!). Then, see only what is happening on major merge and all these
efforts with native linux directory to somehow get a grip on that...
If you have spare ram, you are probably safer with RAMDirectory.

From the theoretical perspective, in the ideal case, RAM ought to be
faster than disk (and more expensive). If this is not the case, we did
something wrong.  I have a feeling that this work Mike is doing  with
in memory Codecs (fst TermDictionary, pulsing codec & co) in Lucene 4,
native directory features ... will make RAMDirectory really obsolete
for production setup.


Cheers,
eks




On Wed, Jun 29, 2011 at 6:00 AM, Lance Norskog  wrote:
> Using RAMDirectory really does not help performance. Java garbage
> collection has to work around all of the memory taken by the segments.
> It works out that Solr works better (for most indexes) without using
> the RAMDirectory.
>
>
>
> On Sun, Jun 26, 2011 at 2:07 PM, nipunb  wrote:
>> PS: Sorry if this is a repost, I was unable to see my message in the mailing
>> list - this may have been due to my outgoing email different from the one I
>> used to subscribe to the list with.
>>
>> Overview – Trying to evaluate if keeping the index in memory using
>> RAMDirectoryFactory can help with query performance. I am trying to perform the
>> indexing on the master using solr.StandardDirectoryFactory and make those
>> indexes accesible to the slave using solr.RAMDirectoryFactory
>>
>> Details:
>> We have set-up Solr in a master/slave enviornment. The index is built on the
>> master and then replicated to slaves which are used to serve the query.
>> The replication is done using the in-built Java replication in Solr.
>> On the master, in solrconfig.xml we have
>> <directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>
>>
>> On the slave, I tried to use the following:
>>
>> <directoryFactory name="DirectoryFactory" class="solr.RAMDirectoryFactory"/>
>>
>> My slave shows no data for any queries. In solrconfig.xml it is mentioned
>> that replication doesn’t work when using RAMDirectoryFactory, however this (
>> https://issues.apache.org/jira/browse/SOLR-1379) mentions that you can use
>> it to have the index on disk and then load into memory.
>>
>> To test the sanity of my set-up, I changed solrconfig.xml in the slave to
>> the following and replicated:
>> <directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>
>> I was able to see the results.
>>
>> Shouldn’t RAMDirectoryFactory be used for reading index from disk into
>> memory?
>>
>> Any help/pointers in the right direction would be appreciated.
>>
>> Thanks!
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Using-RAMDirectoryFactory-in-Master-Slave-setup-tp3111792p3111792.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: Solr - search queries not returning results

2011-06-29 Thread Ahmet Arslan

> I believe I am missing something very elementary. The
> following query
> returns zero hits:
> 
> http://localhost:8983/solr/core0/select/?q=testabc

With this URL, you are hitting the default RequestHandler defined in your core0/conf/solrconfig.xml.

> However, using solritas, it finds many results:
> 
> http://localhost:8983/solr/core0/itas?q=testabc

With this one, you are hitting the RequestHandler registered at /itas. 

> Do you have any idea what the issue may be?

Probably they have different default parameters configured. 

For example (e)dismax versus lucene query parser. lucene query parser searches 
testabc in your default field. dismax searches it in all of the fields defined 
in qf parameter.

You can see the full parameter list by appending &echoParams=all to your search 
URL.


Re: How to Create a weighted function (dismax or otherwise)

2011-06-29 Thread Ahmet Arslan
> I am trying to create a feature that
> allows search results to be displayed by
> this formula sum(weight1*text relevance score, weight2 *
> price). weight1 and
> weight2 are numeric values that can be changed to influence
> the search
> results.
> 
> I am sending the following query params to the Solr
> instance for searching.
> 
> q=red
> defType=dismax
> qf=10^name+2^price

Correct syntax of qf and pf is fieldName^boostFactor, i.e, 
qf=name^10 price^2

However, your query is a word, so it won't match the price field. I assume the
price field is numeric. 

You can simulate sum(weight1 * text relevance score, weight2 * price)
with the bf parameter and FunctionQueries; ranking by score + (w2/w1)*price
preserves the same order as the weighted sum:

q=red&defType=edismax&qf=name&bf=product(price,w2/w1)

http://wiki.apache.org/solr/FunctionQuery
http://wiki.apache.org/solr/DisMaxQParserPlugin#bf_.28Boost_Functions.29
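As a quick sanity check of the weighting math above (the numbers are made up): multiplying a score formula by a positive constant does not change the ranking, so w1*t + w2*p orders documents the same way as t + (w2/w1)*p, which is what a bf of product(price, w2/w1) adds to the text score:

```python
w1, w2 = 10.0, 2.0                               # hypothetical weights
docs = [(0.9, 5.0), (0.5, 30.0), (0.7, 12.0)]    # (text score, price)

by_full = sorted(docs, key=lambda d: -(w1 * d[0] + w2 * d[1]))
by_bf   = sorted(docs, key=lambda d: -(d[0] + (w2 / w1) * d[1]))

print(by_full == by_bf)  # True: same ranking either way
```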