RE: SolrJ + Post

2011-10-14 Thread Rohit
Ya the problem is with the length of the URL; with a lot of filters coming
in, the length goes beyond the length allowed. But I guess extending the URL
length would be a better approach.

Regards,
Rohit

-Original Message-
From: Sujit Pal [mailto:sujit@comcast.net] 
Sent: 14 October 2011 16:54
To: solr-user@lucene.apache.org
Subject: Re: SolrJ + Post

Not the OP, but I put it in on /one/ of my solr custom handlers that
acts as a proxy to itself (ie the server it's part of). It basically
rewrites the incoming query (usually short 50-250 chars at most) to a
set of very long queries and passes them in parallel to the server,
gathers up the results and returns a combo response. 

The logging is not an issue for me since the handler logs the expanded
query before sending it off, but the caching is. Thank you for pointing
it out.

I was doing it because I was running afoul of the limit on the URL size
(and the max boolean clauses as well, but I reset the max for that). But
I just realized that we can probably reset that limit as well as this
page shows:
http://serverfault.com/questions/56691/whats-the-maximum-url-length-in-tomcat

So perhaps if the URL length is the reason for the OP's question,
increasing it may be a better option than using POST?
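For a rough sense of the numbers involved, here is a standalone sketch (not Solr code; the field names, filter values, and the 8192-byte figure for Tomcat's default maxHttpHeaderSize are illustrative assumptions):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/**
 * Rough illustration of how a filter-heavy GET query can blow past a
 * container's request-line limit. 8192 bytes is Tomcat's default
 * maxHttpHeaderSize; field names and values below are made up.
 */
public class UrlLengthDemo {
    public static String buildQueryUrl(int numFilters) throws UnsupportedEncodingException {
        StringBuilder url = new StringBuilder("http://localhost:8983/solr/select?q=");
        url.append(URLEncoder.encode("category_id:\"a:Original\"", "UTF-8"));
        for (int i = 0; i < numFilters; i++) {
            // each fq clause adds roughly 30-40 URL-encoded bytes
            url.append("&fq=").append(URLEncoder.encode("attribute_" + i + ":(value" + i + ")", "UTF-8"));
        }
        return url.toString();
    }

    public static void main(String[] args) throws Exception {
        int limit = 8192; // Tomcat default maxHttpHeaderSize
        String small = buildQueryUrl(10);
        String large = buildQueryUrl(300);
        System.out.println("10 filters:  " + small.length() + " bytes");
        System.out.println("300 filters: " + large.length() + " bytes");
        if (small.length() >= limit) throw new AssertionError("small query should fit");
        if (large.length() <= limit) throw new AssertionError("large query should exceed the default limit");
    }
}
```

A few hundred filter clauses is enough to cross the default limit, which is why either raising the connector's limit or switching to POST comes up in this thread.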

-sujit

On Fri, 2011-10-14 at 09:30 -0700, Walter Underwood wrote:
> Why do you want to use POST? It is the wrong HTTP request type for search
> results.
> 
> GET is for retrieving information from the server, POST is for changing
> information on the server.
> 
> POST responses cannot be cached (see HTTP spec).
> 
> POST requests do not include the arguments in the log, which makes your
> HTTP logs nearly useless for diagnosing problems.
> 
> wunder
> Walter Underwood
> 
> On Oct 14, 2011, at 9:20 AM, Sujit Pal wrote:
> 
> > If you use the CommonsHttpSolrServer from your client (not sure about
> > the other types, this is the one I use), you can pass the method as an
> > argument to its query() method, something like this:
> > 
> > QueryResponse rsp = server.query(params, METHOD.POST);
> > 
> > HTH
> > Sujit
> > 
> > On Fri, 2011-10-14 at 13:29 +, Rohit wrote:
> >> I want to use POST instead of GET while using solrj, but I am unable to
> >> find a clear example for it. If anyone has implemented the same it would be
> >> nice to get some insight.
> >> 
> >> 
> >> 
> >> Regards,
> >> 
> >> Rohit
> >> 
> >> Mobile: +91-9901768202
> >> 
> >> About Me:   http://about.me/rohitg
> >> 
> 
> 
> 
> 




Re: Filter Question

2011-10-14 Thread Jan Høydahl
Hi,

Interesting feature. See also https://issues.apache.org/jira/browse/LUCENE-3130 
for a discussion of using TypeAttribute to (de)boost certain token types such 
as synonyms. Having the ability to remove a token type from the search, we 
could do many kinds of searches on the same field that we currently need 
separate fields for.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 14. okt. 2011, at 09:17, Steven A Rowe wrote:

> Hi Monica,
> 
> AFAIK there is nothing like the filter you've described, and I believe it 
> would be generally useful.  Maybe it could be called StopTermTypesFilter?  
> (Plural on Types to signify that more than one type of term can be stopped by 
> a single instance of the filter.)  
> 
> Such a filter should have an enablePositionIncrements option like StopFilter.
> 
> Steve
> 
>> -Original Message-
>> From: Monica Skidmore [mailto:monica.skidm...@careerbuilder.com]
>> Sent: Thursday, October 13, 2011 1:04 PM
>> To: solr-user@lucene.apache.org; Otis Gospodnetic
>> Subject: RE: Filter Question
>> 
>> Thanks, Otis - yes, this is different from the synonyms filter, which we
>> also use.  For example, if you wanted all tokens that were marked 'lemma'
>> to be removed, you could specify that, and all tokens with any type other
>> than 'lemma' would still be returned.  You could also choose to remove
>> all tokens of types 'lemma' and 'word' (although that would probably be a
>> bad idea!), etc.  Normally, if you don't want a token type, you just
>> don't include/run the filter that produces that type.  However, we have a
>> third-party filter that produces multiple types, and this allows us to
>> select a subset of those types.
>> 
>> I did see the HowToContribute wiki, but I'm relatively new to solr, and I
>> wanted to see if this looked familiar to someone before I started down
>> the contribution path.
>> 
>> Thanks again!
>> 
>>  -Monica
>> 
>> 
>> -Original Message-
>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>> Sent: Thursday, October 13, 2011 12:37 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Filter Question
>> 
>> Monica,
>> 
>> This is different from Solr's synonyms filter with different synonyms
>> files, one for index-time and the other for query-time expansion (not
>> sure when you'd want that, but it looks like you need this and like
>> this), right?  If so, maybe you can describe what your filter does
>> differently and then follow http://wiki.apache.org/solr/HowToContribute -
>> thanks in advance! :)
>> 
>> Otis
>> 
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene
>> ecosystem search :: http://search-lucene.com/
>> 
>> 
>>> 
>>> From: Monica Skidmore 
>>> To: "solr-user@lucene.apache.org" 
>>> Sent: Thursday, October 13, 2011 11:37 AM
>>> Subject: Filter Question
>>> 
>>> Our Solr implementation includes a third-party filter that adds
>> additional, multiple term types to the token list (beyond "word",
>> etc.).  Most of the time this is exactly what we want, but we felt we
>> could improve our search results by having different tokens on the index
>> and query side.  Since the filter in question was third-party and we
>> didn't have access to source code, we wrote our own filter that will take
>> out tokens based on their term attribute type.
>>> 
>>> We didn't see another filter available that does this - did we overlook
>> it?  And if not, is this something that would be of value if we
>> contribute it back to the Solr community?
>>> 
>>> Monica Skidmore
>>> 
>>> 
>>> 
>>> 



Re: In-document highlighting DocValues?

2011-10-14 Thread Jan Høydahl
Hi,

The Highlighter is way too slow for this customer's particular use case - which 
involves very large documents. We don't need highlighted snippets for now, but we 
need to accurately decide what words (offsets) in the real HTML display of the 
resulting page to highlight. For this we only need offset info, not the 
snippets/fragments from the stored field.

But I have not looked at the Highlighter code. Perhaps we could fork it into a 
new search component which pulls out only the necessary meta info and payloads 
for us and returns it to the client?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 13. okt. 2011, at 16:23, Mike Sokolov wrote:

> Is there some reason you don't want to leverage Highlighter to do this work?  
> It has all the necessary code for using the analyzed version of your query so 
> it will only match tokens that really contribute to the search match.
> 
> You might also be interested in LUCENE-2878 (which is still under development 
> on a branch though).  It aims to provide first-class access to payloads and 
> positions during scoring, and this will be very useful for complex 
> highlighting tasks.
> 
> Another possible solution to the OCR problem could be:  generate an XML file 
> with a tag for each word encoding its x,y coords, like: <w x="..." y="10">This</w>; index that file using XmlCharFilter or 
> HTMLStripCharFilter. Then when you search, use the Solr highlighter to 
> highlight the entire document, and process it using XML tools to find the 
> locations of the matches.
> 
> -Mike
> 
> On 10/10/2011 10:19 AM, Jan Høydahl wrote:
>> Hi,
>> 
>> We index structured documents, with numbered chapters, paragraphs and 
>> sentences. After doing a (rather complex) search, we may get multiple 
>> matches in each result doc. We want to highlight those matches in our 
>> front-end and currently we do a simple string match of the query words 
>> against the raw text.
>> 
>> However, this highlights some words that do not satisfy the original query, 
>> and also does not highlight other words where the match was in a stem, or 
>> synonym or wildcard. We thus need to improve this, and my plan was to 
>> utilize DocValues (Payloads). Would the following work?
>> 
>> 1. For each term in the field "text", index DocValues with info about 
>> chapter#, paragraph#, sentence# and word#.
>>This can be done in our application code, e.g. "foo|1,2,3,4" for chapter 
>> 1, paragraph 2, sentence 3 and word 4.
>> 
>> 2. Then, for a specific document in the result list, retrieve a list of all 
>> matches in field "text", and for each match,
>>retrieve the associated DocValues.
>> 
>> 3. The client application can now use this information to highlight matches, 
>> as well as "jump to next match" etc,
>>and would highlight the correct words only, e.g. it would be able to 
>> highlight "colour" even if the match was on the
>>synonym "color".
>> 
>> Another use case for this technique would be OCR applications where we store 
>> with each term its x,y offsets for where it occurs in
>> the original TIFF image scan.
>> 
>> What is already in place and what code needs to be written? I don't 
>> currently see how to get a complete list of matches for a particular 
>> document.
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>> 
>>   
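Jan's step 1 encoding ("foo|1,2,3,4") can be decoded on the client side along these lines (a sketch only; the class name and strict four-field format are assumptions, and this is plain Java, not the Lucene payload API):

```java
/**
 * Sketch of decoding the "term|chapter,paragraph,sentence,word" scheme
 * described in step 1 above. Names are invented for illustration.
 */
public class PositionPayload {
    public final String term;
    public final int chapter, paragraph, sentence, word;

    private PositionPayload(String term, int[] p) {
        this.term = term;
        this.chapter = p[0]; this.paragraph = p[1]; this.sentence = p[2]; this.word = p[3];
    }

    /** Parse e.g. "foo|1,2,3,4" into its term and position components. */
    public static PositionPayload parse(String encoded) {
        int bar = encoded.lastIndexOf('|');
        if (bar < 0) throw new IllegalArgumentException("no '|' delimiter: " + encoded);
        String[] parts = encoded.substring(bar + 1).split(",");
        if (parts.length != 4) throw new IllegalArgumentException("expected 4 positions: " + encoded);
        int[] p = new int[4];
        for (int i = 0; i < 4; i++) p[i] = Integer.parseInt(parts[i]);
        return new PositionPayload(encoded.substring(0, bar), p);
    }

    public static void main(String[] args) {
        PositionPayload pp = parse("foo|1,2,3,4");
        System.out.println(pp.term + " -> chapter " + pp.chapter + ", paragraph " + pp.paragraph
                + ", sentence " + pp.sentence + ", word " + pp.word);
        if (!"foo".equals(pp.term) || pp.chapter != 1 || pp.word != 4) throw new AssertionError();
    }
}
```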



Re: NRT and replication

2011-10-14 Thread Yonik Seeley
On Fri, Oct 14, 2011 at 5:49 PM, Esteban Donato
 wrote:
>  I found soft commits very useful for NRT search requirements.
> However I couldn't figure out how replication works with this feature.
>  I mean, if I have N replicas of an index for load balancing purposes,
> when I soft commit a doc in one of these nodes, is there any way that
> those "in-memory" docs get replicated to the rest of the replicas?

Nope.  Index replication isn't really that compatible with NRT.
But the new distributed indexing features we're working on will be!
The parent issue for this effort is SOLR-2358.

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference


ANNOUNCE: Stump Hoss @ Lucene EuroCon

2011-10-14 Thread Chris Hostetter


Hey everybody,

Next week, Lucene EuroCon will be invading Barcelona, Spain, and I will 
once again be in the hot seat for a session of Stump The Chump.


  http://2011.lucene-eurocon.org/talks/20863

During the session, moderator and former "Chump" Grant Ingersoll will 
present me with tough Lucene/Solr questions submitted by users, to see 
what kind of solutions I can come up with on the spot.  Jan Høydahl, Mark 
Miller, and Yonik Seeley will be on hand as judges to award prizes for the 
questions that "stump" me the most...


 * 1st Prize: $100 (USD) gift certificate or cash prize
 * 2nd Prize: $50 (USD) gift certificate or cash prize
 * 3rd Prize: $25 (USD) gift certificate or cash prize

The goal is to really make me sweat and work hard to think of creative 
solutions to non-trivial problems on the spot -- like when I answer 
questions on the solr-user mailing list, except in a crowded room with 
hundreds of people staring at me and laughing.


But in order to be a success, we need your questions/problems/challenges!

If you had a tough situation with Solr that you managed to solve with a 
creative solution (or haven't solved yet) and are interested to see what 
type of solution I might come up with under pressure, please email a 
description of your problem to "st...@lucene-eurocon.org" or post it as a 
comment on the session page:


  http://2011.lucene-eurocon.org/talks/20863

Even if you won't be able to make it to Barcelona, you can still 
participate: email your question (along with a method to contact you) and 
you can watch me squirm later when we post video of the session online. 
If you can make it to Barcelona: all the more fun to watch live and in 
person (and maybe answer follow up questions).




-Hoss

NRT and replication

2011-10-14 Thread Esteban Donato
Hello guys,

  I found soft commits very useful for NRT search requirements.
However I couldn't figure out how replication works with this feature.
 I mean, if I have N replicas of an index for load balancing purposes,
when I soft commit a doc in one of these nodes, is there any way that
those "in-memory" docs get replicated to the rest of the replicas?

Regards,
Esteban


Re: how to add search terms to output of wt=csv?

2011-10-14 Thread simon
There's an open issue -
https://issues.apache.org/jira/browse/SOLR-2731 which addresses adding
this kind of metadata to CSV output. There's a patch
there which may be useful, and could probably be adapted if needed.

-Simon

On Fri, Oct 14, 2011 at 4:37 PM, Fred Zimmerman wrote:

> Hi,
>
> I want to include the search query in the output of wt=csv (or a duplicate
> of it) so that the process that receives this output can do something with
> the search terms. How would I accomplish this?
>
> Fred
>


Re: Error Finding solrconfig.xml

2011-10-14 Thread Chris Hostetter

: Can't find resource 'solrconfig.xml' in classpath or
: '/home/datadmin/public_html/apache-solr/example/solr/./conf/',
: cwd=/usr/local/jakarta/apache-tomcat-5.5.33/bin

a) several of the steps you mention refer to 

/home/datadmin/public_html/apache-solr/example and 
/home/datadmin/public_html/apache-solr/example/solr but at no point have 
you told us whether you actually created either of those directories, or 
what you put in them.

: 3.Edited /usr/local/jakarta/tomcat/webapps/solr/WEB-INF/web.xml   as:
...
: 3. Created solr.xml in /usr/local/jakarta/tomcat/conf/Catalina/localhost as:

did you really do *both* of these? (they are both labeled step #3) ... you 
should only do one of them (i would recommend the latter as it doesn't 
require mucking with the war file) and since you set the value of 
"solr/home" to two different things in each case, i honestly have no idea 
which one will be used (i don't know what the precedence is in tomcat)

if you intend for "/home/datadmin/public_html/apache-solr/example" to be 
your solr home dir, then it should contain either a "solr.xml" (for 
multicore setups) like this one...

https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_3/solr/example/solr/solr.xml

...which points at your various cores, or it should contain a "conf/" 
subdir and a "conf/solrconfig.xml" file like this one...

https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_3/solr/example/solr/conf/solrconfig.xml
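For orientation, the multicore solr.xml linked above has roughly this shape (attribute values here are from memory and illustrative; defer to the linked 3.3 example file):

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <!-- adminPath enables the CoreAdmin handler (used for RELOAD etc.) -->
  <cores adminPath="/admin/cores">
    <!-- instanceDir is resolved relative to solr/home; "." means the home dir itself -->
    <core name="collection1" instanceDir="." />
  </cores>
</solr>
```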



-Hoss


how to add search terms to output of wt=csv?

2011-10-14 Thread Fred Zimmerman
Hi,

I want to include the search query in the output of wt=csv (or a duplicate
of it) so that the process that receives this output can do something with
the search terms. How would I accomplish this?

Fred


Re: SolrJ + Post

2011-10-14 Thread Yury Kats
On 10/14/2011 12:11 PM, Rohit wrote:
> I want to query, right now I use it in the following way,
> 
> CommonsHttpSolrServer server = new CommonsHttpSolrServer("URL HERE");
> SolrQuery sq = new SolrQuery();
> sq.add("q",query);
> QueryResponse qr = server.query(sq);

QueryResponse qr = server.query(sq, METHOD.POST);


Re: Stopword filter - refreshing stop word list periodically

2011-10-14 Thread Jithin
What will be the name of this hard coded core? I was rearranging my
directory structure, adding a separate directory for code. And it does work
with a single core.

On Fri, Oct 14, 2011 at 11:47 PM, Chris Hostetter-3 [via Lucene] <
ml-node+s472066n3422415...@n3.nabble.com> wrote:

>
> : I am not running in a multi core environment. My application requires
> only a
> : single search schema. Does it make sense to go for a multi core setup in
> : this scenario? Given that we currently have a single core is there any
> : alternative to RELOAD which work in a single core setup?
>
> In recent versions of Solr (I think since 3.1) every Solr installation is
> a multi-core environment, Solr just silently uses a hardcoded default
> solr.xml that uses the solr home dir as the instanceDir of the "default
> core" if your solr home dir doesn't already contain a solr.xml.
>
> So even if a single core setup, you should still be able to reload the
> core.
>
>
> -Hoss
>
>
> --
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://lucene.472066.n3.nabble.com/Stopword-filter-refreshing-stop-word-list-periodically-tp3421611p3422415.html
>
>



-- 
Thanks
Jithin Emmanuel


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stopword-filter-refreshing-stop-word-list-periodically-tp3421611p3422550.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Instructions for Multiple Server Webapps Configuring with JNDI

2011-10-14 Thread Chris Hostetter

: modified the solr/home accordingly.  I have an empty directory under
: tomcat/webapps named after the solr home directory in the context fragment.

if that empty directory has the same base name as your context fragment 
(ie: "tomcat/webapps/solr0" and "solr0.xml") that may give you problems 
... the entire point of using context fragment files is to define webapps 
independently of a simple directory based hierarchy in tomcat/webapps ... 
if you have a directory there with the same name you create a conflict -- 
which webapp should it use, the empty one, or the one specified by your 
context file?

: I expected to fire up tomcat and have it unpack the war file contents into the
: solr home directory specified in the context fragment, but its empty, as is
: the webapps directory.

that's not what the "solr/home" env variable is for at all.  tomcat will 
put the unpacked war wherever it needs/wants to (in theory it could just 
load it in memory) ... the point of the solr/home env variable is for you 
to tell the solr.war where to find the configuration files for this 
context.

Note in particular from the instructions you mentioned on the wiki...

"Copy the example/solr directory from the source to the installation 
directory like /opt/solr/example"


-Hoss


Re: SolrJ stripping Carriage Returns

2011-10-14 Thread mkorthuis
Hmm..  Reason:

When I debug it in eclipse, I am verifying that the value I am setting in
the SolrInputDocument includes '\r\n'.  However it only has '\n' in the
index.  It is just a simple string field in solr.

On Fri, Oct 14, 2011 at 2:23 PM, Chris Hostetter-3 [via Lucene] <
ml-node+s472066n3422431...@n3.nabble.com> wrote:

>
> : We recently updated our Solr and Solr indexing from DIH using Solr 1.4 to
> our
> : own Hadoop import using SolrJ and Solr 3.4.
> ...
> : Any document that has a string field value with a carriage return "\r" is
>
> : having that carriage return stripped before being added to the index.
>  All
> : line breaks "\n" are not being stripped.
> ...
> : This did not occur with the DIH.
> :
> : Thoughts? Is there a way to not have solrJ strip all carriage returns?
>
> What makes you think this is SolrJ?  If it is, you should be able to
> create a ~10 line test of SolrJ demonstrating this with hard coded data.
>
> I suspect your data is getting cleaned somewhere else in your data flow
> that didn't exist when DIH was fetching it directly.
>
>
>
> -Hoss
>
>
> --
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://lucene.472066.n3.nabble.com/SolrJ-stripping-Carriage-Returns-tp3420557p3422431.html
>
>



-- 
Michael Korthuis
Platform Architect
Gemvara Inc.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-stripping-Carriage-Returns-tp3420557p3422479.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Excluding docs from results based on matched field

2011-10-14 Thread Chris Hostetter

: Can one configure (e)dismax to add that + to 1 or more fields, like name 
: in that example, in order to require that clause though?

Otis: this thread has fallen behind its deja-vu brother in terms of 
useful information...

http://www.lucidimagination.com/search/document/86a9202b97df441d/dismax_question#9c70e382c2859760

-Hoss


Re: SolrJ stripping Carriage Returns

2011-10-14 Thread Chris Hostetter

: We recently updated our Solr and Solr indexing from DIH using Solr 1.4 to our
: own Hadoop import using SolrJ and Solr 3.4.   
...
: Any document that has a string field value with a carriage return "\r" is
: having that carriage return stripped before being added to the index.  All
: line breaks "\n" are not being stripped.  
...
: This did not occur with the DIH.  
: 
: Thoughts? Is there a way to not have solrJ strip all carriage returns?

What makes you think this is SolrJ?  If it is, you should be able to 
create a ~10 line test of SolrJ demonstrating this with hard coded data.

I suspect your data is getting cleaned somewhere else in your data flow 
that didn't exist when DIH was fetching it directly.



-Hoss
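Hoss's suggested ~10-line test aside, one candidate explanation worth checking (an assumption, not something confirmed in this thread) is SolrJ's default XML update format: XML 1.0 end-of-line handling requires parsers to turn "\r\n" (and bare "\r") in parsed text into "\n". A stock JAXP parser shows the effect without any Solr involvement:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/**
 * Demonstrates the mandatory XML 1.0 line-ending normalization:
 * "\r\n" in parsed character data comes back as "\n".
 * Standalone JAXP only; no SolrJ involved.
 */
public class CrlfNormalizationDemo {
    public static String roundTrip(String fieldValue) throws Exception {
        String xml = "<doc><field>" + fieldValue + "</field></doc>";
        Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        return d.getDocumentElement().getFirstChild().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String out = roundTrip("line1\r\nline2");
        System.out.println(out.contains("\r") ? "CR preserved" : "CR stripped by the XML parser");
        if (!out.equals("line1\nline2")) throw new AssertionError("expected CRLF -> LF normalization");
    }
}
```

If this turns out to be the cause, sending updates in a non-XML format (for example SolrJ's binary request writer) or escaping the carriage return before indexing are the usual ways around it.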


Re: Stopword filter - refreshing stop word list periodically

2011-10-14 Thread Chris Hostetter

: I am not running in a multi core environment. My application requires only a
: single search schema. Does it make sense to go for a multi core setup in
: this scenario? Given that we currently have a single core is there any
: alternative to RELOAD which work in a single core setup?

In recent versions of Solr (I think since 3.1) every Solr installation is 
a multi-core environment, Solr just silently uses a hardcoded default 
solr.xml that uses the solr home dir as the instanceDir of the "default 
core" if your solr home dir doesn't already contain a solr.xml.

So even if a single core setup, you should still be able to reload the 
core.


-Hoss


Re: ClassCastException when using FieldAnalysisRequest

2011-10-14 Thread Shane Perry
After looking at this more, it appears that
solr.HTMLStripCharFilterFactory does not return a list which
AnalysisResponseBase is expecting.  I have created a bug ticket
(https://issues.apache.org/jira/browse/SOLR-2834)

On Fri, Oct 14, 2011 at 8:28 AM, Shane Perry  wrote:
> Hi,
>
> Using Solr 3.4.0, I am trying to do a field analysis via the
> FieldAnalysisRequest feature in solrj.  During the process() call, the
> following ClassCastException is thrown:
>
> java.lang.ClassCastException: java.lang.String cannot be cast to 
> java.util.List
>        at 
> org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69)
>        at 
> org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66)
>        at 
> org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107)
>
> My code is as follows:
>
> FieldAnalysisRequest request = new FieldAnalysisRequest(myUri).
>  addFieldName(field).
>  setFieldValue(text).
>  setQuery(text);
>
> request.process(myServer);
>
> Is there something I am doing wrong?  Any help would be appreciated.
>
> Sincerely,
>
> Shane
>


Re: multivale field: solr stop to go

2011-10-14 Thread Erick Erickson
Hmmm, you need to post more information,
especially the results of appending &debugQuery=true

But right off the bat, the "string" type is probably not
what you want if your input has more than one
word in it. "string" types are completely unanalyzed,
tokens aren't extracted, no stemming or casing
is done, etc.

Best
Erick

On Fri, Oct 14, 2011 at 3:13 AM, ojalà  wrote:
> I need to extract the last 20 keywords in all my documents, sorted by score.
> The keywords field is multivalued and the Solr schema I have defined like
> this:
> <field name="keywords" type="string" ... multiValued="true"/>
> The problem is as follows: my query is OK, but once I have indexed 518
> documents Solr seems to no longer work. But if I remove a document (down to 517) the
> query to Solr works again.
> What could be my problem?
>
> http://solr:8983/solr/select?indent=on&version=2.2&start=0&rows=5&qt=standard&wt=standard&fl=id,metadata,score,keywords&sort=score+asc&q=category_id:"a:Original"&facet=true&facet.limit=20&facet.mincount=1&facet.sort=true&facet.field=keywords
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/multivale-field-solr-stop-to-go-tp3420934p3420934.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: SolrJ + Post

2011-10-14 Thread Sujit Pal
Not the OP, but I put it in on /one/ of my solr custom handlers that
acts as a proxy to itself (ie the server it's part of). It basically
rewrites the incoming query (usually short 50-250 chars at most) to a
set of very long queries and passes them in parallel to the server,
gathers up the results and returns a combo response. 

The logging is not an issue for me since the handler logs the expanded
query before sending it off, but the caching is. Thank you for pointing
it out.

I was doing it because I was running afoul of the limit on the URL size
(and the max boolean clauses as well, but I reset the max for that). But
I just realized that we can probably reset that limit as well as this
page shows:
http://serverfault.com/questions/56691/whats-the-maximum-url-length-in-tomcat 

So perhaps if the URL length is the reason for the OP's question,
increasing it may be a better option than using POST?

-sujit

On Fri, 2011-10-14 at 09:30 -0700, Walter Underwood wrote:
> Why do you want to use POST? It is the wrong HTTP request type for search 
> results.
> 
> GET is for retrieving information from the server, POST is for changing 
> information on the server.
> 
> POST responses cannot be cached (see HTTP spec).
> 
> POST requests do not include the arguments in the log, which makes your HTTP 
> logs nearly useless for diagnosing problems.
> 
> wunder
> Walter Underwood
> 
> On Oct 14, 2011, at 9:20 AM, Sujit Pal wrote:
> 
> > If you use the CommonsHttpSolrServer from your client (not sure about
> > the other types, this is the one I use), you can pass the method as an
> > argument to its query() method, something like this:
> > 
> > QueryResponse rsp = server.query(params, METHOD.POST);
> > 
> > HTH
> > Sujit
> > 
> > On Fri, 2011-10-14 at 13:29 +, Rohit wrote:
> >> I want to use POST instead of GET while using solrj, but I am unable to
> >> find a clear example for it. If anyone has implemented the same it would be
> >> nice to get some insight.
> >> 
> >> 
> >> 
> >> Regards,
> >> 
> >> Rohit
> >> 
> >> Mobile: +91-9901768202
> >> 
> >> About Me:   http://about.me/rohitg
> >> 
> 
> 
> 
> 



Re: SolrJ + Post

2011-10-14 Thread Walter Underwood
Why do you want to use POST? It is the wrong HTTP request type for search 
results.

GET is for retrieving information from the server, POST is for changing 
information on the server.

POST responses cannot be cached (see HTTP spec).

POST requests do not include the arguments in the log, which makes your HTTP 
logs nearly useless for diagnosing problems.

wunder
Walter Underwood

On Oct 14, 2011, at 9:20 AM, Sujit Pal wrote:

> If you use the CommonsHttpSolrServer from your client (not sure about
> the other types, this is the one I use), you can pass the method as an
> argument to its query() method, something like this:
> 
> QueryResponse rsp = server.query(params, METHOD.POST);
> 
> HTH
> Sujit
> 
> On Fri, 2011-10-14 at 13:29 +, Rohit wrote:
>> I want to use POST instead of GET while using solrj, but I am unable to
>> find a clear example for it. If anyone has implemented the same it would be
>> nice to get some insight.
>> 
>> 
>> 
>> Regards,
>> 
>> Rohit
>> 
>> Mobile: +91-9901768202
>> 
>> About Me:   http://about.me/rohitg
>> 






Re: SolrJ + Post

2011-10-14 Thread Sujit Pal
If you use the CommonsHttpSolrServer from your client (not sure about
the other types, this is the one I use), you can pass the method as an
argument to its query() method, something like this:

QueryResponse rsp = server.query(params, METHOD.POST);

HTH
Sujit

On Fri, 2011-10-14 at 13:29 +, Rohit wrote:
> I want to use POST instead of GET while using solrj, but I am unable to
> find a clear example for it. If anyone has implemented the same it would be
> nice to get some insight.
> 
>  
> 
> Regards,
> 
> Rohit
> 
> Mobile: +91-9901768202
> 
> About Me:   http://about.me/rohitg
> 
>  
> 



Re: text search and data aggregation, thoughts?

2011-10-14 Thread Esteban Donato
thanks Pravesh for your feedback.  I have 10 million products and 165M
rows of visits accumulated over 2 years.  The aggregated data needs to
be shown in the search result page along with the product description.

I also felt option 2 was the most suitable but wanted to have a second
view.  The only hesitation here is the overhead of doing 2 queries (1
to products and 1 to visits) for every search, which could impact
performance.

Regards,
Esteban

On Fri, Oct 14, 2011 at 8:35 AM, pravesh  wrote:
> Hi Esteban,
>
> A lot depends on a lot of things: 1) how much volume (total documents), 2)
> size of the index, 3) how you represent the aggregated data in your UI.
>
> Your option-2 seems to be a suitable way to go. This way you tune each core
> separately. Also the use-cases for updating each document/product in both
> indexes seem different. One is updated when a product is
> added/updated; the other is updated when a product is viewed/sold from search
> results.
>
> Option-1 can be used in case you are showing the data-aggregation stats on
> the search results page only along with each item. If it is shown in the
> item-detail page then option-1 seems better.
>
> Regds
> Pravesh
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/text-search-and-data-aggregation-thoughts-tp3416330p3421361.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: SolrJ + Post

2011-10-14 Thread Rohit
I want to query, right now I use it in the following way,

CommonsHttpSolrServer server = new CommonsHttpSolrServer("URL HERE");
SolrQuery sq = new SolrQuery();
sq.add("q",query);
QueryResponse qr = server.query(sq);

Regards,
Rohit
-Original Message-
From: Yury Kats [mailto:yuryk...@yahoo.com] 
Sent: 14 October 2011 13:51
To: solr-user@lucene.apache.org
Subject: Re: SolrJ + Post

On 10/14/2011 9:29 AM, Rohit wrote:
> I want to use POST instead of GET while using solrj, but I am unable to
> find a clear example for it. If anyone has implemented the same it would
> be nice to get some insight.

To do what? Submit? Query? How do you use SolrJ now?



Re: Multiple search analyzers on the same field type possible?

2011-10-14 Thread Victor
I've spent today writing my own SynonymFilter and SynonymFilterFactory. And
it works!

I've followed Erick's advice and pre- and postfixed all the words that I
want to stem with a @. So, if I want to stem the word car, I ingest it in
the query as @car@.

My adapted synonymfilter recognizes the pre/postfixing, removes the @
characters and continues as usual (which means the synonym filter will do
what it is supposed to be doing). If no "stemming tags" are found, it aborts
the synonym lookup part of the process for that token and returns
immediately.

So: 
car --> car
cars --> cars
@car@ --> car and cars

Mission accomplished, no extra storage needed, current index can stay as it
is, and the end user can switch between stemming and no stemming when he/she wants
to.

I think I saved a lot of money today.
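The marker convention Victor describes boils down to one small check inside the filter. A plain-Java sketch of just that decision (names invented; in the real version this logic would sit inside a custom TokenFilter's incrementToken()):

```java
/**
 * Sketch of the '@'-marker convention: a term wrapped in '@' opts in to
 * synonym/stemming expansion; anything else passes through untouched.
 * Plain Java only -- not actual Lucene TokenFilter code.
 */
public class StemMarker {
    /** Returns the bare term if marked (e.g. "@car@" -> "car"), else null. */
    public static String unwrapIfMarked(String token) {
        if (token.length() > 2 && token.startsWith("@") && token.endsWith("@")) {
            return token.substring(1, token.length() - 1);
        }
        return null; // not marked: skip synonym lookup, emit the token as-is
    }

    public static void main(String[] args) {
        if (!"car".equals(unwrapIfMarked("@car@"))) throw new AssertionError();
        if (unwrapIfMarked("car") != null) throw new AssertionError();
        if (unwrapIfMarked("cars") != null) throw new AssertionError();
        System.out.println("@car@ -> " + unwrapIfMarked("@car@"));
    }
}
```

This keeps the opt-in signal entirely inside the query text, which is why no index change or extra storage is needed.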

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-search-analyzers-on-the-same-field-type-possible-tp3417898p3422060.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Stopword filter - refreshing stop word list periodically

2011-10-14 Thread Otis Gospodnetic
You could restart your Solr instance.  If you have just 1 Solr instance, that 
means a bit of a downtime.  If you have 2 Solr slaves behind a Load Balancer, 
then you can avoid that downtime.
But I think you could also just configure your 1 Solr core via solr.xml and 
then you can use that RELOAD command.

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


>
>From: Jithin 
>To: solr-user@lucene.apache.org
>Sent: Friday, October 14, 2011 11:21 AM
>Subject: Re: Stopword filter - refreshing stop word list periodically
>
>I am not running in a multi core environment. My application requires only a
>single search schema. Does it make sense to go for a multi core setup in
>this scenario? Given that we currently have a single core is there any
>alternative to RELOAD which works in a single-core setup?
>
>On Fri, Oct 14, 2011 at 6:48 PM, Michael Kuhlmann-4 [via Lucene] <
>ml-node+s472066n3421627...@n3.nabble.com> wrote:
>
>> Am 14.10.2011 15:10, schrieb Jithin:
>> > Hi,
>> > Is it possible to refresh the stop word list periodically say once in 6
>> > hours. Is this already supported in Solr or are there any work arounds.
>> > Kindly help me in understanding this.
>>
>> Hi,
>>
>> you can trigger a reload command to the core admin, assuming you're
>> running a multi core environment (which I'd recommend anyway).
>>
>> Simply add
>> curl "http://host:port/solr/admin/cores?action=RELOAD&core=corename"
>>
>> to your /etc/crontab file, and set the leading time fields correspondingly.
>>
>>
>> -Kuli
>>
>>
>
>
>
>-- 
>Thanks
>Jithin Emmanuel
>
>
>--
>View this message in context: 
>http://lucene.472066.n3.nabble.com/Stopword-filter-refreshing-stop-word-list-periodically-tp3421611p3422004.html
>Sent from the Solr - User mailing list archive at Nabble.com.
>
>

Re: Stopword filter - refreshing stop word list periodically

2011-10-14 Thread Jithin
I am not running in a multi core environment. My application requires only a
single search schema. Does it make sense to go for a multi core setup in
this scenario? Given that we currently have a single core is there any
alternative to RELOAD which works in a single-core setup?

On Fri, Oct 14, 2011 at 6:48 PM, Michael Kuhlmann-4 [via Lucene] <
ml-node+s472066n3421627...@n3.nabble.com> wrote:

> Am 14.10.2011 15:10, schrieb Jithin:
> > Hi,
> > Is it possible to refresh the stop word list periodically say once in 6
> > hours. Is this already supported in Solr or are there any work arounds.
> > Kindly help me in understanding this.
>
> Hi,
>
> you can trigger a reload command to the core admin, assuming you're
> running a multi core environment (which I'd recommend anyway).
>
> Simply add
> curl "http://host:port/solr/admin/cores?action=RELOAD&core=corename"
>
> to your /etc/crontab file, and set the leading time fields correspondingly.
>
>
> -Kuli
>
>
>



-- 
Thanks
Jithin Emmanuel


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stopword-filter-refreshing-stop-word-list-periodically-tp3421611p3422004.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Stopword filter - refreshing stop word list periodically

2011-10-14 Thread Otis Gospodnetic
Of course, if you change stop words you may want to reindex your old content, 
so that the new state of stop words is reflected in all documents.
It's not an absolute must to do that, but if you do not do it, you may see 
strange search results that will make you wonder why some documents matched a 
query when they look like they should not have matched, and vice versa.

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


>
>From: Michael Kuhlmann a
>To: solr-user@lucene.apache.org
>Sent: Friday, October 14, 2011 9:17 AM
>Subject: Re: Stopword filter - refreshing stop word list periodically
>
>Am 14.10.2011 15:10, schrieb Jithin:
>> Hi,
>> Is it possible to refresh the stop word list periodically say once in 6
>> hours. Is this already supported in Solr or are there any work arounds.
>> Kindly help me in understanding this.
>
>Hi,
>
>you can trigger a reload command to the core admin, assuming you're
>running a multi core environment (which I'd recommend anyway).
>
>Simply add
>curl "http://host:port/solr/admin/cores?action=RELOAD&core=corename"
>to your /etc/crontab file, and set the leading time fields correspondingly.
>
>-Kuli
>
>
>

ClassCastException when using FieldAnalysisRequest

2011-10-14 Thread Shane Perry
Hi,

Using Solr 3.4.0, I am trying to do a field analysis via the
FieldAnalysisRequest feature in solrj.  During the process() call, the
following ClassCastException is thrown:

java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List
       at 
org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69)
       at 
org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66)
       at 
org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107)

My code is as follows:

FieldAnalysisRequest request = new FieldAnalysisRequest(myUri).
  addFieldName(field).
  setFieldValue(text).
  setQuery(text);

request.process(myServer);

Is there something I am doing wrong?  Any help would be appreciated.

Sincerely,

Shane
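Until the SolrJ-side cause is found, one way to cross-check the analysis itself is to call the /analysis/field handler directly over HTTP. As far as I know these are the parameter names the handler accepts; the host, field name, and sample text are placeholders:

```python
from urllib.parse import urlencode

# Parameters for FieldAnalysisRequestHandler (mounted at /analysis/field
# in the example solrconfig); field name and text are placeholders.
params = {
    "analysis.fieldname": "title",
    "analysis.fieldvalue": "running runs ran",
    "analysis.query": "running runs ran",
    "analysis.showmatch": "true",
    "wt": "json",
}
qs = urlencode(params)
url = "http://localhost:8983/solr/analysis/field?" + qs  # assumed host/port
print(url)
```

Comparing the raw response against what FieldAnalysisResponse tries to parse should show whether the bad cast comes from the server payload or the client parser.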


Re: SolrJ + Post

2011-10-14 Thread Yury Kats
On 10/14/2011 9:29 AM, Rohit wrote:
> I want to use POST instead of GET while using SolrJ, but I am unable to
> find a clear example for it. If anyone has implemented the same it would be
> nice to get some insight.

To do what? Submit? Query? How do you use SolrJ now?


Re: Multiple search analyzers on the same field type possible?

2011-10-14 Thread Chantal Ackermann

Hi Victor,

your wages hopefully cost more than disk space does, nowadays?
I don't want to spoil the fun of thinking up new challenges when it
comes to Solr, but from a project management point of view I would buy
some more storage and get it done with copyField and two request handlers
that choose the stemmed versus the non-stemmed fields to search on.
(Given that an index is temporary storage and does not require the
highest-quality disk RAID systems.)

Well, I'm probably being naive...

To add something valuable to this post:
Maybe you could create two cores that point to the same index. This
might be possible if you use the same index path in both solrconfig.xml?
(I haven't tried it.) Use the exact same schema but with different
synonym files. If one synonym file is empty, and the other contains your
stemming stuff, then querying one core versus querying the other should
have the effect you expect?

No offense,
Chantal


On Fri, 2011-10-14 at 14:36 +0200, Victor wrote:
> Hi Erick,
> 
> First of all, thanks for your posts, I really appreciate this!
> 
> 1) Yes, we have tested alternative stemmers, but I admit that a definite
> decision has not been made yet. Anyway, we definitely do not want to create
> a stemmed index because of storage issues, and we definitely want to be able
> to allow the end-user to turn it on and off. So choosing a different stemmer
> does not solve my problem of wanting to switch between stemming/non-stemming
> without additional indexes.
> 
> 2) Rant granted :) And I definitely agree with you. It is always a challenge
> to find a balance between what a customer wants and how far you really want
> to go to in achieving a solution (that does not conflict too much with the
> maintainability of the system).
> 
> But, I do think that the requirements are not that outrageous. It seems to
> me reasonable that once you have created an index it would be nice to be
> able to use that index in different ways. After all, the only thing I want
> is to apply different query analyzers (mind you, I am not formatting the
> tokens, which could possibly result in index/query token conflicts; I am
> merely expanding the query possibilities here by adding synonyms, the rest
> stays the same).
> 
> Another good example could be that you want to index a field that contains
> text in different languages. Would it not be nice then to be able to define
> optimized query analyzers on that field, one for each language? You could
> then access them using the q parameter: q=<analyzer>:<field>:<term>,
> where <analyzer> is the name of the query analyzer to use. It seems to me to
> be a nice feature. Could be a big change though, because I assume that at
> the moment the analyzers have hard-coded names ("index" and "query").
> 
> 3) Yep, I was also looking into this (because other options seemed to be
> vaporizing). Don't know if I'm going to use suffixes or maybe add a trigger
> word like @stem@. Depends on what the scope of the called method is. I
> prefer the trigger word @stem@ variant because I can then just insert that
> without needing to parse the query string to find out what the actual search
> words are that I need to suffix.
> 
> Cheers and again, thanks for helping me on this,
> Victor
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Multiple-search-analyzers-on-the-same-field-type-possible-tp3417898p3421522.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Stopword filter - refreshing stop word list periodically

2011-10-14 Thread Michael Kuhlmann
Am 14.10.2011 15:10, schrieb Jithin:
> Hi,
> Is it possible to refresh the stop word list periodically say once in 6
> hours. Is this already supported in Solr or are there any work arounds.
> Kindly help me in understanding this.

Hi,

you can trigger a reload command to the core admin, assuming you're
running a multi core environment (which I'd recommend anyway).

Simply add
curl "http://host:port/solr/admin/cores?action=RELOAD&core=corename"
to your /etc/crontab file, and set the leading time fields correspondingly.

-Kuli
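One shell detail worth adding to Kuli's suggestion: in a crontab the & in the URL must sit inside quotes (otherwise the shell backgrounds the command at that point), and crontab treats a literal % as a line break unless escaped (this URL contains none). A sketch of the /etc/crontab entry, with the schedule and core name as placeholders:

```
# /etc/crontab -- reload core "corename" every 6 hours (schedule is an example)
0 */6 * * *  root  curl -s "http://host:port/solr/admin/cores?action=RELOAD&core=corename"
```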


Re: Multiple search analyzers on the same field type possible?

2011-10-14 Thread Victor
Hi Erick,

First of all, thanks for your posts, I really appreciate this!

1) Yes, we have tested alternative stemmers, but I admit that a definite
decision has not been made yet. Anyway, we definitely do not want to create
a stemmed index because of storage issues, and we definitely want to be able
to allow the end-user to turn it on and off. So choosing a different stemmer
does not solve my problem of wanting to switch between stemming/non-stemming
without additional indexes.

2) Rant granted :) And I definitely agree with you. It is always a challenge
to find a balance between what a customer wants and how far you really want
to go to in achieving a solution (that does not conflict too much with the
maintainability of the system).

But, I do think that the requirements are not that outrageous. It seems to
me reasonable that once you have created an index it would be nice to be
able to use that index in different ways. After all, the only thing I want
is to apply different query analyzers (mind you, I am not formatting the
tokens, which could possibly result in index/query token conflicts; I am
merely expanding the query possibilities here by adding synonyms, the rest
stays the same).

Another good example could be that you want to index a field that contains
text in different languages. Would it not be nice then to be able to define
optimized query analyzers on that field, one for each language? You could
then access them using the q parameter: q=<analyzer>:<field>:<term>,
where <analyzer> is the name of the query analyzer to use. It seems to me to
be a nice feature. Could be a big change though, because I assume that at
the moment the analyzers have hard-coded names ("index" and "query").

3) Yep, I was also looking into this (because other options seemed to be
vaporizing). Don't know if I'm going to use suffixes or maybe add a trigger
word like @stem@. Depends on what the scope of the called method is. I
prefer the trigger word @stem@ variant because I can then just insert that
without needing to parse the query string to find out what the actual search
words are that I need to suffix.

Cheers and again, thanks for helping me on this,
Victor


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-search-analyzers-on-the-same-field-type-possible-tp3417898p3421522.html
Sent from the Solr - User mailing list archive at Nabble.com.


Morelikethis understanding question

2011-10-14 Thread Vadim Kisselmann
Hello folks,
i have a question about the MLT.

For example my query:

localhost:8983/solr/mlt/?q=gefechtseinsatz+AND+dna&mlt=true&mlt.fl=text&mlt.count=0&mlt.boost=true&mlt.mindf=5&mlt.mintf=5&mlt.minwl=4

I have 1 query result and 13 MLT docs. The MLT result corresponds to
half of my index.
In my case I want just those docs which share at least half of the words
with my query-result document; they should be very similar.
How should I set my parameters to achieve this?

Thanks and Regards
Vadim
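A sketch of the request with the knobs most relevant to Vadim's goal; the values are illustrative, but raising mlt.mintf and mlt.mindf is the usual way to restrict the "interesting terms" so only strongly overlapping documents qualify:

```python
from urllib.parse import urlencode

params = {
    "q": "gefechtseinsatz AND dna",
    "mlt": "true",
    "mlt.fl": "text",    # field the "interesting terms" are drawn from
    "mlt.mintf": 5,      # min term frequency in the source document
    "mlt.mindf": 5,      # min document frequency across the index
    "mlt.minwl": 4,      # min word length for a term to count
    "mlt.count": 10,     # similar docs returned per result (example value)
    "mlt.boost": "true",
}
qs = urlencode(params)
print("http://localhost:8983/solr/mlt/?" + qs)  # assumed host/port
```

Note that, as far as I know, no parameter directly expresses "shares at least half of the source document's terms"; that cutoff would have to be applied client-side on the returned documents.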


Re: Multiple search analyzers on the same field type possible?

2011-10-14 Thread Erick Erickson
Hmmm...

A couple of things.
1> Have you looked at alternate stemmers? The Porter stemmer is rather
 aggressive. Perhaps a less-aggressive stemmer would suit your
 internal users.
2> Try a few things, but if you can't solve it reasonably quickly,
  go back to your internal customer and explain the costs of
  fixing this. Really. You're jumping through hoops because
 results "did not please my internal customer". Can they
  quantify their objections? Or is this just looking at the
  results for random searches and guessing at relevance?
  If the latter, you really, really, really need to get them to
  quantify their objections and I bet you'll find that they can't.
  And you'll forever be trying to tweak results to please
  how they feel about it today. Which will be different from
  how they felt about *the exact same results* yesterday.
  You can go around this loop forever.

  We've (programmers in general) done a rather poor job
  historically of laying out the *costs* of fixing things to
  suit a customer and allowing the various stake-holders
  to make rational decisions. We say "Sure, that can be done"
  and leave out "but it will take a month when we won't
  be able to do X, Y, or Z, and requires more hardware".
  There, rant done

3> I suppose you could think about writing your own filter that
 added the original token and the stemmed token.
 Something like the SynonymFilter but instead of alternate
 versions of the word, you'd have the stemmed version
 and the original version at the same position. Or maybe
 you have the stemmed version and then the original
 version with a special ending character (say $) appended.
 Then you'd have to somehow write a query-time
 analysis chain (or a query parser?) that somehow
 knew enough to use the stemmed or original word (plus $)
 in the query. But I admit I haven't thought this through
 at all. There'd have to be some parameter you passed
 through with the query that controlled whether the
 regular stemming process happened or not... And I
 don't know offhand how that'd work.

 Or reverse that. Append $ to all the stemmed variants.

But really, before going there (which I admit would be more
fun than arguing with your customer), try one of the less
aggressive stemmers. Or see if your other stake-holders
would be better served by not stemming at all. Or

Best
Erick


On Fri, Oct 14, 2011 at 3:22 AM, Victor  wrote:
> Hi Erick,
>
> I work for a very big library and we store huge amounts of data. Indexing
> some of our collections can take days and the index files can get very big.
> We are a non-profit organisation, so we want to provide maximum service to
> our customers but at the same time we are bound to a fixed budget and want
> to keep costs as low as possible (including disk space). Our customers vary
> from academic people that want to do very precise searches to common users
> who want to search in a more general way. The library now wants to implement
> some form of stemming, but we have had one demo in the past with a stemmer
> that returned results that did not please my internal customer (another
> department).
>
> So my wish list looks like this:
>
> 1) Implement stemming
> 2) Give the end user the possibility to turn stemming on or off for their
> searches
> 3) Have maximum control over the stemmer without the need to reindex if we
> change something there
> 4) Prevent the need for more storage (to keep the operations people happy)
>
> So far I have been able to satisfy 1,2 and 3. I am using a synonyms list at
> query time to apply my stemming. The synonym list I build as follows:
>
> a) load a library (a text file with 1 word per line)
> b) remove stop words from the list
> c) link words that have the same stem
>
> Bullet c) is a little bit more sophisticated, because I do not link words
> that are already part of a pre-defined synonym list that contains
> exceptions.
>
> All this I do to keep maximum control over the behaviour of the stemmer.
> Since this is a demo and it will be used to convince other people in my
> organisation that stemming could be worth implementing, I need to be able to
> adjust its behaviour quickly.
>
> So far processing speed has not been an issue, but disk storage has.
> Generally, at index time we remove as few tokens as possible and our objects
> are complete books, newspapers (from 1618 until 1995), etc. So you can
> imagine that our indexes get very, very big.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Multiple-search-analyzers-on-the-same-field-type-possible-tp3417898p3420946.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: text search and data aggregation, thoughts?

2011-10-14 Thread pravesh
Hi Esteban,

A lot depends on a lot of things: 1) how much volume (total documents), 2)
the size of the index, and 3) how you represent the data-aggregated part in
your UI.
Your option-2 seems a suitable way to go. This way you can tune each core
separately. Also, the use-cases for updating each document/product in the two
indexes seem different: one is updated when a product is added/updated, the
other when a product is viewed/sold from the search results.

Option-1 can be used in case you are showing the data-aggregation stats on
the search results page only, along with each item. If it is shown on the
item-detail page, then option-2 seems better.

Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/text-search-and-data-aggregation-thoughts-tp3416330p3421361.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Hi symbol library

2011-10-14 Thread Erick Erickson
This is really a Python question, basic
URL escaping. It has nothing to do with
Solr.

WARNING: I don't do Python, but a quick
google search shows:
http://stackoverflow.com/questions/1695183/how-to-percent-encode-url-parameters-in-python

Best
Erick

2011/10/14 Guizhi Shi :
>
> Hi Dear Sir/Madam
> This is George, I m using SOLR,It looks so powerful, I meet a problem, it is 
> like:
> I want to post a request from python script, and pass the http request to my 
> own SOLR server, but if the request contain the symbol, such as & SOLR will 
> responsea error.
> I use the admin page to do same request, find SOLR convert the & to %26, $ to 
> %23
> Is there a library, which contains all mapping for those symbol
> if yes, could you please send me the Library or mapping list
> Thank you
> Best Regards
> George
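Concretely, the mapping George is after is just URL percent-encoding, and Python's standard library already ships it; no extra library is needed:

```python
from urllib.parse import quote, urlencode

print(quote("&"))   # '%26' -- the conversion seen on the admin page
print(quote("$"))   # '%24'
# For whole parameter sets, let urlencode handle all the escaping at once:
print(urlencode({"q": "fish & chips"}))  # 'q=fish+%26+chips'
```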


multivale field: solr stop to go

2011-10-14 Thread ojalà
I need to extract the last 20 keywords in all my documents, sorted by score.
The keywords field is multivalued and the Solr schema I have defined like
this:

The problem is as follows: my query is OK, but once I reach 518 indexed
documents Solr seems to stop working. If I remove a document (back to 517),
the query works again.
What could my problem be?

http://solr:8983/solr/select?indent=on&version=2.2&start=0&rows=5&qt=standard&wt=standard&fl=id,metadata,score,keywords&sort=score+asc&q=category_id:"a:Original"&facet=true&facet.limit=20&facet.mincount=1&facet.sort=true&facet.field=keywords

--
View this message in context: 
http://lucene.472066.n3.nabble.com/multivale-field-solr-stop-to-go-tp3420934p3420934.html
Sent from the Solr - User mailing list archive at Nabble.com.
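Independent of the indexing problem, rebuilding the query programmatically rules out the mangled &-separators visible in the pasted URL; the parameters below are taken from the post itself:

```python
from urllib.parse import urlencode

params = [
    ("q", 'category_id:"a:Original"'),
    ("start", 0),
    ("rows", 5),
    ("fl", "id,metadata,score,keywords"),
    ("sort", "score asc"),
    ("facet", "true"),
    ("facet.field", "keywords"),
    ("facet.limit", 20),
    ("facet.mincount", 1),
]
url = "http://solr:8983/solr/select?" + urlencode(params)
print(url)
```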


Hi symbol library

2011-10-14 Thread Guizhi Shi

Hi Dear Sir/Madam,
This is George. I'm using Solr, and it looks very powerful, but I've met a
problem. It is like this:
I want to post a request from a Python script and pass the HTTP request to my
own Solr server, but if the request contains a symbol such as &, Solr will
respond with an error.
When I use the admin page to do the same request, I find Solr converts the &
to %26, $ to %23.
Is there a library which contains all the mappings for those symbols?
If yes, could you please send me the library or the mapping list.
Thank you,
Best Regards,
George

Re: upgrading 1.4 to 3.x

2011-10-14 Thread pravesh
Just look into your Tomcat logs in more detail, specifically the logs from
when Tomcat loads the Solr application's web context. There you might find
some clues, or just post a snapshot of the logs here.


Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/upgrading-1-4-to-3-x-tp3415044p3421225.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Xsl for query output

2011-10-14 Thread Chantal Ackermann

Hi Jeremy,


The xsl files go into the subdirectory /xslt/ (you have to create that)
in the /conf/ directory of the core that should return the transformed
results.

So, if you have a core /myCore/ that you want to return transformed
results you need to put the example.xsl into:

$SOLR_HOME/myCore/conf/xslt/example.xsl

and in $SOLR_HOME/myCore/conf/solrconfig.xml you add (change the cache
value to whatever appropriate):


<queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter">
   <int name="xsltCacheLifetimeSeconds">6000</int>
</queryResponseWriter>


Call this in a query:

http://<host>:<port>/solr/myCore/select?q=id:<id>&wt=xslt&tr=example.xsl


Chantal


On Fri, 2011-10-14 at 07:22 +0200, Jeremy Cunningham wrote:
> Thanks for the response but I have seen this page and I had a few
> questions.  
> 
> 1.  Since I am using tomcat, I had to move the example directory into the
> tomcat directory structure.  In the multicore, there is no example.xsl.
> Where do I 
> need to put it? Also, how do I send docs for indexing when running solr
> under tomcat?  
> 
> Thanks,
> Jeremy
> 
> On 10/13/11 3:46 PM, "Lance Norskog"  wrote:
> 
> >http://wiki.apache.org/solr/XsltResponseWriter
> >
> >This is for the single-core example. It is easiest to just go to
> >solr/example, run java -jar start.jar, and hit the URL in the above wiki
> >page. Then poke around in solr/example/solr/conf/xslt. There is no
> >solrconfig.xml change needed.
> >
> >It is generally easiest to use the solr/example 'java -jar start.jar'
> >example to test out features. It is easy to break configuration linkages.
> >
> >Lance
> >
> >On Thu, Oct 13, 2011 at 12:42 PM, Jeremy Cunningham <
> >jeremy.cunningham.h...@statefarm.com> wrote:
> >
> >> I am new to solr and not a web developer.  I am a data warehouse guy
> >>trying
> >> to use solr for the first time.  I am familiar with xsl but I can't
> >>figure
> >> out how to get the example.xsl to be applied to my xml results.  I am
> >> running tomcat and have solr working.  I copied over the solr mulitiple
> >>core
> >> example to the conf directory on my tomcat server. I also added the war
> >>file
> >> and the search is fine.  I can't seem to figure out what I need to add
> >>to
> >> the solrcofig.xml or where ever so that the example.xsl is used.
> >>Basically
> >> can someone tell me where to put the xsl and where to configure its
> >>usage?
> >>
> >> Thanks
> >>
> >
> >
> >
> >-- 
> >Lance Norskog
> >goks...@gmail.com
> 



Re: Multiple search analyzers on the same field type possible?

2011-10-14 Thread Victor
Hi Erick,

I work for a very big library and we store huge amounts of data. Indexing
some of our collections can take days and the index files can get very big.
We are a non-profit organisation, so we want to provide maximum service to
our customers but at the same time we are bound to a fixed budget and want
to keep costs as low as possible (including disk space). Our customers vary
from academic people that want to do very precise searches to common users
who want to search in a more general way. The library now wants to implement
some form of stemming, but we have had one demo in the past with a stemmer
that returned results that did not please my internal customer (another
department).

So my wish list looks like this:

1) Implement stemming
2) Give the end user the possibility to turn stemming on or off for their
searches
3) Have maximum control over the stemmer without the need to reindex if we
change something there
4) Prevent the need for more storage (to keep the operations people happy)

So far I have been able to satisfy 1,2 and 3. I am using a synonyms list at
query time to apply my stemming. The synonym list I build as follows:

a) load a library (a text file with 1 word per line)
b) remove stop words from the list
c) link words that have the same stem

Bullet c) is a little bit more sophisticated, because I do not link words
that are already part of a pre-defined synonym list that contains
exceptions.

All this I do to keep maximum control over the behaviour of the stemmer.
Since this is a demo and it will be used to convince other people in my
organisation that stemming could be worth implementing, I need to be able to
adjust its behaviour quickly.

So far processing speed has not been an issue, but disk storage has.
Generally, at index time we remove as few tokens as possible and our objects
are complete books, newspapers (from 1618 until 1995), etc. So you can
imagine that our indexes get very, very big.
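Victor's three build steps (load the word list, drop stop words, link words sharing a stem) can be sketched as follows; the stop list and the suffix-stripping "stemmer" are toy assumptions standing in for the real components:

```python
from collections import defaultdict

STOPWORDS = {"the", "and", "of"}        # toy stop list (assumption)

def toy_stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    return word[:-1] if word.endswith("s") else word

def build_synonym_lines(words, exceptions=frozenset()):
    # a) the word list comes in as `words`; b) drop the stop words
    words = [w for w in words if w not in STOPWORDS]
    # c) group words sharing a stem, skipping the pre-defined exceptions
    groups = defaultdict(list)
    for w in words:
        if w not in exceptions:
            groups[toy_stem(w)].append(w)
    # emit synonyms.txt-style lines for every group with 2+ members
    return [",".join(g) for g in groups.values() if len(g) > 1]

print(build_synonym_lines(["car", "cars", "the", "news"]))  # ['car,cars']
```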

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-search-analyzers-on-the-same-field-type-possible-tp3417898p3420946.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Filter Question

2011-10-14 Thread Steven A Rowe
Hi Monica,

AFAIK there is nothing like the filter you've described, and I believe it would 
be generally useful.  Maybe it could be called StopTermTypesFilter?  (Plural on 
Types to signify that more than one type of term can be stopped by a single 
instance of the filter.)  

Such a filter should have an enablePositionIncrements option like StopFilter.

Steve

> -Original Message-
> From: Monica Skidmore [mailto:monica.skidm...@careerbuilder.com]
> Sent: Thursday, October 13, 2011 1:04 PM
> To: solr-user@lucene.apache.org; Otis Gospodnetic
> Subject: RE: Filter Question
> 
> Thanks, Otis - yes, this is different from the synonyms filter, which we
> also use.  For example, if you wanted all tokens that were marked 'lemma'
> to be removed, you could specify that, and all tokens with any type other
> than 'lemma' would still be returned.  You could also choose to remove
> all tokens of types 'lemma' and 'word' (although that would probably be a
> bad idea!), etc.  Normally, if you don't want a token type, you just
> don't include/run the filter that produces that type.  However, we have a
> third-party filter that produces multiple types, and this allows us to
> select a subset of those types.
> 
> I did see the HowToContribute wiki, but I'm relatively new to solr, and I
> wanted to see if this looked familiar to someone before I started down
> the contribution path.
> 
> Thanks again!
> 
>   -Monica
> 
> 
> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Thursday, October 13, 2011 12:37 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Filter Question
> 
> Monica,
> 
> This is different from Solr's synonyms filter with different synonyms
> files, one for index-time and the other for query-time expansion (not
> sure when you'd want that, but it looks like you need this and like
> this), right?  If so, maybe you can describe what your filter does
> differently and then follow http://wiki.apache.org/solr/HowToContribute -
> thanks in advance! :)
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene
> ecosystem search :: http://search-lucene.com/
> 
> 
> >
> >From: Monica Skidmore 
> >To: "solr-user@lucene.apache.org" 
> >Sent: Thursday, October 13, 2011 11:37 AM
> >Subject: Filter Question
> >
> >Our Solr implementation includes a third-party filter that adds
> additional, multiple term types to the token list (beyond "word",
> etc.).  Most of the time this is exactly what we want, but we felt we
> could improve our search results by having different tokens on the index
> and query side.  Since the filter in question was third-party and we
> didn't have access to source code, we wrote our own filter that will take
> out tokens based on their term attribute type.
> >
> >We didn't see another filter available that does this - did we overlook
> it?  And if not, is this something that would be of value if we
> contribute it back to the Solr community?
> >
> >Monica Skidmore
> >
> >
> >
> >