Re: how to reset the index in solr

2009-04-29 Thread Geetha

Thanks a lot, Erik. I will try it and let you know.


Erik Hatcher wrote:


On Apr 29, 2009, at 12:19 AM, Geetha wrote:
I need a function (through solr ruby) for ruby that will allow us to 
clear everything


require 'solr'
solr = Solr::Connection.new("http://localhost:8983/solr")
solr.delete_by_query('*:*')
solr.commit

Erik





--

Best Regards,

Geetha S | System and Software Engineer
email: gee...@angleritech.com


web: www.angleritech.com
tel: +91 422 2312707, 2313938
fax: +91 422 2313936
address: 1144 Trichy Road, Coimbatore, 641045, India
offices: India | USA | UK | Canada | Europe | UAE | South Africa | Singapore | Hong Kong









Re: Problem adding unicoded docs to Solr through SolrJ

2009-04-29 Thread ahmed baseet
Thanks a lot for your quick and detailed response.
I got the point. But as I've mentioned earlier, I have a string of raw text
(in the default encoding) that needs to be encoded in UTF-8, so I tried
something stupid that works nonetheless. I first converted the whole string to
a byte array and then used that byte array to create a new UTF-8 encoded string,
like this:

// Encode in Unicode UTF-8
byte [] utfEncodeByteArray = textOnly.getBytes();
String utfString = new String(utfEncodeByteArray,
Charset.forName("UTF-8"));

then passed the utfString to the function for posting to Solr, and it works
perfectly.
But is there any more elegant way of doing all this, going straight from the
default-encoded string to a UTF-8 encoded string without the detour through a
byte array?
Thank you very much.
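(For comparison, a small sketch - not from this thread - of the more direct route Michael describes below: decode the original bytes with an explicit charset once, rather than re-encoding a String that was already built with the platform default. fetchPageBytes() is a made-up stand-in for however the raw page content is obtained.)

import java.nio.charset.Charset;

public class ExplicitCharsetSketch {
    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        byte[] rawBytes = fetchPageBytes();       // hypothetical helper
        // Decode once, with the charset stated explicitly - no reliance on
        // the platform default and no second conversion later.
        String text = new String(rawBytes, utf8);
        // If bytes are needed again (e.g. for an HTTP request body), encode
        // explicitly as well, so both directions agree on UTF-8.
        byte[] body = text.getBytes(utf8);
        System.out.println(text + " -> " + body.length + " bytes");
    }

    // Stand-in: these are the UTF-8 bytes for "Käse", as in the example below.
    private static byte[] fetchPageBytes() {
        return new byte[] { 'K', (byte) 195, (byte) 164, 's', 'e' };
    }
}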

--Ahmed.



On Wed, Apr 29, 2009 at 6:45 PM, Michael Ludwig  wrote:

> ahmed baseet schrieb:
>
>  public void postToSolrUsingSolrj(String rawText, String pageId) {
>>
>
> doc.addField("features", rawText );
>>
>
>  In the above the param rawText is just the html stripped off of all
>> its tags, js, css etc and pageId is the Url for that page. When I'm
>> using this for English pages its working perfectly fine but the
>> problem comes up when I'm trying to index some non-english pages.
>>
>
> Maybe you're constructing a string without specifying the encoding, so
> Java uses your default platform encoding?
>
> String(byte[] bytes)
>  Constructs a new String by decoding the specified array of
>  bytes using the platform's default charset.
>
> String(byte[] bytes, Charset charset)
>  Constructs a new String by decoding the specified array of bytes using
>  the specified charset.
>
>  Now what I did is just extracted the raw text from that html page and
>> manually created an xml page like this
>>
>> 
>> 
>>  
>>UTF2TEST
>>Test with some UTF-8 encoded characters
>>*some tamil unicode text here*
>>   
>> 
>>
>> and posted this from command line using the post.jar file. Now searching
>> gives me the result but unlike last time browser shows the indexed text in
>> tamil itself and not the raw unicode.
>>
>
> Now that's perfect, isn't it?
>
>  I tried doing something like this also,
>>
>
>  // Encode in Unicode UTF-8
>>  utfEncodedText = new String(rawText.getBytes("UTF-8"));
>>
>> but even this didn't help either.
>>
>
> No encoding specified, so the default platform encoding is used, which
> is likely not what you want. Consider the following example:
>
> package milu;
> import java.nio.charset.Charset;
> public class StringAndCharset {
>  public static void main(String[] args) {
>byte[] bytes = { 'K', (byte) 195, (byte) 164, 's', 'e' };
>System.out.println(Charset.defaultCharset().displayName());
>System.out.println(new String(bytes));
>System.out.println(new String(bytes,  Charset.forName("UTF-8")));
>  }
> }
>
> Output:
>
> windows-1252
> KÃ¤se (bad)
> Käse (good)
>
> Michael Ludwig
>


Re: Term highlighting with MoreLikeThisHandler?

2009-04-29 Thread Walter Underwood
Think about this for a moment. When you use MoreLikeThis, the query
is a document. How do you highlight a document in another document?

wunder

On 4/29/09 9:21 PM, "Matt Weber"  wrote:

> Any luck on this?  I am experiencing the same issue.  Highlighting
> works fine on all other request handlers, but breaks when I use the
> MoreLikeThisHandler.
> 
> Thanks,
> 
> Matt Weber
> 
> 
> 
> 
> On Apr 28, 2009, at 5:29 AM, Eric Sabourin wrote:
> 
>> Yes... at least I think so.  the highlighting works correctly for me
>> on
>> another request handler... see below the request handler for my
>> morelikethishandler query.
>> Thanks for your help... Eric
>> 
>> 
>>  
>>
>> 
>> 
>> score,id,timestamp,type,textualId,subject,url,server
>>
>> 
>> explicit
>> true
>> list
>>   <str name="mlt.fl">subject,requirements,productName,justification,operation_exact</str>
>>   2
>>   1
>>   2
>> 
>>true
>>1
>>
>>0
>>0
>>regex 
>>regex
>>
>>  
>> 
>> 
>> On Mon, Apr 27, 2009 at 11:30 PM, Otis Gospodnetic <
>> otis_gospodne...@yahoo.com> wrote:
>> 
>>> 
>>> Eric,
>>> 
>>> Have you tried using MLT with parameters described on
>>> http://wiki.apache.org/solr/HighlightingParameters ?
>>> 
>>> 
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>> 
>>> 
>>> 
>>> - Original Message 
 From: Eric Sabourin 
 To: solr-user@lucene.apache.org
 Sent: Monday, April 27, 2009 10:31:38 AM
 Subject: Term highlighting with MoreLikeThisHandler?
 
 I submit a query to the MoreLikeThisHandler to find documents
 similar to
>>> a
 specified document.  This works and I've configured my request
 handler to
 also return the interesting terms.
 
 Is it possible to have MLT return to me highlight snippets in the
 similar
 documents it returns? I mean generate hl snippets of the interesting
>>> terms?
 If so how?
 
 Thanks... Eric
>>> 
>>> 
>> 
>> 
>> -- 
>> Eric
>> Sent from Halifax, NS, Canada
> 



Re: Term highlighting with MoreLikeThisHandler?

2009-04-29 Thread Matt Weber
Any luck on this?  I am experiencing the same issue.  Highlighting  
works fine on all other request handlers, but breaks when I use the  
MoreLikeThisHandler.


Thanks,

Matt Weber




On Apr 28, 2009, at 5:29 AM, Eric Sabourin wrote:

Yes... at least I think so.  the highlighting works correctly for me  
on

another request handler... see below the request handler for my
morelikethishandler query.
Thanks for your help... Eric


 
   


score,id,timestamp,type,textualId,subject,url,server
   

explicit
true
list
  <str name="mlt.fl">subject,requirements,productName,justification,operation_exact</str>

  2
  1
  2

   true
   1
   
   0
   0
   regex 
   regex
   
 


On Mon, Apr 27, 2009 at 11:30 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:



Eric,

Have you tried using MLT with parameters described on
http://wiki.apache.org/solr/HighlightingParameters ?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Eric Sabourin 
To: solr-user@lucene.apache.org
Sent: Monday, April 27, 2009 10:31:38 AM
Subject: Term highlighting with MoreLikeThisHandler?

I submit a query to the MoreLikeThisHandler to find documents  
similar to

a
specified document.  This works and I've configured my request  
handler to

also return the interesting terms.

Is it possible to have MLT return to me highlight snippets in the  
similar

documents it returns? I mean generate hl snippets of the interesting

terms?

If so how?

Thanks... Eric






--
Eric
Sent from Halifax, NS, Canada




Re: Unable to import data from database

2009-04-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess this can go in the FAQ section of DIH

On Wed, Apr 29, 2009 at 9:47 PM, Erick Erickson  wrote:
> Thanks for letting us all know the resolution, that may save some other
> poor soul from frustration
>
> Best
> Erick
>
> On Wed, Apr 29, 2009 at 9:31 AM, Ci-man  wrote:
>
>>
>> Found the problem.
>> It is with Microsoft jdbc drivers (jdbc 2.0).
>>
>> With the latest download Microsoft provides two .jar files:
>> sqljdbc.jar
>> sqljdbc4.jar
>>
>> I had copied both into the lib directory. By doing so it used the older
>> drivers (sqljdbc.jar) which do not work with jvm1.6. You get this kind of
>> cryptic message in debug trace:
>>
>> The JDBC Driver version 2.0 does not support JRE 1.4. You must upgrade JRE
>> 1.4 to JRE 5.0 or later when using the JDBC Driver version 2.0. In some
>> cases, you might need to recompile your application because it might not be
>> compatible with JDK 5.0 or later. For more information, see the
>> documentation on Sun Microsystems Web site
>>
>> No further help from MS or boards. I experimented and removed the
>> sqljdbc.jar file from lib directory so that only the sqljdbc4.jar is
>> available. And bingo. Everything is working like a charm.
>>
>> Thanks everyone,
>> -Ci
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Unable-to-import-data-from-database-tp23283852p23295866.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>



-- 
--Noble Paul


understanding facets and tokens

2009-04-29 Thread Simon Stanlake
Hi,
Trying to debug a faceting performance problem. I've pretty much given up but 
was hoping someone could shed some light on my problems.

My index has 80 million documents, all of which are small - one 1000-char text
field and a bunch of 30-50 char fields. I have 24 GB of RAM allocated to the JVM
on a brand new server.

I have one field in my schema which represents a city name. It is a
non-standardized free-text field, so you get problems like the following:

HOUSTON
HOUSTON TX
HOUSTON, TX
HOUSTON (TX)

I would like to facet on this field and thought I could apply some tokenizers / 
filters to modify the indexed value to strip out stopwords. To tie it all 
together I created a filter that would concatenate all of the tokens back into 
a single token at the end. Here's my field definition from schema.xml
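(The definition itself did not survive the list archive. Purely as an illustration, the kind of fieldType described would look something like the sketch below; the class name of the custom concatenating filter is invented, since that filter is the poster's own code.)

<fieldType name="cityFacet" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- StandardTokenizer drops most punctuation: "HOUSTON, TX" -> HOUSTON, TX -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- state abbreviations etc. treated as stopwords -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="city_stopwords.txt"/>
    <!-- custom filter (hypothetical name) that glues the surviving tokens back into one -->
    <filter class="com.example.ConcatenateTokensFilterFactory"/>
  </analyzer>
</fieldType>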

The analysis seems to be working as I expected and the index contains the 
values I want. However, when I facet on this field the query typically returns
in around 30s, versus sub-second when I just use a solr.StrField. I
understand from the lists that the method that solr uses to create the facet 
counts is different depending on whether the field is tokenized vs not 
tokenized, but I thought I could mitigate that somewhat by making sure that 
each field only had one token.

Is there anything else I can do here? Can someone shed some light on why a 
tokenized field takes longer, even if there is only one token per field? I 
suspect I am going to be stuck with implementing custom field translation 
before loading but was hoping I could leverage some of the great filters that 
are built in with solr / lucene. I've played around with fieldcache but so far 
no luck.

BTW love solr / lucene, great job!

Thanks,
Simon


RE: Multiple Queries

2009-04-29 Thread Ankush Goyal
Hey Guys,

I have a novice-type question regarding how to create a query by ORing multiple
terms.

Currently, the query we are creating is a boosting query using following code:

BoostingQuery boosQuery = new 
BoostingQuery(getHotelIdFilterQuery(hotelIdStr),baseQuery,2.0f);

Wherein, getHotelIdFilterQuery() takes a hotelId and creates a TermQuery 
like--> "hotId:3453"

Then it is combined with the baseQuery in boosQuery to get a query like-->

[hotel.id_t:3453/(+(rev.headline:lakefront^2.0 | 
rev.comments:lakefront^2.0)~0.01 ())^0.0]

Now, I wanted to create a boosQuery with multiple hotelIds ORed with each other 
like-->

[hotel.id_t:342432/hotel.id_t:3453/(+(rev.headline:lakefront^2.0 | 
rev.comments:lakefront^2.0)~0.01 ())^0.0]

So, how do I pass these multiple termQueries ORed with each other to make 
BoostQuery?
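(One way to build that - a sketch against the plain Lucene API, mirroring the existing BoostingQuery(match, context, boost) call; "hotelIds" stands for whatever collection of id strings you already have - is to OR the per-hotel TermQuerys together with SHOULD clauses in a BooleanQuery and pass that in place of the single term:)

import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

public class HotelBoostQueryBuilder {
    // The "match" side of the BoostingQuery is now an OR (SHOULD) over several ids.
    public static Query build(Iterable<String> hotelIds, Query baseQuery) {
        BooleanQuery hotelIdsQuery = new BooleanQuery();
        for (String hotelId : hotelIds) {
            // SHOULD clauses act as an OR across the hotel ids
            hotelIdsQuery.add(new TermQuery(new Term("hotel.id_t", hotelId)),
                              BooleanClause.Occur.SHOULD);
        }
        return new BoostingQuery(hotelIdsQuery, baseQuery, 2.0f);
    }
}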

Thanks!

-Ankush Goyal

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, April 28, 2009 4:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Multiple Queries

Have you considered indexing the reviews along with the hotels right
in the hotel index? That way you would fetch the reviews right along with
the hotels...

Really, this is another way of saying "flatten your data" ...

Your idea of holding all the hotel reviews in memory is also viable, depending
upon how many there are. You'd pay some startup costs, but that's what caching
is all about.

Given your current index structure, have you tried collecting the hotel IDs,
and
submitting a query to your review index that just ORs together all the IDs
and
then parsing that rather than calling your review index for one hotel ID at
a time?

Best
Erick

On Tue, Apr 28, 2009 at 4:32 PM, Ankush Goyal wrote:

> Hi,
>
> I have been trying to solve a performance issue: I have an index of hotels
> with their ids and another index of reviews. Now, when someone queries for a
> location, the current process gets all the hotels for that location.
> And, then corresponding to each hotel-id from all the hotel documents, it
> calls the review index to fetch reviews associated with that particular
> hotel and so on it repeats for all the hotels. This process slows down the
> request significantly.
> I need to accumulate reviews according to corresponding hotel-ids, so I
> can't just fetch all the reviews for all the hotel ids and show them. Now, I
> was thinking about fetching all the reviews for all the hotel-ids and then
> parse all those reviews in one go and create a map with hotel-id as key and
> list of reviews as values.
>
> Can anyone comment on whether this procedure would be better or worse, or
> if there's better way of doing this?
>
> --Ankush Goyal
>


Re: Performance and number of search results

2009-04-29 Thread Walter Underwood
Some part of the server-side work is linear in the number of hits.
It has to look up field values for each one of those hits, and that
is linear.

At some level, you've got one lookup for each term in the query and
one lookup for each hit. If you have a handful of terms and
1000 hits, the time is probably dominated by the number of hits.

I agree with the advice "get what you need".

wunder

On 4/29/09 5:30 AM, "Michael Ludwig"  wrote:

> Wouter Samaey schrieb:
> 
>> Can someone please comment on the performance impact of the number of
>> search results?
>> Is there a big difference between querying for 1 result, 10, 20 or
>> even 100 ?
> 
> Probably not, but YMMV, as the question is very general.
> 
> Consider that for fast queries the HTTP round trip may well be the
> determining factor. Or XML parsing. If you've stored a lot of data in
> Solr and request all of it to be returned, the difference between 1 and
> 100 results may be the difference between 1 and 100 KB payload.
> 
> If you think it matters, the best thing for you would be to do some
> profiling for your specific scenario.
> 
> The rule of thumb here is probably: Get what you need.
> 
> Michael Ludwig



Facet counts for common terms of the searched field

2009-04-29 Thread Raju444us

I have a requirement. If I search a text field, let's say "metal:glass", what
I want is to get the facet counts for all the terms related to "glass" in my
search results.

window(100)  since a window can be glass.
plastic(10)  plastic is a material just like glass
Iron(10)
Paper(15)

Can I use MLT to get this functionality? Please let me know how I can achieve
this, if possible with an example query.

Thanks,
Raju
-- 
View this message in context: 
http://www.nabble.com/Facet-counts-for-common-terms-of-the-searched-field-tp23302410p23302410.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Date faceting - howto improve performance

2009-04-29 Thread Shalin Shekhar Mangar
Some basic documentation is in the example schema.xml. Ask away if you have
specific questions.
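For example, day buckets on the publishedDate field via facet.query (the dates below are only an illustration) would look roughly like:

&facet=true
&facet.query=publishedDate:[2009-04-27T00:00:00Z TO 2009-04-28T00:00:00Z]
&facet.query=publishedDate:[2009-04-28T00:00:00Z TO 2009-04-29T00:00:00Z]
&facet.query=publishedDate:[2009-04-29T00:00:00Z TO 2009-04-30T00:00:00Z]

Each facet.query comes back with its own count, so it can stand in for date faceting on a trie field.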

On Thu, Apr 30, 2009 at 1:00 AM, Marcus Herou wrote:

> Aha!
>
> Hmm, googling won't help me, I see. Any hints on usage?
>
> /M
>
>
> On Tue, Apr 28, 2009 at 12:29 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > Sorry, I'm late in this thread.
> >
> > Did you try using Trie fields (new in 1.4)? The regular date faceting
> won't
> > work out-of-the-box for trie fields I think. But you could use
> facet.query
> > to achieve the same effect. On my simple benchmarks I've found trie
> fields
> > to give a huge improvement in range searches.
> >
> > On Sat, Apr 25, 2009 at 4:24 PM, Marcus Herou <
> marcus.he...@tailsweep.com
> > >wrote:
> >
> > > Hi.
> > >
> > > One of our faceting use-cases:
> > > We are creating trend graphs of how many blog posts that contains a
> > certain
> > > term and groups it by day/week/year etc. with the nice DateMathParser
> > > functions.
> > >
> > > The performance degrades really fast and consumes a lot of memory which
> > > forces OOM from time to time
> > > We think it is due the fact that the cardinality of the field
> > publishedDate
> > > in our index is huge, almost equal to the nr of documents in the index.
> > >
> > > We need to address that...
> > >
> > > Some questions:
> > >
> > > 1. Can a datefield have other date-formats than the default of
> yyyy-MM-dd
> > > HH:mm:ssZ ?
> > >
> > > 2. We are thinking of adding a field to the index which has the format
> > > yyyy-MM-dd to reduce the cardinality; if that field can't be a date, it
> > > could perhaps be a string, but the question then is if faceting can be
> > used
> > > ?
> > >
> > > 3. Since we now already have such a huge index, is there a way to add a
> > > field afterwards and apply it to all documents without actually
> > reindexing
> > > the whole shebang ?
> > >
> > > 4. If the field cannot be a string can we just leave out the
> > > hour/minute/second information and to reduce the cardinality and
> improve
> > > performance ? Example: 2009-01-01 00:00:00Z
> > >
> > > 5. I am afraid that we need to reindex everything to get this to work
> > > (negates Q3). We have 8 shards as of current, what would the most
> > efficient
> > > way be to reindexing the whole shebang ? Dump the entire database to
> disk
> > > (sigh), create many xml file splits and use curl in a
> > > random/hash(numServers) manner on them ?
> > >
> > >
> > > Kindly
> > >
> > > //Marcus
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > > Marcus Herou CTO and co-founder Tailsweep AB
> > > +46702561312
> > > marcus.he...@tailsweep.com
> > > http://www.tailsweep.com/
> > > http://blogg.tailsweep.com/
> > >
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>
>
>
> --
> Marcus Herou CTO and co-founder Tailsweep AB
> +46702561312
> marcus.he...@tailsweep.com
> http://www.tailsweep.com/
> http://blogg.tailsweep.com/
>



-- 
Regards,
Shalin Shekhar Mangar.


Distributed Search - only get ids

2009-04-29 Thread Joe Pollard
Solr 1.3: If I am only getting back the document ids from a distributed search 
(e.g., uniqueid is 'id' and the fl parameter only contains 'id'), there seems 
to be some room for optimization in the current code path:


1)  On each shard, grab the top N sorted documents (ids & sort fields).

2)  Merge these into one list of N sorted id fields.

3)  Query each shard for the details of these documents (by id), getting 
back a field list of id only.
It seems to me that step 3 is overhead that can be skipped.

Any thoughts on this/known patches?

Thanks,
-Joe


Re: Date faceting - howto improve performance

2009-04-29 Thread Marcus Herou
Aha!

Hmm, googling won't help me, I see. Any hints on usage?

/M


On Tue, Apr 28, 2009 at 12:29 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Sorry, I'm late in this thread.
>
> Did you try using Trie fields (new in 1.4)? The regular date faceting won't
> work out-of-the-box for trie fields I think. But you could use facet.query
> to achieve the same effect. On my simple benchmarks I've found trie fields
> to give a huge improvement in range searches.
>
> On Sat, Apr 25, 2009 at 4:24 PM, Marcus Herou  >wrote:
>
> > Hi.
> >
> > One of our faceting use-cases:
> > We are creating trend graphs of how many blog posts that contains a
> certain
> > term and groups it by day/week/year etc. with the nice DateMathParser
> > functions.
> >
> > The performance degrades really fast and consumes a lot of memory which
> > forces OOM from time to time
> > We think it is due the fact that the cardinality of the field
> publishedDate
> > in our index is huge, almost equal to the nr of documents in the index.
> >
> > We need to address that...
> >
> > Some questions:
> >
> > 1. Can a datefield have other date-formats than the default of yyyy-MM-dd
> > HH:mm:ssZ ?
> >
> > 2. We are thinking of adding a field to the index which has the format
> > yyyy-MM-dd to reduce the cardinality; if that field can't be a date, it
> > could perhaps be a string, but the question then is if faceting can be
> used
> > ?
> >
> > 3. Since we now already have such a huge index, is there a way to add a
> > field afterwards and apply it to all documents without actually
> reindexing
> > the whole shebang ?
> >
> > 4. If the field cannot be a string can we just leave out the
> > hour/minute/second information and to reduce the cardinality and improve
> > performance ? Example: 2009-01-01 00:00:00Z
> >
> > 5. I am afraid that we need to reindex everything to get this to work
> > (negates Q3). We have 8 shards as of current, what would the most
> efficient
> > way be to reindexing the whole shebang ? Dump the entire database to disk
> > (sigh), create many xml file splits and use curl in a
> > random/hash(numServers) manner on them ?
> >
> >
> > Kindly
> >
> > //Marcus
> >
> >
> >
> >
> >
> >
> >
> > --
> > Marcus Herou CTO and co-founder Tailsweep AB
> > +46702561312
> > marcus.he...@tailsweep.com
> > http://www.tailsweep.com/
> > http://blogg.tailsweep.com/
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/


Re: spellcheck.collate causes StringIndexOutOfBoundsException during startup.

2009-04-29 Thread Eric Sabourin
Koji - I've removed them from my solrconfig.xml and that solved the problem.
Thanks for your help!

On Tue, Apr 28, 2009 at 12:25 PM, Koji Sekiguchi  wrote:

> I see you are using firstSearcher/newSearcher event listeners on your
> startup, and they cause the problem.
> If you don't need them, comment them out in solrconfig.xml.
>
> Koji
>
>
>
> Eric Sabourin wrote:
>
>> I’m using SOLR 1.3.0 (from download, not a  nightly build)
>>
>> apache-tomcat-5.5.27 on Windows  XP.
>>
>>
>>
>> When I add <str name="spellcheck.collate">true</str> to my requestHandler in
>> my solrconfig.xml, I get the StringIndexOutOfBoundsException stacktrace
>> below on startup. Removing the element, or setting it to false, causes the
>> exception to no longer occur on startup.
>>
>>
>>
>> Any help is appreciated. Let me know if additional information is
>> required.
>>
>>
>>
>> Eric
>>
>>
>>
>> The exception (from logs):
>>
>> Apr 24, 2009 12:17:53 PM org.apache.solr.servlet.SolrUpdateServlet init
>>
>> INFO: SolrUpdateServlet.init() done
>>
>> Apr 24, 2009 12:17:53 PM org.apache.solr.common.SolrException log
>>
>> SEVERE: java.lang.StringIndexOutOfBoundsException: String index out of
>> range: -5
>>
>>   at
>> java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:800)
>>
>>   at java.lang.StringBuilder.replace(StringBuilder.java:272)
>>
>>   at
>>
>> org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:232)
>>
>>   at
>>
>> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:149)
>>
>>   at
>>
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
>>
>>   at
>>
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>
>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>>
>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1228)
>>
>>   at
>>
>> org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:50)
>>
>>   at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1034)
>>
>>   at
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
>>
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:123)
>>
>>   at
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
>>
>>   at
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
>>
>>   at java.lang.Thread.run(Thread.java:595)
>>
>>
>>
>> Apr 24, 2009 12:17:53 PM org.apache.solr.core.SolrCore execute
>>
>>
>>
>> Having the following does not cause the exception:
>>
>>   true
>>
>>   false
>>
>>   
>>
>>   false
>>
>>   
>>
>>   1
>>
>>   default
>>
>>   
>>
>>   
>>
>>
>>
>> With the following the exception occurs on startup.
>>
>>   true
>>
>>   false
>>
>>   
>>
>>   false
>>
>>   
>>
>>   1
>>
>>   default
>>
>>   
>>
>>   true
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>


-- 
Eric


Re: limit on query size?

2009-04-29 Thread Shalin Shekhar Mangar
On Wed, Apr 29, 2009 at 10:42 PM, Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]
 wrote:

> Is there a limit on the size ( in bytes ) of a query you send to Solr?
>
> Either through HTTP URL request or through SolrJ?
>


The limit is whatever you have configured (or the default) in your servlet
container for GET (default is 2KB I think) and POST requests. These can be
changed in your container's configuration. In Solrj, GET is the default
method for querying and POST is the default method for indexing. If you are
exceeding the GET limit, you can use POST for queries by passing your own
HttpClient.
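For illustration, a Solrj query sent as an HTTP POST could look like the sketch below (treat it as an outline rather than tested code; the key point is SolrRequest.METHOD.POST):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PostQuerySketch {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Stands in for a query string too long for a GET URL.
        SolrQuery query = new SolrQuery("text:(first OR second OR third)");
        // Send the query as a POST so it is not subject to the container's
        // URL length limit.
        QueryRequest request = new QueryRequest(query, SolrRequest.METHOD.POST);
        QueryResponse response = request.process(server);
        System.out.println(response.getResults().getNumFound());
    }
}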

The only other limit is if you have enabled remote streaming, in which case
it can be configured in solrconfig.xml in the <requestParsers> section.


>
> What is the behavior if a limit is reached?
>

The limits are enforced by the servlet container so the behavior is specific
to them.

-- 
Regards,
Shalin Shekhar Mangar.


limit on query size?

2009-04-29 Thread Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]
Is there a limit on the size ( in bytes ) of a query you send to Solr?

Either through HTTP URL request or through SolrJ?

What is the behavior if a limit is reached?


Highlighting using XML instead of strings?

2009-04-29 Thread Michael Ludwig

http://wiki.apache.org/solr/HighlightingParameters

I can specify the strings to highlight matched text with using
"hl.simple.pre" and "hl.simple.post", for example <em> and </em>.
The result looks like this:

  Eumel &lt;em&gt;NDR&lt;/em&gt; Ländermagazine

However, what if as the result of favouring XML over strings,
I rather want something like this:

  Eumel <em>NDR</em> Ländermagazine

There could be a parameter "hl.xml" which I could use to request
modified XML like this:

  hl.xml=em
  hl.xml=b

This would allow smoother processing with technologies like XSLT.
Is such a feature available?

Michael Ludwig


Re: Unable to import data from database

2009-04-29 Thread Erick Erickson
Thanks for letting us all know the resolution, that may save some other
poor soul from frustration

Best
Erick

On Wed, Apr 29, 2009 at 9:31 AM, Ci-man  wrote:

>
> Found the problem.
> It is with Microsoft jdbc drivers (jdbc 2.0).
>
> With the latest download Microsoft provides two .jar files:
> sqljdbc.jar
> sqljdbc4.jar
>
> I had copied both into the lib directory. By doing so it used the older
> drivers (sqljdbc.jar) which do not work with jvm1.6. You get this kind of
> cryptic message in debug trace:
>
> The JDBC Driver version 2.0 does not support JRE 1.4. You must upgrade JRE
> 1.4 to JRE 5.0 or later when using the JDBC Driver version 2.0. In some
> cases, you might need to recompile your application because it might not be
> compatible with JDK 5.0 or later. For more information, see the
> documentation on Sun Microsystems Web site
>
> No further help from MS or boards. I experimented and removed the
> sqljdbc.jar file from lib directory so that only the sqljdbc4.jar is
> available. And bingo. Everything is working like a charm.
>
> Thanks everyone,
> -Ci
>
> --
> View this message in context:
> http://www.nabble.com/Unable-to-import-data-from-database-tp23283852p23295866.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Authenticated Indexing Not working

2009-04-29 Thread Allahbaksh Asadullah
Hi,
I followed the procedure given at
http://blog.comtaste.com/2009/02/securing_your_solr_server_on_t.html
Regards,
Allahbaksh

On 4/28/09, Shalin Shekhar Mangar  wrote:
> On Sun, Apr 26, 2009 at 11:04 AM, Allahbaksh Asadullah <
> allahbaks...@gmail.com> wrote:
>
>> HI Otis,
>> I am using HTTPClient for authentication. When I use the server with
>> Authentication for searching it works fine. But when I use it for
>> indexing it throws error.
>>
>
> What is the error? Is it thrown by Solr or your servlet container?
>
> One difference between a search request and update request with Solrj is
> that a search request uses HTTP GET by default but an update request uses
> an
> HTTP POST by default. Perhaps your authentication scheme is not configured
> correctly for POST requests?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


-- 
Allahbaksh Mohammedali Asadullah,
Software Engineering & Technology Labs,
Infosys Technolgies Limited, Electronic City,
Hosur Road, Bangalore 560 100, India.
(Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927.
Fax: 91-80-28520362 | Mobile: 91-9845505322.


Re: stress tests to DIH and deduplication patch

2009-04-29 Thread Shalin Shekhar Mangar
On Wed, Apr 29, 2009 at 7:44 PM, Marc Sturlese wrote:

>
> Hey there, I am doing some stress tests indexing with DIH.
> I am indexing a MySQL DB with roughly 1.4 million rows. I am also using the
> DeDuplication patch.
> I am using Tomcat with a JVM limit of -Xms2000M -Xmx2000M
> I have indexed 3 times using full-import command without restarting tomcat
> or reloading the core between the indexations.
> I have used jmap and jhat to map heap memory in some moments of the
> indexations.
> Here I show the beginning of the maps (I don't show the lower part of the
> stack because object instance numbers are completely stable in there).
> I have noticed that the number of Term, TermInfo and TermQuery grows
> between
> an indexation and another... is that normal?
>
>
Perhaps you should enable GC logging as well. Also, did you actually run out
of memory or you are interpolating and assuming that it might happen?
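For example, with something like this added to the JVM options you already pass to Tomcat (the log path is just an example):

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/solr-gc.log

That will show whether the growth you see is reclaimed on the next full collection or actually retained.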

-- 
Regards,
Shalin Shekhar Mangar.


Re: function query scoring

2009-04-29 Thread Shalin Shekhar Mangar
On Wed, Apr 29, 2009 at 9:15 PM, Umar Shah  wrote:

> Can anyone explain the behavior of a function query if there are other
> terms in the query?
>
> It seems the value of the function query and the actual match score are
> interfering in some manner. What should be expected?
>
>
Yes, if you include a term in the q field, it is scored according to the
tf-idf based lucene scoring. The example in the wiki is incorrect I guess,
it should have used a filter query.

You can see how the query is parsed using debugQuery=on. My guess is that
q=field:value _val_:5 is being evaluated as an or between the term query and
the function query. That is why documents matching the term query are ranked
higher due to the tf-idf score.

-- 
Regards,
Shalin Shekhar Mangar.


Re: function query scoring

2009-04-29 Thread Umar Shah
On Wed, Apr 29, 2009 at 8:34 PM, Andrey Klochkov
 wrote:
> On Wed, Apr 29, 2009 at 6:44 PM, Umar Shah  wrote:
>
>> On Wed, Apr 29, 2009 at 7:16 PM, Andrey Klochkov
>>  wrote:
>> > Hi!
>> >
>> > Base on docs in the wiki I thought that the following query should return
>> > constant score "5" for all "socks" in the index:
>> >
>> > http://localhost:8080/solr/select?q=name:socks _val_:5&fl=name,score
>>
>> the intended query should look like
>>
>> http://localhost:8080/solr/select?q=_val_:5&fl=name,score&fq=name:socks
>>
>> return all docs with name containing socks with constant score of 5
>>
>
> Well, thanks for the tip.
> But then the examples in wiki are incorrect, look at the bottom of this page
> (General Example):
>
> http://wiki.apache.org/solr/FunctionQuery#head-af37201ea1d04df780e5044ef560b8558998ee24
>

yes so it seems,

Can anyone explain the behavior of a function query if there are other
terms in the query?

It seems the value of the function query and the actual match score are
interfering in some manner. What should be expected?


> --
> Andrew Klochkov
>



-- 
Umar Shah
Research Engineer
Wisdomtap Solutions India (P) Ltd.


Re: function query scoring

2009-04-29 Thread Andrey Klochkov
On Wed, Apr 29, 2009 at 6:44 PM, Umar Shah  wrote:

> On Wed, Apr 29, 2009 at 7:16 PM, Andrey Klochkov
>  wrote:
> > Hi!
> >
> > Base on docs in the wiki I thought that the following query should return
> > constant score "5" for all "socks" in the index:
> >
> > http://localhost:8080/solr/select?q=name:socks _val_:5&fl=name,score
>
> the intended query should look like
>
> http://localhost:8080/solr/select?q=_val_:5&fl=name,score&fq=name:socks
>
> return all docs with name containing socks with constant score of 5
>

Well, thanks for the tip.
But then the examples in wiki are incorrect, look at the bottom of this page
(General Example):

http://wiki.apache.org/solr/FunctionQuery#head-af37201ea1d04df780e5044ef560b8558998ee24

-- 
Andrew Klochkov


Re: function query scoring

2009-04-29 Thread Umar Shah
On Wed, Apr 29, 2009 at 7:16 PM, Andrey Klochkov
 wrote:
> Hi!
>
> Base on docs in the wiki I thought that the following query should return
> constant score "5" for all "socks" in the index:
>
> http://localhost:8080/solr/select?q=name:socks _val_:5&fl=name,score

the intended query should look like

http://localhost:8080/solr/select?q=_val_:5&fl=name,score&fq=name:socks

return all docs with name containing socks with constant score of 5

>
> But in fact it finds all the products in the index and it seems that
> "socks" products have higher score than others. What I need and what
> function query seems have to do is to find ONLY socks and assign constant
> score to all of them. Isn't it correct? I took  function query with constant
> value just as an example.
>
> --
> Andrew Klochkov
>



-- 
Umar Shah
Research Engineer
Wisdomtap Solutions India (P) Ltd.


Re: ExtractingRequestHandler and SolrRequestHandler issue

2009-04-29 Thread francisco treacy
Well, problem seems to be with

> java -Dsolr.solr.home="/my/path/to/solr" -jar start.jar

Everything runs fine if I copy my xmls to the original conf directory
of the example (example/solr/conf) and I execute like

> java -jar start.jar

Some wrong path to libs somewhere - who knows. Couldn't find related
info on the wiki, so for now I will go about just creating a soft link
to the examples conf dir.

Francisco


2009/4/27 francisco treacy :
> Thanks for your answers. Still no success.
>
>>> These need to be in your Solr home lib, not example/lib.  I sometimes get
>>> confused on this one, too, forgetting that I need to go down a few more
>>> directories.  The example/lib directory is where the Jetty stuff lives,
>>> example/solr/lib is the lib where the plugins go.
>
> Well, actually I need libs in example, cause I'm launching it like so:
>
> java -Dsolr.solr.home="/my/path/to/solr" -jar start.jar
>
> Anyway, I tried copying libraries to solr home lib, this didn't help.
> I keep getting the aforementioned ClassCastException.
>
>>> In fact, if you run "ant
>>> example" from the top level (or contrib/extraction) it should place the JARs
>>> in the right places for the example.
>
> Also, if I try to compile "ant example" it fails with some other
> exception (some mozilla js class not found). I will try some
> workaround here.
>
> Setting the sharedLib attribute in solr.xml didn't help either.
>
> Should I be using the Jetty provided for the example while I'm in
> development? It has worked great so far, but I'm stuck with
> extraction. Will let you know, but please if any other ideas ping me a
> message.
>
> Thanks
>
> Francisco
>
>
>
> 2009/4/22 Peter Wolanin :
>> I had problems with this when trying to set this up with multiple
>> cores - I had to set the shared lib as:
>>
>> <solr sharedLib="lib">
>>
>> in example/solr/solr.xml in order for it to find the jars in example/solr/lib
>>
>> -Peter
>>
>> On Wed, Apr 22, 2009 at 11:43 AM, Grant Ingersoll  
>> wrote:
>>>
>>> On Apr 20, 2009, at 12:46 PM, francisco treacy wrote:
>>>
 Additionally, here's what I've got in example/lib:
>>>
>>> These need to be in your Solr home lib, not example/lib.  I sometimes get
>>> confused on this one, too, forgetting that I need to go down a few more
>>> directories.  The example/lib directory is where the Jetty stuff lives,
>>> example/solr/lib is the lib where the plugins go.  In fact, if you run "ant
>>> example" from the top level (or contrib/extraction) it should place the JARs
>>> in the right places for the example.
>>>


 apache-solr-cell-nightly.jar   bcmail-jdk14-132.jar
 commons-lang-2.1.jar       icu4j-3.8.jar         log4j-1.2.14.jar
 poi-3.5-beta5.jar             slf4j-api-1.5.5.jar
 xml-apis-1.0.b2.jar
 apache-solr-core-nightly.jar   bcprov-jdk14-132.jar
 commons-logging-1.0.4.jar  jetty-6.1.3.jar       nekohtml-1.9.9.jar
 poi-ooxml-3.5-beta5.jar       slf4j-jdk14-1.5.5.jar
 xmlbeans-2.3.0.jar
 apache-solr-solrj-nightly.jar  commons-codec-1.3.jar  dom4j-1.6.1.jar
         jetty-util-6.1.3.jar  ooxml-schemas-1.0.jar
 poi-scratchpad-3.5-beta5.jar  tika-0.3.jar
 asm-3.1.jar                    commons-io-1.4.jar
 fontbox-0.1.0-dev.jar      jsp-2.1               pdfbox-0.7.3.jar
 servlet-api-2.5-6.1.3.jar     xercesImpl-2.8.1.jar

 Actually I wasn't very accurate. Following the wiki didn't suffice. I
 had to add other jars, in order to avoid ClassNotFoundExceptions at
 startup. These are

 apache-solr-core-nightly.jar
 apache-solr-solrj-nightly.jar
 slf4j-api-1.5.5.jar
 slf4j-jdk14-1.5.5.jar

 even while using solr nightly war (in example/webapps).

 Perhaps something wrong with jar versions?

 Francisco


 2009/4/20 francisco treacy :
>
> Hi Grant,
>
> Here is the full stacktrace:
>
> 20-Apr-2009 12:36:39 org.apache.solr.common.SolrException log
> SEVERE: java.lang.ClassCastException:
> org.apache.solr.handler.extraction.ExtractingRequestHandler cannot be
> cast to org.apache.solr.request.SolrRequestHandler
>       at
> org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:154)
>       at
> org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:163)
>       at
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
>       at
> org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:171)
>       at org.apache.solr.core.SolrCore.<init>(SolrCore.java:535)
>       at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:122)
>       at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
>       at
> org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
>       at
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>       at
> org.mortbay.jetty.servlet.ServletH

stress tests to DIH and deduplication patch

2009-04-29 Thread Marc Sturlese

Hey there, I am doing some stress tests indexing with DIH.
I am indexing a MySQL DB with roughly 1.4 million rows. I am also using the
DeDuplication patch.
I am using Tomcat with a JVM limit of -Xms2000M -Xmx2000M.
I have indexed 3 times using full-import command without restarting tomcat
or reloading the core between the indexations.
I have used jmap and jhat to map heap memory in some moments of the
indexations.
Here I show the beginning of the maps (I don't show the lower part of the
stack because object instance numbers are completely stable in there).
I have noticed that the number of Term, TermInfo and TermQuery instances grows
between one indexing run and another... is that normal?



FIRST TIME I INDEX... WITH A MILLION INDEXED DOCS APPROX... HERE INDEXING
PROCESS IS STILL RUNNING
268290 instances of class org.apache.lucene.index.Term
215943 instances of class org.apache.lucene.index.TermInfo
129649 instances of class
org.apache.lucene.index.FreqProxTermsWriter$PostingList
51537 instances of class org.apache.lucene.search.TermQuery
25457 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
1120 instances of class org.apache.lucene.index.FieldInfo
919 instances of class org.apache.catalina.loader.ResourceEntry 


FIRST TIME I INDEX, COMPLETED (1.4 MILLION DOCS INDEXED)
552522 instances of class org.apache.lucene.index.Term
505835 instances of class org.apache.lucene.index.TermInfo
128937 instances of class
org.apache.lucene.index.FreqProxTermsWriter$PostingList
48645 instances of class org.apache.lucene.search.TermQuery
24065 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
1470 instances of class org.apache.lucene.index.FieldInfo
923 instances of class org.apache.catalina.loader.ResourceEntry
858 instances of class com.sun.tools.javac.util.List 


SECOND TIME I INDEX, WITH 500,000 INDEXED DOCS... HERE INDEX PROCESS IS STILL
RUNNING
264617 instances of class
org.apache.lucene.index.FreqProxTermsWriter$PostingList
262496 instances of class org.apache.lucene.index.Term
116078 instances of class org.apache.lucene.index.TermInfo
53383 instances of class org.apache.lucene.search.TermQuery
42274 instances of class
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput
30230 instances of class org.apache.lucene.search.TermQuery$TermWeight
26044 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
15115 instances of class org.apache.lucene.search.BooleanScorer2$Coordinator
15115 instances of class org.apache.lucene.search.ReqExclScorer
7325 instances of class org.apache.lucene.search.ConjunctionScorer$1
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
1279 instances of class org.apache.lucene.index.FieldInfo
923 instances of class org.apache.catalina.loader.ResourceEntry 


SECOND TIME I INDEX, WITH 1,200,000 INDEXED DOCS... HERE INDEX PROCESS IS STILL
RUNNING
574603 instances of class org.apache.lucene.index.Term
423558 instances of class org.apache.lucene.index.TermInfo
141394 instances of class
org.apache.lucene.index.FreqProxTermsWriter$PostingList
106729 instances of class org.apache.lucene.search.TermQuery
54858 instances of class org.apache.lucene.index.BufferedDeletes$Num
25347 instances of class
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
11587 instances of class org.apache.lucene.search.TermQuery$TermWeight
5793 instances of class org.apache.lucene.search.BooleanScorer2$Coordinator
5793 instances of class org.apache.lucene.search.ReqExclScorer
2922 instances of class org.apache.lucene.search.ConjunctionScorer$1
2170 instances of class org.apache.lucene.index.FieldInfo
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
923 instances of class org.apache.catalina.loader.ResourceEntry
858 instances of class com.sun.tools.javac.util.List 

SECOND TIME I INDEX, COMPLETED (1.4 MILLION DOCS INDEXED)
999753 instances of class org.apache.lucene.index.Term
808190 instances of class org.apache.lucene.index.TermInfo
156511 instances of class org.apache.lucene.search.TermQuery
128975 instances of class
org.apache.lucene.index.FreqProxTermsWriter$PostingList
104396 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
15401 instances of class
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput
14896 instances of class org.apache.lucene.search.TermQuery$TermWeight
7447 instances of class org.apache.lucene.search.BooleanScorer2$Coordinator
7447 instances of class org.apache.lucene.search.ReqExclScorer
3025 instances of class org.apache.lucene.search.Conjuncti

Re: /replication?command=isReplicating

2009-04-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
Nope. "details" is the only command which can give you this info.

On Wed, Apr 29, 2009 at 7:10 PM, sunnyfr  wrote:
>
> Hi,
>
> Just to know if there is a quick way to get the information without hitting
> replication?command=details,
> like command=isReplicating
>
> Thanks,
> --
> View this message in context: 
> http://www.nabble.com/-replication-command%3DisReplicating-tp23295869p23295869.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul


Re: Advice on custom DIH or other solutions: LuSql

2009-04-29 Thread Glen Newton
The next version of LuSql[1] supports solutions for this kind of
issue: reading from JDBC (which may include a long and complex query)
and then writing the results to a single (flattened) JDBC table that
can subsequently be the source table for Solr. This might be helpful
for your particular issue.

As I am talking about the next version (0.93) of LuSql I should
describe it better:
The first version (0.9) used JDBC as a source, and used Lucene as a
sink. The sink portion was plugable, so different destinations besides
Lucene indexes were possible.

The new version of LuSql now also has a pluggable source, and the
sources/sinks implemented (or will be for the release) are:
Sources:
- JDBC
- Lucene
- BDB (from LuSql)
- Serialized documents (from LuSql)
- http (as client)
- http (as restful server: not done yet)
- RMI client
- Minion[2] (not done yet)
- Terrier[3] (not done yet)

Sinks:
- Lucene
- BDB
- JDBC
- RMI server
- SolrJ
- XML
- Serialized documents
- Minion (not done yet)
- Terrier (not done yet)
- Lemur[4] (not done yet)


So LuSql has evolved from a JDBC-to-Lucene tool, to a more general
tool for the transformation of document-like (in the Lucene sense
of Document) data objects.

For example, the above use case of the user: for whatever reason, the
JDBC connection is too slow or takes a long time to complete. Use
LuSql to convert the JDBC into BDB; then use the BDB (which is a fast
local file) either directly, or through LuSql to another sink, say,
like SolrJ to Solr.

LuSql will also be useful to information retrieval researchers, who
may want to quickly compare different IR tools, from the same corpus.

I am finishing up implementation this week, then on to testing, and
the hardest part, updating the documentation. I am looking at 3-4
weeks before an RC1 release.

If you have any questions or suggestions for sources/sinks, just
please contact me.

thanks,

Glen

glen.new...@gmail.com

[1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
[2]https://minion.dev.java.net/
[3]http://ir.dcs.gla.ac.uk/terrier/
[4]http://www.lemurproject.org/lemur/

2009/4/29 Noble Paul നോബിള്‍  नोब्ळ् :
> On Wed, Apr 29, 2009 at 3:24 PM, Wouter Samaey  
> wrote:
>> Hi there,
>>
>> I'm currently in the process of learning more about Solr, and how I
>> can implement it into my project.
>>
>> Since my database is very large and complex, I'm looking into the way
>> of keeping my documents current in Solr. I have read the pages about
>> DIH, and find it usefull, but I may need more logic to filter out
>> documents or manipulate them. In order to use DIH, I'd need to run
>> huge queries and joins...
>>
>> Now, I see several ways of going forward:
>>
>> - customize DIH with a new classes so I can read directly from my
>> RDBMS (will be slow)
>> - let the webapp build an XML, and simply take that as a datasource
>> instead of the RDBMS (less queries, and can use memcached for the
>> heavy stuff)
>> - let the webapp instruct Solr to add, update or remove a document as
>> changes occur in real time instead of the DIH delta queries. For
>> loading a fresh situation, I'll still need to find a solution like the
>> ones above. (webapp drives solr directly, instead of DIH polling)
>>
>> Is there some general advice you can give? I understand every app is
>> different..but this must be an issue many have considered before.
>>
>> Kind regards
>>
>> Wouter Samaey
>>
> The disadvantage of DIH pulling data out of your db could be that
> complex queries take long. The best strategy as I see it is maintain a
> simple temp db where your app can write rows as you generate data.
> Periodically, ask DIH to read from this temp DB and update the index.
> This approach is good even when you wish to rebuild the index.
>
>
> --
> --Noble Paul
>
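(As a rough illustration of the temp-table approach quoted above: the DIH config can stay very simple because all joins have already been flattened into the staging table. Table, column and connection details below are invented.)

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/appdb"
              user="solr" password="secret"/>
  <document>
    <entity name="item" query="SELECT id, title, body FROM solr_staging">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="body" name="text"/>
    </entity>
  </document>
</dataConfig>

The application keeps writing flattened rows into solr_staging, and a periodic full-import (or delta-import) picks them up.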



-- 

-


function query scoring

2009-04-29 Thread Andrey Klochkov
Hi!

Base on docs in the wiki I thought that the following query should return
constant score "5" for all "socks" in the index:

http://localhost:8080/solr/select?q=name:socks _val_:5&fl=name,score

But in fact it finds all the products in the index and it seems that
"socks" products have higher score than others. What I need and what
function query seems have to do is to find ONLY socks and assign constant
score to all of them. Isn't it correct? I took  function query with constant
value just as an example.

-- 
Andrew Klochkov


/replication?command=isReplicating

2009-04-29 Thread sunnyfr

Hi, 

Just to know if there is a quick way to get the information without hitting
replication?command=details,
like command=isReplicating

Thanks, 
-- 
View this message in context: 
http://www.nabble.com/-replication-command%3DisReplicating-tp23295869p23295869.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Unable to import data from database

2009-04-29 Thread Ci-man

Found the problem. 
It is with Microsoft jdbc drivers (jdbc 2.0).

With the latest download Microsoft provides two .jar files:
sqljdbc.jar
sqljdbc4.jar

I had copied both into the lib directory. By doing so it used the older
drivers (sqljdbc.jar) which do not work with jvm1.6. You get this kind of
cryptic message in debug trace:

The JDBC Driver version 2.0 does not support JRE 1.4. You must upgrade JRE
1.4 to JRE 5.0 or later when using the JDBC Driver version 2.0. In some
cases, you might need to recompile your application because it might not be
compatible with JDK 5.0 or later. For more information, see the
documentation on Sun Microsystems Web site

No further help from MS or boards. I experimented and removed the
sqljdbc.jar file from lib directory so that only the sqljdbc4.jar is
available. And bingo. Everything is working like a charm.

Thanks everyone,
-Ci

-- 
View this message in context: 
http://www.nabble.com/Unable-to-import-data-from-database-tp23283852p23295866.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: field type for serialized code?

2009-04-29 Thread Matt Mitchell
Sorry, I should have mentioned how I was serializing. In Ruby, I'm using
Marshal.dump. When loading back into ruby via Marshal.load, I get an error
related to the Marshaled version. I'm starting to play with JSON too.

Matt

On Wed, Apr 29, 2009 at 6:42 AM, Erik Hatcher wrote:

> Are you using REXML?  Or libxml?   I'm assuming this is from a Solr/Ruby
> (RSolr?) API call to add the document.
>
>Erik
>
>
> On Apr 28, 2009, at 9:12 PM, Matt Mitchell wrote:
>
>  Hi,
>>
>> I'm attempting to serialize a simple ruby object into a solr.StrField -
>> but
>> it seems that what I'm getting back is munged up a bit, in that I can't
>> de-serialize it. Is there a field type for doing this type of thing?
>>
>> Thanks,
>> Matt
>>
>
>


Re: Faceting - grouping results

2009-04-29 Thread Koji Sekiguchi

I'm not sure this is what you are looking for, but you may try to use the fq
parameter: &q=*:*&fq=xxx:A&rows=10 for "at most 10 docs with xxx=A".

http://wiki.apache.org/solr/CommonQueryParameters#head-6522ef80f22d0e50d2f12ec487758577506d6002
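In other words, one request per value of the field, e.g.:

/select?q=*:*&fq=xxx:A&rows=10
/select?q=*:*&fq=xxx:B&rows=10
/select?q=*:*&fq=xxx:C&rows=10

and the three result sets are combined on the client side.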

Koji


Branca Marco wrote:

Hi,
I have a question about faceting.
I'm trying to retrieve results grouped by a certain field. I know that I can specify the parameter 
"rows" in order to ask Solr to return only "rows" documents.
What I would like to do is to ask Solr to return a certain number of documents 
for each category found in the faceting info.
For example, calling the URL

 
http://[server-ip]:[server-port]/select?q=*:*&facet=on&facet.field=xxx[&SOMETHING_ELSE]

on a set of indexed documents where xxx can assume the following values:
 - A
 - B
 - C

I would like to know what to set in the Solr URL in order to obtain, for 
instance:
 - at most 10 docs with xxx=A
 - at most 10 docs with xxx=B
 - at most 10 docs with xxx=C

Thank you for your help,

Marco Branca
Consultant Sytel Reply S.r.l.
Via Ripamonti,  89 - 20139 Milano
Mobile: (+39) 348 2298186
e-mail: m.bra...@reply.it
Website: www.reply.eu





Re: Problem adding unicoded docs to Solr through SolrJ

2009-04-29 Thread Michael Ludwig

ahmed baseet schrieb:


public void postToSolrUsingSolrj(String rawText, String pageId) {



doc.addField("features", rawText );



In the above the param rawText is just the html stripped off of all
its tags, js, css etc and pageId is the Url for that page. When I'm
using this for English pages its working perfectly fine but the
problem comes up when I'm trying to index some non-english pages.


Maybe you're constructing a string without specifying the encoding, so
Java uses your default platform encoding?

String(byte[] bytes)
  Constructs a new String by decoding the specified array of
  bytes using the platform's default charset.

String(byte[] bytes, Charset charset)
  Constructs a new String by decoding the specified array of bytes using
  the specified charset.


Now what I did is just extracted the raw text from that html page and
manually created an xml page like this



  
UTF2TEST
Test with some UTF-8 encoded characters
*some tamil unicode text here*
   


and posted this from command line using the post.jar file. Now searching
gives me the result but unlike last time browser shows the indexed text in
tamil itself and not the raw unicode.


Now that's perfect, isn't it?


I tried doing something like this also,



// Encode in Unicode UTF-8
 utfEncodedText = new String(rawText.getBytes("UTF-8"));

but even this didn't help either.


No encoding specified, so the default platform encoding is used, which
is likely not what you want. Consider the following example:

package milu;
import java.nio.charset.Charset;
public class StringAndCharset {
  public static void main(String[] args) {
byte[] bytes = { 'K', (byte) 195, (byte) 164, 's', 'e' };
System.out.println(Charset.defaultCharset().displayName());
System.out.println(new String(bytes));
System.out.println(new String(bytes,  Charset.forName("UTF-8")));
  }
}

Output:

windows-1252
KÃ¤se (bad)
Käse (good)

Michael Ludwig


Getting junk characters while indexing

2009-04-29 Thread Koushik Mitra
Hi,

We are trying to index a .doc file. However, after indexing, the dots ( . ) and
apostrophes ( ' ) present in the file are getting converted to junk values.

How to resolve the issue?

Thanks,
Koushik



Re: how to reset the index in solr

2009-04-29 Thread Erik Hatcher


On Apr 29, 2009, at 12:19 AM, Geetha wrote:
I need a function (through solr ruby) for ruby that will allow us to  
clear everything


require 'solr'
solr = Solr::Connection.new("http://localhost:8983/solr")
solr.delete_by_query('*:*')
solr.commit

Erik



Re: Unable to import data from database

2009-04-29 Thread Ci-man

Thanks.
I found the interactive debugger.

solr/admin/dataimport.jsp

and I am seeing exceptions in Java that I can dig into
-- 
View this message in context: 
http://www.nabble.com/Unable-to-import-data-from-database-tp23283852p23295859.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to reset the index in solr

2009-04-29 Thread Erik Hatcher


On Apr 28, 2009, at 11:33 PM, Geetha wrote:


Thank you Erik..

Should I write the below code in rake task /lib/tasks/solr.rake?


There's a start to some Solr Rake tasks in solr-ruby's lib/solr/solrtasks.rb.



I am newbie to ruby.


Welcome!   It's a fun fun world to be in :)

Erik



Re: Performance and number of search results

2009-04-29 Thread Michael Ludwig

Wouter Samaey schrieb:


Can someone please comment on the performance impact of the number of
search results?
Is there a big difference between querying for 1 result, 10, 20 or
even 100 ?


Probably not, but YMMV, as the question is very general.

Consider that for fast queries the HTTP round trip may well be the
determining factor. Or XML parsing. If you've stored a lot of data in
Solr and request all of it to be returned, the difference between 1 and
100 results may be the difference between 1 and 100 KB payload.

If you think it matters, the best thing for you would be to do some
profiling for your specific scenario.

The rule of thumb here is probably: Get what you need.

Michael Ludwig
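
A minimal SolrJ sketch of that rule of thumb (server URL, query term and field names are assumptions): cap both the row count and the returned fields so the payload stays small no matter how many documents match.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class GetWhatYouNeed {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("ipod");
        query.setRows(10);             // only the page you will actually display
        query.setFields("id", "name"); // only the stored fields you will actually use

        System.out.println("matches: "
                + server.query(query).getResults().getNumFound());
    }
}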


Re: UTF8 compatibility

2009-04-29 Thread Shalin Shekhar Mangar
On Wed, Apr 29, 2009 at 12:45 PM, Muhammed Sameer wrote:

>
> So I tried to run the test_utf8.sh script and got the following output
> {code}
> Solr server is up.
> HTTP GET is accepting UTF-8
> HTTP POST is accepting UTF-8
> HTTP POST defaults to UTF-8
> ERROR: HTTP GET is not accepting UTF-8 beyond the basic multilingual plane
> ERROR: HTTP POST is not accepting UTF-8 beyond the basic multilingual plane
> ERROR: HTTP POST + URL params is not accepting UTF-8 beyond the basic
> multilingual plane
> {code}
>
>
Make sure your Tomcat (or whichever container you are using) is set up to
accept UTF-8 for querying. Instructions for Tomcat are at
http://wiki.apache.org/solr/SolrTomcat#head-20147ee4d9dd5ca83ed264898280ab60457847c4
-- 
Regards,
Shalin Shekhar Mangar.
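
A related client-side sketch (query term and core URL are assumptions): the container setting only covers decoding on the server side; the client still has to URL-encode its query parameters as UTF-8.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class Utf8Query {
    public static void main(String[] args) throws Exception {
        String q = "K\u00e4se"; // some non-ASCII query term

        // percent-encode the term as UTF-8 so it matches what the container expects
        URL url = new URL("http://localhost:8983/solr/select?q="
                + URLEncoder.encode(q, "UTF-8"));

        BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
    }
}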


Re: UTF8 compatibility

2009-04-29 Thread Michael Ludwig

Muhammed Sameer schrieb:


We run post.jar periodically, i.e. after every 15 minutes, to commit the
changes. Is this approach correct?


Sounds reasonable to me.


SimplePostTool: WARNING: Make sure your XML documents are encoded in
UTF-8, other encodings are not currently supported


That's just to remind you not to try and post documents in another
encoding. This seems to be a limitation of the SimplePostTool, not of
Solr. I guess the reason is that in order for Solr to work quickly and
reliably, it relies on the Content-Type of the request to determine the
encoding. If, for example, you send XML encoded in ISO-8859-1, you have
to specify that in two places:

* XML declaration: <?xml version="1.0" encoding="ISO-8859-1"?>
* HTTP header: Content-Type: text/xml; charset=ISO-8859-1

The SimplePostTool, however, being just what the name says, may not
bother to read the encoding from the document and bring the HTTP content
type header in line. Instead, it explicitly requests UTF-8, probably in
the interest of simplicity.

Well, that's just my theory. Can anyone confirm?


So I tried to run the test_utf8.sh script and got the following output
{code}
Solr server is up.
HTTP GET is accepting UTF-8
HTTP POST is accepting UTF-8
HTTP POST defaults to UTF-8
ERROR: HTTP GET is not accepting UTF-8 beyond the basic multilingual plane
ERROR: HTTP POST is not accepting UTF-8 beyond the basic multilingual plane
ERROR: HTTP POST + URL params is not accepting UTF-8 beyond the basic 
multilingual plane
{code}

Are these errors normal or do I need to change something ?


I'm seeing the same output, don't worry, just some tests. It is possible
to have Solr index documents containing characters outside of the BMP
(Basic Multilingual Plane), which can be verified posting something like
this:

<add>
  <doc>
    <field name="id">1001</field>
    <field name="name">BMP plus 1 𐀀</field>
  </doc>
</add>

Maybe the test script output says that such characters cannot be used
for querying. Hardly relevant if you consider that the BMP comprises
even languages such as Telugu, Bopomofo and French.

Best,

Michael Ludwig
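
For completeness, a minimal sketch of posting non-UTF-8 XML yourself, with the two places kept in agreement as described above (an editor's illustration; the update URL path and field names are assumptions based on the example schema):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class PostLatin1 {
    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<add><doc>"
                + "<field name=\"id\">latin1-test</field>"
                + "<field name=\"name\">K\u00e4se</field>"
                + "</doc></add>";

        URL url = new URL("http://localhost:8983/solr/update");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setDoOutput(true);
        con.setRequestMethod("POST");
        // the header charset must match the XML declaration above
        con.setRequestProperty("Content-Type", "text/xml; charset=ISO-8859-1");

        // and the bytes on the wire must be encoded the same way
        OutputStream out = con.getOutputStream();
        out.write(xml.getBytes("ISO-8859-1"));
        out.close();

        System.out.println("HTTP " + con.getResponseCode());
    }
}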


Problem adding unicoded docs to Solr through SolrJ

2009-04-29 Thread ahmed baseet
Hi All,
I'm trying to automate the process of posting XML docs to Solr using SolrJ.
Essentially I'm extracting the text from a given Url, then creating a
solrDoc and posting the same using the following function,

public void postToSolrUsingSolrj(String rawText, String pageId) {
String url = "http://localhost:8983/solr";
CommonsHttpSolrServer server;

try {
// Get connection to Solr server
  server = new CommonsHttpSolrServer(url);

// Set XMLResponseParser : Reqd for older version of Solr 1.3
server.setParser(new XMLResponseParser());

server.setSoTimeout(1000);  // socket read timeout
  server.setConnectionTimeout(100);
  server.setDefaultMaxConnectionsPerHost(100);
  server.setMaxTotalConnections(100);
  server.setFollowRedirects(false);  // defaults to false
  // allowCompression defaults to false.
  // Server side must support gzip or deflate for this to have
any effect.
  server.setAllowCompression(true);
  server.setMaxRetries(1); // defaults to 0.  > 1 not
recommended.

// WARNING : this will delete all pre-existing Solr index
//server.deleteByQuery( "*:*" );// delete everything!

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", pageId );
doc.addField("features", rawText );


// Add the docs to Solr Server
server.add(doc);

// Do commit the changes
server.commit();

}catch (Exception e) {}
}

In the above, the param rawText is just the HTML stripped of all its
tags, JS, CSS etc., and pageId is the URL for that page. When I'm using this
for English pages it's working perfectly fine, but the problem comes up when
I'm trying to index some non-English pages. For them, say pages in Tamil,
the encoding (Unicode/UTF-8) seems to create some problem: after
indexing some non-English pages, when I try to search them from the Solr
admin search interface, it gives the result but the content is not shown
in that language (i.e. Tamil); it just displays some characters, I
think in Unicode. The same thing worked fine for pages in English.

Now what I did is just extracted the raw text from that html page and
manually created an xml page like this

<add>
  <doc>
    <field name="id">UTF2TEST</field>
    <field name="name">Test with some UTF-8 encoded characters</field>
    <field name="features">*some tamil unicode text here*</field>
  </doc>
</add>

and posted this from the command line using the post.jar file. Now searching
gives me the result and, unlike last time, the browser shows the indexed text in
Tamil itself and not the raw Unicode. So this clearly shows that the string
that I'm using to create the solrDoc seems to have some encoding issues,
right? Or something else? I tried doing something like this also,

// Encode in Unicode UTF-8
 utfEncodedText = new String(rawText.getBytes("UTF-8"));

but even this didn't help either.
It seems like some silly problem somewhere, which I'm not able to catch. :-)

I'd appreciate it if someone can point me to the bug...

Thanks,
Ahmed.


Re: Addition of new field to Solr schema.xml not getting reflected properly

2009-04-29 Thread ahmed baseet
I added some new documents, and for these docs I can use the new field,
right? Though to reflect the changes for all docs I need to delete the old
index and build a new one.
As I mentioned earlier, after a couple of restarts it worked. Still don't
know what the issue is. :-)

Thanks,
Ahmed.

On Wed, Apr 29, 2009 at 4:13 PM, Erik Hatcher wrote:

> Did you reindex your documents after making changes and restarting?  The
> types of changes you're making require reindexing.
>
>Erik
>
>
> On Apr 29, 2009, at 2:13 AM, ahmed baseet wrote:
>
>  Hi All,
>> I'm trying to add a new field to Solr, so I stopped the tomcat[I'm working
>> on Windows] using the "Configure Tomcat" menu of Tomcat, then added the
>> following field
>> 
>> After restarting Tomcat, I couldn't see the changes, so I did the restart
>> couple of times, and then the schema showed me the changes. Now I tried to
>> change the type to
>> "text" from the current "string". I did that and restarted tomcat many
>> times
>> but the changes are still not getting reflected. I've used Solr1.2 earlier
>> on linux, and every time I just had to touch the web.xml in webapp
>> directory
>> thereby forcing tomcat to restart itself for changes in schema.xml to take
>> effect, but in windows I'm not able to figure out what's the issue. Is
>> there
>> anything wrong, am I supposed to restart tomcat in some other way instead
>> of using the "Configure tomcat" menu. Has anyone faced similar issues with
>> solr1.3 on windows? Any suggestion would be appreciated.
>>
>> Thanks,
>> Ahmed.
>>
>
>


Re: Addition of new field to Solr schema.xml not getting reflected properly

2009-04-29 Thread Erik Hatcher
Did you reindex your documents after making changes and restarting?   
The types of changes you're making require reindexing.


Erik

On Apr 29, 2009, at 2:13 AM, ahmed baseet wrote:


Hi All,
I'm trying to add a new field to Solr, so I stopped the tomcat[I'm  
working
on Windows] using the "Configure Tomcat" menu of Tomcat, then added  
the

following field

After restarting Tomcat, I couldn't see the changes, so I did the  
restart
couple of times, and then the schema showed me the changes. Now I  
tried to

change the type to
"text" from the current "string". I did that and restarted tomcat  
many times
but the changes are still not getting reflected. I've used Solr1.2  
earlier
on linux, and every time I just had to touch the web.xml in webapp  
directory
thereby forcing tomcat to restart itself for changes in schema.xml  
to take
effect, but in windows I'm not able to figure out what's the issue.  
Is there
anything wrong, am I supposed to restart tomcat in some other way  
instead
of using the "Configure tomcat" menu. Has anyone faced similar  
issues with

solr1.3 on windows? Any suggestion would be appreciated.

Thanks,
Ahmed.




Re: Advice on custom DIH or other solutions

2009-04-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Apr 29, 2009 at 3:24 PM, Wouter Samaey  wrote:
> Hi there,
>
> I'm currently in the process of learning more about Solr, and how I
> can implement it into my project.
>
> Since my database is very large and complex, I'm looking into the way
> of keeping my documents current in Solr. I have read the pages about
> DIH, and find it useful, but I may need more logic to filter out
> documents or manipulate them. In order to use DIH, I'd need to run
> huge queries and joins...
>
> Now, I see several ways of going forward:
>
> - customize DIH with new classes so I can read directly from my
> RDBMS (will be slow)
> - let the webapp build an XML, and simply take that as a datasource
> instead of the RDBMS (less queries, and can use memcached for the
> heavy stuff)
> - let the webapp instruct Solr to add, update or remove a document as
> changes occur in real time instead of the DIH delta queries. For
> loading a fresh situation, I'll still need to find a solution like the
> ones above. (webapp drives solr directly, instead of DIH polling)
>
> Is there some general advice you can give? I understand every app is
> different, but this must be an issue many have considered before.
>
> Kind regards
>
> Wouter Samaey
>
The disadvantage of DIH pulling data out of your db could be that
complex queries take long. The best strategy, as I see it, is to maintain a
simple temp db where your app can write rows as you generate data.
Periodically, ask DIH to read from this temp DB and update the index.
This approach works even when you wish to rebuild the index.


-- 
--Noble Paul
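
A rough sketch of that staging-table idea (table name, JDBC URL, driver and the /dataimport handler path are all assumptions, and the DIH data-config that reads the staging table is not shown): the webapp queues changed rows, and a periodic job asks DIH to pick them up.

import java.net.HttpURLConnection;
import java.net.URL;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class StagingQueue {

    // called by the webapp whenever a document changes
    public static void queueForIndexing(String docId) throws Exception {
        Class.forName("com.mysql.jdbc.Driver"); // assumed driver
        Connection db = DriverManager.getConnection(
                "jdbc:mysql://localhost/app", "user", "password");
        try {
            PreparedStatement ps = db.prepareStatement(
                    "insert into solr_staging (doc_id, queued_at) values (?, now())");
            ps.setString(1, docId);
            ps.executeUpdate();
        } finally {
            db.close();
        }
    }

    // called periodically (e.g. from cron) so DIH indexes the queued rows
    public static void triggerDeltaImport() throws Exception {
        URL url = new URL("http://localhost:8983/solr/dataimport"
                + "?command=delta-import&commit=true");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        System.out.println("DIH responded with HTTP " + con.getResponseCode());
    }
}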


Re: field type for serialized code?

2009-04-29 Thread Erik Hatcher
Are you using REXML?  Or libxml?   I'm assuming this is from a Solr/Ruby
(RSolr?) API call to add the document.


Erik

On Apr 28, 2009, at 9:12 PM, Matt Mitchell wrote:


Hi,

I'm attempting to serialize a simple ruby object into a  
solr.StrField - but
it seems that what I'm getting back is munged up a bit, in that I  
can't

de-serialize it. Is there a field type for doing this type of thing?

Thanks,
Matt




Advice on custom DIH or other solutions

2009-04-29 Thread Wouter Samaey
Hi there,

I'm currently in the process of learning more about Solr, and how I
can implement it into my project.

Since my database is very large and complex, I'm looking into the way
of keeping my documents current in Solr. I have read the pages about
DIH, and find it useful, but I may need more logic to filter out
documents or manipulate them. In order to use DIH, I'd need to run
huge queries and joins...

Now, I see several ways of going forward:

- customize DIH with new classes so I can read directly from my
RDBMS (will be slow)
- let the webapp build an XML, and simply take that as a datasource
instead of the RDBMS (less queries, and can use memcached for the
heavy stuff)
- let the webapp instruct Solr to add, update or remove a document as
changes occur in real time instead of the DIH delta queries. For
loading a fresh situation, I'll still need to find a solution like the
ones above. (webapp drives solr directly, instead of DIH polling)

Is there some general advice you can give? I understand every app is
different, but this must be an issue many have considered before.

Kind regards

Wouter Samaey


Re: boost qf weight between 0 and 10

2009-04-29 Thread sunnyfr

How can I get the weight of a field and use it in bf ?? 
thanks a lot


sunnyfr wrote:
> 
> Hi Hoss, 
> thanks for this answer, and is there a way to get the weight of a field? 
> like that and use it in the bf? queryWeight
> 
> 
>   0.14232224 = (MATCH) weight(text:chien^0.2 in 9412049), product of:
> 0.0813888 = queryWeight(text:chien^0.2), product of:
>   0.2 = boost
>   6.5946517 = idf(docFreq=55585, numDocs=14951742)
>   0.061708186 = queryNorm
> 
> 
> thanks 
> 
> 
> hossman wrote:
>> 
>> 
>> : I don't get really, I try to boost a field according to another one but
>> I've
>> : a huge weight when I'm using qf boost like :
>> : 
>> : /select?qt=dismax&fl=*&q="obama
>> : meeting"&debugQuery=true&qf=title&bf=product(title,stat_views)
>> 
>> bf is a boost function -- you are using a product function to multiply
>> the 
>> "title" field by the "stat_views" field ... this doesn't make sense to me?
>> 
>> i'm assuming the "title" field contains text (the rest of your score 
>> explanation confirms this).  when you try to do a math function on a 
>> string based field it deals with the "ordinal" value -- the higher the 
>> string is lexigraphically compared to all other docs ,the higher the 
>> ordinal value.
>> 
>> i have no idea what's in your stat_views field -- but i can't imagine any 
>> way in which multiplying it by the ordinal value of your text field would 
>> make sense...
>> 
>> :   5803675.5 = (MATCH)
>> FunctionQuery(product(ord(title),sint(stat_views))),
>> : product of:
>> : 9.5142968E7 = product(ord(title)=1119329,sint(stat_views)=85)
>> : 1.0 = boost
>> : 0.06099952 = queryNorm
>> 
>> : But this is not balanced between this boost in qf and bf, how can I
>> do ?
>> 
>> when it comes to function query, you're on your own to figure out an 
>> appropriate query boost to balance the scores out -- when you use a 
>> product function the scores are going to get huge like this unless you 
>> balance it somehow (and that ord(title) is just making this massively 
>> worse)
>> 
>> 
>> -Hoss
>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/boost-qf-weight-between-0-and-10-tp22081396p23293956.html
Sent from the Solr - User mailing list archive at Nabble.com.
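
A sketch of one way to keep the function boost in the same ballpark as the text score, along the lines of the advice quoted above (the function and its constants are assumptions to be tuned, not values from this thread): boost on the stat_views field alone and damp it, instead of multiplying it by ord(title).

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class BalancedDismaxBoost {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("obama meeting");
        query.setQueryType("dismax"); // qt=dismax
        query.set("qf", "title");     // text relevancy comes from the title field

        // recip(x,m,a,b) = a / (m*x + b): with x = rord(stat_views) this grows with
        // stat_views but stays below 1, so it cannot swamp the text score the way
        // product(ord(title), stat_views) does
        query.set("bf", "recip(rord(stat_views),1,1000,1000)");

        System.out.println("matches: "
                + server.query(query).getResults().getNumFound());
    }
}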



Performance and number of search results

2009-04-29 Thread Wouter Samaey
Hello,

Can someone please comment on the performance impact of the number of
search results?
Is there a big difference between querying for 1 result, 10, 20 or even 100 ?

Thanks in advance

Wouter Samaey


UTF8 compatibility

2009-04-29 Thread Muhammed Sameer

Salaam,

I have a question; it's in two parts actually, and they are related.

We run post.jar periodically, i.e. after every 15 minutes, to commit the changes.
Is this approach correct?

When I run this I get the following message
{code}
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, 
other encodings are not currently supported
SimplePostTool: COMMITting Solr index changes..
{code}

So I tried to run the test_utf8.sh script and got the following output
{code}
Solr server is up.
HTTP GET is accepting UTF-8
HTTP POST is accepting UTF-8
HTTP POST defaults to UTF-8
ERROR: HTTP GET is not accepting UTF-8 beyond the basic multilingual plane
ERROR: HTTP POST is not accepting UTF-8 beyond the basic multilingual plane
ERROR: HTTP POST + URL params is not accepting UTF-8 beyond the basic 
multilingual plane
{code}

Are these errors normal or do I need to change something ?

Thanks for your time.

Regards,
Muhammed Sameer