Re: Some indexing requests to Solr fail

2010-03-31 Thread Lance Norskog
'waitFlush' means 'wait until the data from this commit is completely
written to disk'.  'waitSearcher' means 'wait until Solr has
completely finished loading up the new index from what it wrote to
disk'.

Optimize rewrites the entire disk footprint of the index. It temporarily
needs that much free disk space again in the same partition. Usually
people run optimize overnight, not during active production hours.
There is a way to limit the optimize pass so that it only makes the index
'more optimized' instead of fully merged: the maxSegments parameter:

http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22_and_.22optimize.22
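
In SolrJ these map roughly to (a sketch - check the SolrServer javadoc of
your version for the exact signatures; the URL is just an example):

    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // waitFlush=true, waitSearcher=true: block until the data is on disk and
    // the new searcher is fully warmed and registered
    server.commit(true, true);
    // partial optimize: merge down to at most 10 segments instead of 1
    server.optimize(true, true, 10);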

On Wed, Mar 31, 2010 at 10:04 AM, Jon Poulton  wrote:
> Hi there,
> Thanks for the reply!
>
> Our backend code is currently set to commit every time it sends over a
> batch of documents - so it depends on how big the batch is and how
> often edits occur - probably too often. I've looked at the code, and
> the SolrJ commit() method takes two parameters - one is called
> waitSearcher, and another waitFlush. They aren't really documented too
> well, but I assume that the waitSearcher bool (currently set to false)
> may be part of the problem.
>
> I am considering removing the code that calls the commit() method
> altogether and relying on the settings for DirectUpdateHandler2 to
> determine when commits actually get done. That way we can tweak it on
> the Solr side without having to recompile and redeploy our main app
> (or by having to add new settings and code to handle them to our main
> app).
>
> Out of curiosity; how are people doing optimize() calls? Are you doing
> them immediately after every commit(), or periodically as part of a job?
>
> Jon
>
> On 31 Mar 2010, at 05:11, Lance Norskog wrote:
>
>> How often do you commit? New searchers are only created after a
>> commit. You notice that handleCommit is in the stack trace :) This
>> means that commits are happening too often for the amount of other
>> traffic going on, and so Solr can't finish creating one
>> searcher before the next commit starts the next one.
>>
>> The "service unavailable" messages are roughly the same problem: these
>> commits might be timing out because the other end is too busy doing
>> commits.  You might try using autocommit instead: commits can happen
>> every N documents, every T seconds, or both. This keeps the commit
>> overhead to a controlled amount, and commits should not outrun the
>> warming of previous searchers.
>>
>> On Tue, Mar 30, 2010 at 7:15 AM, Jon Poulton 
>> wrote:
>>> Hi there,
>>> We have a setup in which our main application (running on a
>>> separate Tomcat instance on the same machine) uses SolrJ calls to
>>> an instance of Solr running on the same box. SolrJ is used both for
>>> indexing and searching Solr. Searching seems to be working fine,
>>> but quite frequently we see the following stack trace in our
>>> application logs:
>>>
>>> org.apache.solr.common.SolrException: Service Unavailable
>>> Service Unavailable
>>> request: http://localhost:8070/solr/unify/update/javabin
>>>  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request
>>> (CommonsHttpSolrServer.java:424)
>>>  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request
>>> (CommonsHttpSolrServer.java:243)
>>>  at
>>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process
>>> (AbstractUpdateRequest.java:105)
>>>  at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:
>>> 86)
>>>  at vyre.content.rabida.index.RemoteIndexingThread.sendIndexRequest
>>> (RemoteIndexingThread.java:283)
>>>  at vyre.content.rabida.index.RemoteIndexingThread.commitBatch
>>> (RemoteIndexingThread.java:195)
>>>  at vyre.util.thread.AbstractBatchProcessor.commit
>>> (AbstractBatchProcessor.java:93)
>>>  at vyre.util.thread.AbstractBatchProcessor.run
>>> (AbstractBatchProcessor.java:117)
>>>  at java.lang.Thread.run(Thread.java:619)
>>>
>>> Looking in the Solr logs, there does not appear to be any problems.
>>> The host and port number are correct, its just sometimes our
>>> content gets indexed (visible in the solr logs), and sometimes it
>>> doesn't (nothing visible in solr logs). I'm not sure what could be
>>> causing this problem, but I can hazard a couple of guesses; is
>>> there any upper limit on the size of a javabin request, or any
>>> point at which the service would decide that the POST was too
>>> 

Re: Shred queries on EmbeddedSolrServer

2010-03-31 Thread Lance Norskog
You can create and destroy cores over the HTTP interface:

http://www.lucidimagination.com/search/document/CDRG_ch08_8.2.5

But you are right, the Embedded Solr API does not support Distributed
Search across multiple cores. See:

org.apache.solr.handler.component.SearchHandler.submit(), which quite
definitely only makes HTTP requests.

https://issues.apache.org/jira/browse/SOLR-1858 requests this feature.
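
If you do put the cores behind the HTTP interface, the SolrJ side is
roughly this (a sketch from memory - URL, path and core names are made up;
check CoreAdminRequest in your SolrJ version):

    SolrServer admin = new CommonsHttpSolrServer("http://localhost:8080/solr");
    // create a core whose instanceDir already exists on the server
    CoreAdminRequest.createCore("core2", "/path/to/solr/core2", admin);
    // ... query it with the shards parameter, then drop it again
    CoreAdminRequest.unloadCore("core2", admin);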

On Wed, Mar 31, 2010 at 3:51 AM, Claudio Atzori
 wrote:
> In my application I need to create and destroy indexes via java code, so to
> bypass the http requests I'm using the EmbeddedSolrServer, and I am creating
> different SolrCores, one for every index I need.
> Now the point is that a requirement of my application is the capability to
> perform a query on a specific index, on a subset of indexes, or on every
> index.
>
> I have been looking at the "shards" parameter:
>
> http://localhost:8080/solr/core1/select?shards=localhost:8080/solr/core1,localhost:8080/solr/core2&q=some
> query...
>
> ...and ok, but my solr cores don't expose an http interface, so
> how can I shard a query across all my solr cores?
>
> Thanks in advance,
> Claudio
>



-- 
Lance Norskog
goks...@gmail.com


Re: Some indexing requests to Solr fail

2010-03-30 Thread Lance Norskog
atalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>       at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
>       at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
>       at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>       at java.lang.Thread.run(Thread.java:619)
>
> This appears to be an unrelated problem, as the timing is different from the 
> rejected indexing requests. As there is a large number of concurrent 
> searching and indexing going on constantly, I'm guessing that I've set the 
> number of "maxWarmingSearchers" too low, and that as two searchers are warming
> up, further indexing causes more searches to be warmed, violating this 
> maximum value - does this sound like a reasonable conclusion?
>
> Thanks in advance for any help.
>
> Jon
>



-- 
Lance Norskog
goks...@gmail.com


Re: how to create this highlighter behaviour

2010-03-30 Thread Lance Norskog
The regular expression fragmenter might help you here.

http://wiki.apache.org/solr/HighlightingParameters#hl.fragmenter
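
Something along these lines, for example (a sketch - the field name and the
pattern are only illustrations; the pattern grabs rough "sentence-sized"
fragments):

    SolrQuery q = new SolrQuery("amazing grace");
    q.setHighlight(true);
    q.set("hl.fl", "content");                        // your highlighted field
    q.set("hl.fragmenter", "regex");
    q.set("hl.regex.pattern", "[-\\w ,\"']{20,200}"); // fragment boundaries
    q.set("hl.regex.slop", "0.5");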

On Mon, Mar 29, 2010 at 4:38 PM, Joe Calderon  wrote:
> hello *,  ive been using the highlighter and been pretty happy with
> its results, however theres an edge case im not sure how to fix
>
> for query: amazing grace
>
> the record matched and highlighted is
> amazing rendition of amazing grace
>
> is there any way to only highlight amazing grace without using phrase
> queries, can i modify the highlighter components to only use terms
> once and to favor contiguous sections?
>
> i dont want to enforce phrase queries as sometimes i do want terms out
> of order highlighter but i only want each term matched highlighted
> once
>
>
> does this make sense?
>



-- 
Lance Norskog
goks...@gmail.com


Re: Experiences with SOLR-1797 ?

2010-03-29 Thread Lance Norskog
There was only one report of the problem.

I just read the patch and original source and it looks right; in
concurrent programming these are "famous last words" :)

2010/3/29 Daniel Nowak :
> Hello,
>
> has anyone some experiences with this patch of SOLR-1797 
> (http://issues.apache.org/jira/browse/SOLR-1797) ?
>
> Best Regards
>
>
> Daniel Nowak
> Senior Developer
>
> Rocket Internet GmbH  |  Saarbrücker Straße 20/21  |  10405 Berlin  | 
> Deutschland
>
> tel: +49 30 / 559 554 66  |  fax: +49 30 / 559 554 67  |  skype: 
> daniel.s.nowak
>
> mail: daniel.no...@rocket-internet.de
>
> Geschäftsführer: Frank Biedka, Dr. Florian Heinemann, Uwe Horstmann, Felix 
> Jahn, Arnt Jeschke, Dr. Philipp Kreibohm
>
> Eingetragen beim Amtsgericht Berlin, HRB 109262 USt-ID DE256469659
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Solr not returning all documents?

2010-03-29 Thread Lance Norskog
Yes, this should work. It will be very slow.

There is a special hack by which you can say sort=_docid_+asc (or
+desc). _docid_ is a magic field name that avoids sorting the results.
Pulling documents at row # 1 million should be only a little slower
than pulling documents at row #0.
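
A sketch of that loop from SolrJ (page size and query are arbitrary;
'server' points at the old core):

    SolrQuery q = new SolrQuery("*:*");
    q.addSortField("_docid_", SolrQuery.ORDER.asc); // walk in internal docid order
    q.setRows(1000);
    for (int start = 0; ; start += 1000) {
        q.setStart(start);
        SolrDocumentList page = server.query(q).getResults();
        if (page.isEmpty()) break;
        // re-add this page of documents to the new core here
    }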

On Mon, Mar 29, 2010 at 12:37 AM, Adrian Pemsel  wrote:
> Hi,
>
> As part of our application I have written a reindex task that runs through
> all documents in a core one by one (using *:*, a start offset and a row
> limit of 1) and adds them to a new core (potentially with a new schema).
> However, while working well for small sets this approach somehow does not
> seem to work for larger data sets. The Reindex task counts its offset into
> the old core, this count stops at about 118000 and no more documents are
> returned. However, numDocs says there are around 582000 documents in the old
> core.
> Am I making a wrong assumption in believing I should get all documents like
> this?
>
> Thanks,
>
> Adrian
>



-- 
Lance Norskog
goks...@gmail.com


Re: Including Tika-extracted docs in a document?

2010-03-29 Thread Lance Norskog
Look at the 'rootEntity' attribute in the DataImportHandler, both the
description and the examples:

http://wiki.apache.org/solr/DataImportHandler#Schema_for_the_data_config

It can be set on any entity. It lets you run several
operations in the outer entities, then have all of their fields come
together in an inner entity. You have to set 'rootEntity="false"' on each
outer entity, all the way down to the last entity above the one that
becomes the Solr document. (No, that is still not a very clear explanation.)

This would let you create multi-valued fields, one value from each
input document. Otherwise, this is a hard one.

On Fri, Mar 26, 2010 at 10:37 PM, Don Werve  wrote:
> Is it possible to perform Tika extraction on multiple files that are indexed
> as part of a single document?
>



-- 
Lance Norskog
goks...@gmail.com


Re: multicore embedded swap / reload etc.

2010-03-29 Thread Lance Norskog
The code snippet you give tells how to access existing cores that are
registered in the top-level solr.xml file. The Wiki pages tells how
these cores are configured

The Wiki pages also discusses dynamic operations on multiple cores.
SolrJ should be able to do these as well (but I am not a SolrJ
expert).
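
For the embedded case in this thread, the swap itself can go straight
through the CoreContainer you load in the snippet quoted below - a sketch,
assuming core names 'live' and 'staging' from solr.xml:

    // after the daily rebuild of "staging" has finished and committed:
    container.swap("live", "staging");
    // an EmbeddedSolrServer created with the name "live" should then
    // resolve to the freshly built index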

On Fri, Mar 26, 2010 at 12:39 PM, Nagelberg, Kallin
 wrote:
> Thanks everyone,
> I was following the solrj wiki which says:
>
>
> """
> If you want to use MultiCore features, then you should use this:
>
>
>    File home = new File( "/path/to/solr/home" );
>    File f = new File( home, "solr.xml" );
>    CoreContainer container = new CoreContainer();
>    container.load( "/path/to/solr/home", f );
>
>    EmbeddedSolrServer server = new EmbeddedSolrServer( container, "core name 
> as defined in solr.xml" );
>    ...
> """
>
> I'm just a little confused with the disconnect between that and what I see 
> about managing multiple cores here: http://wiki.apache.org/solr/CoreAdmin . 
> If someone could provide some high-level directions it would be greatly 
> appreciated.
>
> Thanks,
> -Kallin Nagelberg
>
>
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Friday, March 26, 2010 7:54 AM
> To: solr-user@lucene.apache.org
> Subject: Re: multicore embedded swap / reload etc.
>
> Embedded supports MultiCore  - it's the direct core connection thing
> that supports one.
>
> - Mark
>
> http://www.lucidimagination.com (mobile)
>
> On Mar 26, 2010, at 7:38 AM, Erik Hatcher 
> wrote:
>
>> But wait... embedded Solr doesn't support multicore, does it?  Just
>> off memory, I think it's fixed to a single core.
>>
>>    Erik
>>
>> On Mar 25, 2010, at 10:31 PM, Lance Norskog wrote:
>>
>>> All operations through the SolrJ work exactly the same against the
>>> Solr web app and embedded Solr. You code the calls to update cores
>>> with the same SolrJ APIs either way.
>>>
>>> On Wed, Mar 24, 2010 at 2:19 PM, Nagelberg, Kallin
>>>  wrote:
>>>> Hi,
>>>>
>>>> I've got a situation where I need to reindex a core once a day. To
>>>> do this I was thinking of having two cores, one 'live' and one
>>>> 'staging'. The app is always serving 'live', but when the daily
>>>> index happens it goes into 'staging', then staging is swapped into
>>>> 'live'. I can see how to do this sort of thing over http, but I'm
>>>> using an embedded solr setup via solrJ. Any suggestions on how to
>>>> proceed? I could just have two solrServer's built from different
>>>> coreContainers, and then swap the references when I'm ready, but I
>>>> wonder if there is a better approach. Maybe grab a hold of the
>>>> CoreAdminHandler?
>>>>
>>>> Thanks,
>>>> Kallin Nagelberg
>>>>
>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Complex relational values

2010-03-29 Thread Lance Norskog
If 'item' is the unique document level, then this can be done with:
unique id: your own design
searchable text fields:
foo_x:
foo_y:
bar_x:
bar_y:

The query becomes:
foo_x:[100 TO *] AND foo_y:[500 TO *]

Note that to search the other fields with dismax, and foo* with the
standard query parser, you'll need to combine the two with the crazy
multi-parser syntax.
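
On the indexing side the flattening looks something like this (a sketch -
field names as above, id and values invented):

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "item-42");   // your own unique id design
    doc.addField("foo_x", 150);
    doc.addField("foo_y", 700);
    doc.addField("bar_x", 10);
    doc.addField("bar_y", 20);
    server.add(doc);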

On Fri, Mar 26, 2010 at 10:49 AM, Kumaravel Kandasami
 wrote:
> I would represent each "item" element as a document, and each attribute as
> the fields of the document.
>
> if the field names are not known upfront, you could create 'dynamic fields'.
>
>
>
>
> Kumar    _/|\_
> www.saisk.com
> ku...@saisk.com
> "making a profound difference with knowledge and creativity..."
>
>
> On Fri, Mar 26, 2010 at 12:37 PM, Phil Messenger  wrote:
>
>> Hi,
>>
>> I need to store structured information in an index entry for use when
>> filtering. As XML, this could be expressed as:
>>
>> <doc>
>>        <item type="foo" x="100" y="500"/>
>>        <item type="bar" x="250" y="750"/>
>> </doc>
>>
>> I want to be able to *filter* search results according to the data in the
>> "item" tags - eg. show all index entries which match the expression
>> "type=foo && x > 100 & y > 500"
>>
>> Having a multivalued field for type, x and y doesn't seem to work here as
>> I need to maintain the relationship between a type/x/y.
>>
>> I'm not sure how to approach this problem. Is writing a custom field type
>> the
>> preferred approach?
>>
>> thanks,
>>
>> Phil.
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: solr highlighting

2010-03-29 Thread Lance Norskog
No problem: wrapping and unwrapping escaped text can be very confusing.

On Fri, Mar 26, 2010 at 6:31 AM, Niraj Aswani  wrote:
> Hi Lance,
>
> apologies.. please ignore my previous mail.  I'll have a look at the
> PatternReplaceFilter.
>
> Thanks,
> Niraj
>
> Niraj Aswani wrote:
>>
>> Hi Lance,
>>
>> Yes, that is one solution, but wouldn't it stop people searching for
>> something like "<choice"? If we escape the < and > characters at
>> index time, one would have to write a query like "&lt;choice".  Am I right?
>>
>> Thanks,
>> Niraj
>>
>> Lance Norskog wrote:
>>>
>>> To display html-markup in an html page, it has to be in entity-encoded
>>> form. So, encode the <> as entities in your input application, and
>>> have it indexed and stored in this format. Then, the <em> tags are
>>> inserted as normal. This gives you the html text displayable in an
>>> html page, with all words highlightable. And add gt/lt etc. as
>>> stopwords.
>>>
>>> At this point you have the element names, attribute names and values,
>>> and text parts searchable and highlightable. If you only want the HTML
>>> syntax parts shown, the PatternReplaceFilter is your friend: with
>>> regex patterns you can pull out those values and ignore the text
>>> parts.
>>>
>>> The analysis.jsp page will make it much much easier to debug this.
>>>
>>> Good luck!
>>>
>>> On Thu, Mar 25, 2010 at 8:21 AM, Niraj Aswani 
>>> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> I am using the following two parameters to highlight the hits.
>>>>
>>>> "hl.simple.pre=" + URLEncoder.encode("<em>")
>>>> "hl.simple.post=" + URLEncoder.encode("</em>")
>>>>
>>>> This seems to work.  However, there is a bit of trouble when the text
>>>> itself
>>>> contains html markup.
>>>>
>>>> For example, I have indexed a document with the following text in it.
>>>> ===
>>>> something here...
>>>> xyz
>>>> something here..
>>>> ===
>>>>
>>>> When I search for the keyword choice, what it does is, it inserts
>>>> "<em>"
>>>> just before the word choice and "</em>" immediately after the word
>>>> choice. It results into something like below:
>>>>
>>>> <choice minOccurs="1"
>>>> maxOccurs="unbounded">xyzchoice>
>>>>
>>>>
>>>> I would like it to be something like:
>>>>
>>>> <choice minOccurs="1"
>>>> maxOccurs="unbounded">xyz/choice>
>>>>
>>>> Is there any way to do it such that the highlight content is encoded as
>>>> HTML
>>>> but the prefix and suffix are not?
>>>>
>>>> Thanks,
>>>> Niraj
>>>>
>>>>
>>>>
>>>> When I issue a query, it returns all the corret
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-29 Thread Lance Norskog
SOLR-1316 uses a much faster data structure (a Ternary Search Tree), not
a Lucene index. Ngram-based tools like the spellchecker, or your own
EdgeNGram implementation, are inherently slower.

Netflix, for example, uses a dedicated TST server farm (their own
implementation of TST) to do auto-complete.
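
To see the kind of operation it is optimized for, here is a plain-Java
illustration of in-memory prefix lookup - this is not the SOLR-1316 code,
just a TreeMap standing in for the TST:

    TreeMap<String, Long> terms = new TreeMap<String, Long>(); // term -> weight
    terms.put("sol", 10L);
    terms.put("solaris", 40L);
    terms.put("solr", 500L);
    // every completion of "sol": two tree lookups, no index access at all
    SortedMap<String, Long> hits = terms.subMap("sol", "sol\uffff");
    for (Map.Entry<String, Long> e : hits.entrySet()) {
        System.out.println(e.getKey() + " (" + e.getValue() + ")");
    }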

On Fri, Mar 26, 2010 at 3:32 AM, stockii  wrote:
>
> hey thx.
>
> i think the component runs so far, but i don´t see what it brings me.
>
> my first autocompletion-solution was with EdgeNGram ... and its exactly the
> same result ...
>
> can anyone, plese show me the advantages of the Issue-1316 ?!
> --
> View this message in context: 
> http://n3.nabble.com/SOLR-1316-How-To-Implement-this-patch-autoComplete-tp506492p661787.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goks...@gmail.com


Re: Solrj doesn't tell if PDF was actually parsed by Tika

2010-03-29 Thread Lance Norskog
Thanks!

You can search for the document after you index it.
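
In SolrJ that check is just a query on the unique key right after the
add+commit (a sketch - the id field and value are whatever you used):

    QueryResponse rsp = server.query(new SolrQuery("id:doc-123"));
    if (rsp.getResults().getNumFound() == 0) {
        // the PDF never made it in - treat it as a Tika/indexing failure
    }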

On Fri, Mar 26, 2010 at 1:55 AM, Abdelhamid  ABID  wrote:
> Well done : https://issues.apache.org/jira/browse/SOLR-1847
>
> meanwhile, is there any workaround ?
>
> On 3/26/10, Lance Norskog  wrote:
>>
>> Please file a bug for this on the JIRA.
>>
>> https://issues.apache.org/jira/secure/Dashboard.jspa
>>
>>
>> On Thu, Mar 25, 2010 at 7:21 AM, Abdelhamid  ABID 
>> wrote:
>> > Hi,
>> > When posting pdf files using solrj the only response we get from Solr is
>> > only server response status, but never know whether
>> > pdf was actually parsed or not, checking the log I found that some Tika
>> > wasn't able
>> > to succeed with some pdf files because of content nature (texts in images
>> > only) or are corrupted:
>> >
>> >     25 mars 2010 14:54:07 org.apache.pdfbox.util.PDFStreamEngine
>> > processOperator
>> >     INFO: unsupported/disabled operation: EI
>> >
>> >     25 mars 2010 14:54:02 org.apache.pdfbox.filter.FlateFilter decode
>> >     GRAVE: Stop reading corrupt stream
>> >
>> >
>> > The question is how can I catch these kinds of exceptions through Solrj ?
>> >
>> > --
>> > Elsadek
>> >
>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>
>
>
> --
> Abdelhamid ABID
> Software Engineer- J2EE / WEB / ESB MULE
>



-- 
Lance Norskog
goks...@gmail.com


Re: How can I do this in Solr?

2010-03-25 Thread Lance Norskog
You can create a field 'staff' with field values AAA_manager and
BBB_coordinator. This preserves the database relationship.

In general, think of a Solr index as one database table: you have to
flatten (denormalize) a standard database schema.
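
For example (a sketch - the id is invented, and 'staff' would be declared
multiValued in schema.xml):

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "article-1");
    doc.addField("staff", "AAA_manager");
    doc.addField("staff", "BBB_coordinator");
    server.add(doc);
    server.commit();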

2010/3/25 scott chu :
> I have a input xml data file & it has a  'Reporters' tag looks like this:
>
> <Reporters>
>    <Reporter>
>        <name>AAA</name>
>        <title>manager</title>
>    </Reporter>
>    <Reporter>
>        <name>BBB</name>
>        <title>coordinator</title>
>    </Reporter>
> </Reporters>
>
> You see name & title are paired. As I know, Solr only supports a field with
> multiple values of a primitive type, e.g. string. But in my case, it's a field
> whose multiple values are themselves paired name-title values. How can I configure
> Solr to deal with this case?
>
> Best Regards,
>
> Scott Chu
>



-- 
Lance Norskog
goks...@gmail.com


Re: How to add a new field to existing document (append a new field to already existing document)

2010-03-25 Thread Lance Norskog
There is no feature in Lucene that allows it to append new fields to
an existing document. You have to re-index the entire thing.
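
If (and only if) every field is stored, the usual workaround is
read-modify-rewrite from the client. A rough SolrJ sketch - it silently
loses anything that is indexed but not stored, so check your schema first:

    SolrDocument old = server.query(new SolrQuery("id:doc-1")).getResults().get(0);
    SolrInputDocument updated = new SolrInputDocument();
    for (String name : old.getFieldNames()) {
        updated.addField(name, old.getFieldValue(name));
    }
    updated.addField("newProperty", "new value"); // the field you want to append
    server.add(updated);                          // replaces the old document wholesale
    server.commit();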

On Thu, Mar 25, 2010 at 6:56 PM, bbarani  wrote:
>
> Hi,
>
> I have a peculiar situation,
>
> I am having my DIH dataconfig file as below
>
>  ---> object
>  cachekey=y.id cachevalue=x.id> --> object properties
>
> Now since I am using Cachedsql entity processor I am getting in to out of
> memory exception very often so to tackle the issue I thought of adding a
> filter criteria to my entity Y in such a way that only few properties are
> indexed at one point of time.
>
> I thought of running these queries again with another 2 properties after my
> first indexing is complete. My issue now is that the I want the properties
> to get appended  (in each document) after each indexing is complete. As of
> now the documents itself is getting replace instead I want to append new
> fields to the documents. I searched the nabble forum to find out a solution
> for this issue but couldnt find any useful suggestions / tips. It would be
> great if someone can provide pointers / suggestions to overcome this issue.
>
> Thanks,
> B
> --
> View this message in context: 
> http://n3.nabble.com/How-to-add-a-new-field-to-existing-document-append-a-new-field-to-already-existing-document-tp596093p596093.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goks...@gmail.com


Re: multiple binary documents into a single solr document - Vignette/OpenText integration

2010-03-25 Thread Lance Norskog
Do you want to index the text in the attachments?

If so, you probably are better off creating a unique document for the
mail body and each attachment. A field in the document could give the
id of the main email document. The main email document could contain a
multivalued field giving all of the attachment ids.
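
Schematically something like this (ids and field names are invented; the
attachment text would come from Tika/ExtractingRequestHandler):

    SolrInputDocument mail = new SolrInputDocument();
    mail.addField("id", "mail-77");
    mail.addField("body", "text of the mail itself");
    mail.addField("attachment_ids", "mail-77-att-1"); // multivalued field
    mail.addField("attachment_ids", "mail-77-att-2");

    SolrInputDocument att = new SolrInputDocument();
    att.addField("id", "mail-77-att-1");
    att.addField("parent_mail_id", "mail-77");        // points back at the mail
    att.addField("body", "text extracted from the attachment");

    server.add(mail);
    server.add(att);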

On Thu, Mar 25, 2010 at 10:14 AM, Chris Hostetter
 wrote:
>
> : > I tried calling the addFile() twice (one call for each file) and no
> : > error but nothing getting indexed as well.
>        ...
> : Write your own RequestHandler that uses the existing 
> ExtractingRequestHandler
> : to actually parse the streams, and then you combine the results arbitrarily 
> in
> : your handler, eventually sending an AddUpdateCommand to the update 
> processor.
> : You can obtain both the update processor and SolrCell instance from
> : req.getCore().
>
> The key bit being: yes you can attach multiple files to your request,
> and yes the SolrQueryRequest abstraction can handle that (it appears as
> two "ContentStreams" to the RequestHandler) but the existing
> ExtractingRequestHandler assumes there will only be one ContentStream and
> constructs one document for it -- the API isn't really designed around
> the idea of how to generate a single SolrInputDocument from multiple
> ContentStreams (where would you get the "title" from? etc...)
>
> There was talk about trying to generalize this, but i don't think anyone
> else has looked into it much.  Here's one reference, but i definitely
> remember a more recent thread about this idea...
>
> http://n3.nabble.com/ExtractingRequestHandler-and-XmlUpdateHandler-tt492202.html#a492211
>
>
>
> -Hoss
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Threads blocking on solr slave servers

2010-03-25 Thread Lance Norskog
>
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
> at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
> at java.lang.Thread.run(Thread.java:595)
>
> Thanks
> Dipti
>



-- 
Lance Norskog
goks...@gmail.com


Re: expungeDeletes on commit in Dataimport

2010-03-25 Thread Lance Norskog
Oops- solrconfig.xml does not include an option for autocommit to use
expungeDeletes. You will have to send a <commit expungeDeletes="true"/> operation directly.

On Thu, Mar 25, 2010 at 8:18 PM, Lance Norskog  wrote:
> You can do autoCommit in solrconfig.xml. This runs regular commits
> independently of the DataImportHandler.
>
> On Thu, Mar 25, 2010 at 9:44 AM, Ruben Chadien  
> wrote:
>> Hi
>>
>> I know this has been discussed before, but is there any way to do
>> expungeDeletes=true when the DataImportHandler does the commit.
>> I am using the deleteDocByQuery in a Transformer when doing a delta-import 
>> and as discussed before the documents are not deleted until restart.
>>
>> Also, how do i know in a Transformer if its running a Delta or Full Import , 
>> i tries looking at Context. currentProcess() but that gives me "FULL_DUMP" 
>> when doing a delta import...?
>>
>> Thanks!
>> Ruben Chadien
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: expungeDeletes on commit in Dataimport

2010-03-25 Thread Lance Norskog
You can do autoCommit in solrconfig.xml. This runs regular commits
independently of the DataImportHandler.

On Thu, Mar 25, 2010 at 9:44 AM, Ruben Chadien  wrote:
> Hi
>
> I know this has been discussed before, but is there any way to do
> expungeDeletes=true when the DataImportHandler does the commit.
> I am using the deleteDocByQuery in a Transformer when doing a delta-import 
> and as discussed before the documents are not deleted until restart.
>
> Also, how do i know in a Transformer if its running a Delta or Full Import , 
> i tries looking at Context. currentProcess() but that gives me "FULL_DUMP" 
> when doing a delta import...?
>
> Thanks!
> Ruben Chadien



-- 
Lance Norskog
goks...@gmail.com


Re: solr highlighting

2010-03-25 Thread Lance Norskog
To display html-markup in an html page, it has to be in entity-encoded
form. So, encode the <> as entities in your input application, and
have it indexed and stored in this format. Then, the <em> tags are
inserted as normal. This gives you the html text displayable in an
html page, with all words highlightable. And add gt/lt etc. as
stopwords.

At this point you have the element names, attribute names and values,
and text parts searchable and highlightable. If you only want the HTML
syntax parts shown, the PatternReplaceFilter is your friend: with
regex patterns you can pull out those values and ignore the text
parts.

The analysis.jsp page will make it much much easier to debug this.

Good luck!

On Thu, Mar 25, 2010 at 8:21 AM, Niraj Aswani  wrote:
> Hi,
>
> I am using the following two parameters to highlight the hits.
>
> "hl.simple.pre=" + URLEncoder.encode("<em>")
> "hl.simple.post=" + URLEncoder.encode("</em>")
>
> This seems to work.  However, there is a bit of trouble when the text itself
> contains html markup.
>
> For example, I have indexed a document with the following text in it.
> ===
> something here...
> xyz
> something here..
> ===
>
> When I search for the keyword choice, what it does is, it inserts "<em>"
> just before the word choice and "</em>" immediately after the word
> choice. It results into something like below:
>
> <choice minOccurs="1"
> maxOccurs="unbounded">xyzchoice>
>
>
> I would like it to be something like:
>
> <choice minOccurs="1"
> maxOccurs="unbounded">xyz/choice>
>
> Is there any way to do it such that the highlight content is encoded as HTML
> but the prefix and suffix are not?
>
> Thanks,
> Niraj
>
>
>
> When I issue a query, it returns all the corret
>



-- 
Lance Norskog
goks...@gmail.com


Re: Solrj doesn't tell if PDF was actually parsed by Tika

2010-03-25 Thread Lance Norskog
Please file a bug for this on the JIRA.

https://issues.apache.org/jira/secure/Dashboard.jspa

On Thu, Mar 25, 2010 at 7:21 AM, Abdelhamid  ABID  wrote:
> Hi,
> When posting pdf files using solrj the only response we get from Solr is
> only server response status, but never know whether
> pdf was actually parsed or not, checking the log I found that some Tika
> wasn't able
> to succeed with some pdf files because of content nature (texts in images
> only) or are corrupted:
>
>     25 mars 2010 14:54:07 org.apache.pdfbox.util.PDFStreamEngine
> processOperator
>     INFO: unsupported/disabled operation: EI
>
>     25 mars 2010 14:54:02 org.apache.pdfbox.filter.FlateFilter decode
>     GRAVE: Stop reading corrupt stream
>
>
> The question is how can I catch these kinds of exceptions through Solrj ?
>
> --
> Elsadek
>



-- 
Lance Norskog
goks...@gmail.com


Re: multicore embedded swap / reload etc.

2010-03-25 Thread Lance Norskog
All operations through the SolrJ work exactly the same against the
Solr web app and embedded Solr. You code the calls to update cores
with the same SolrJ APIs either way.

On Wed, Mar 24, 2010 at 2:19 PM, Nagelberg, Kallin
 wrote:
> Hi,
>
> I've got a situation where I need to reindex a core once a day. To do this I 
> was thinking of having two cores, one 'live' and one 'staging'. The app is 
> always serving 'live', but when the daily index happens it goes into 
> 'staging', then staging is swapped into 'live'. I can see how to do this sort 
> of thing over http, but I'm using an embedded solr setup via solrJ. Any 
> suggestions on how to proceed? I could just have two solrServer's built from 
> different coreContainers, and then swap the references when I'm ready, but I 
> wonder if there is a better approach. Maybe grab a hold of the 
> CoreAdminHandler?
>
> Thanks,
> Kallin Nagelberg
>



-- 
Lance Norskog
goks...@gmail.com


Re: Impossible Boost Query?

2010-03-25 Thread Lance Norskog
The RandomValueSource class is available as a sort value, but it is
not available as a function. If it was, you could include the function
as part of the relevance but not all of it.
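
As a sort it looks like this (assuming the example schema's random_* dynamic
field backed by solr.RandomSortField; the numeric suffix just seeds it):

    SolrQuery q = new SolrQuery("product:foo");
    q.addSortField("score", SolrQuery.ORDER.desc);
    q.addSortField("random_" + System.currentTimeMillis(), SolrQuery.ORDER.asc);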

On Wed, Mar 24, 2010 at 9:41 AM, blargy  wrote:
>
> This sound a little closer to what I want but I don't want fully randomized
> results.
>
> How exactly does this field work? Is it more than just a simple random sort
> (order by rand())? What would be nice is if I could randomize documents
> within a certain score percentage of each other. Is this available?
>
> Thanks
>
>
>
> Lance Norskog-2 wrote:
>>
>> Also, there is a 'random' type which generates random numbers. This
>> might help you also.
>>
>> On Tue, Mar 23, 2010 at 7:18 PM, Lance Norskog  wrote:
>>> At this point (and for almost 3 years :) field collapsing is a source
>>> patch. You have to check out the Solr trunk from the Apache subversion
>>> server, apply the patch with the 'patch' command, and build the new
>>> Solr with 'ant'.
>>>
>>> On Tue, Mar 23, 2010 at 4:13 PM, blargy  wrote:
>>>>
>>>> Thanks but Im not quite show on how to apply the patch. I just use the
>>>> packaged solr-1.4.0.war in my deployment (no compiling, etc). Is there a
>>>> way
>>>> I can patch the war file?
>>>>
>>>> Any instructions would be greatly appreciated. Thanks
>>>>
>>>>
>>>> Otis Gospodnetic wrote:
>>>>>
>>>>> You'd likely want to get the latest patch and trunk and try applying.
>>>>>
>>>>> Otis
>>>>> 
>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>>> Hadoop ecosystem search :: http://search-hadoop.com/
>>>>>
>>>>>
>>>>>
>>>>> - Original Message 
>>>>>> From: blargy 
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Sent: Tue, March 23, 2010 6:10:22 PM
>>>>>> Subject: Re: Impossible Boost Query?
>>>>>>
>>>>>>
>>>>> Maybe a better question is... how can I install this and will it work
>>>>>> with
>>>>> 1.4?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> blargy wrote:
>>>>>>
>>>>>> Possibly.
>>>>>> How can I install this as a contrib or do I need to actually
>>>>>> perform the
>>>>>> patch?
>>>>>>
>>>>>>
>>>>>> Otis Gospodnetic wrote:
>>>>>>>
>>>>>>
>>>>>>> Would Field Collapsing from SOLR-236 do the job for you?
>>>>>>>
>>>>>>> Otis
>>>>>>> 
>>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>>>>> Hadoop ecosystem search :: http://search-hadoop.com/
>>>>>>>
>>>>>>> - Original Message 
>>>>>>>> From: blargy 
>>>>>>>> To: solr-user@lucene.apache.org
>>>>>>>> Sent: Tue, March 23, 2010 2:39:48 PM
>>>>>>>> Subject: Impossible Boost Query?
>>>>>>>>
>>>>>>>> I was wondering if this is even possible. I'll try to explain what I'm
>>>>>>>> trying to do to the best of my ability.
>>>>>>>>
>>>>>>>> Ok, so our site has a bunch of products that are sold by any number of
>>>>>>>> sellers. Currently when I search

Re: HTTP Status 500 - null java.lang.IllegalArgumentException at java.nio.Buffer.limit(Buffer.java:249)

2010-03-23 Thread Lance Norskog
That area of the Lucene code throws NullPointerException and ArrayIndexOutOfBoundsException errors, but
they are all caused by corrupt indexes. They should be caught and
wrapped.

On Tue, Mar 23, 2010 at 4:33 PM, Chris Hostetter
 wrote:
>
> : I am doing a really simple query on my index (it's running in tomcat):
> :
> : http://host:8080/solr_er_07_09/select/?q=hash_id:123456
>        ...
>
> details please ...
>
>    http://wiki.apache.org/solr/UsingMailingLists
>
> ... what version of solr? lucene? tomcat?
>
> : I built the index on a different machine than the one I am doing the
>
> ...ditto for that machine.
>
> are you sure hte md5 checksums match for both copies of the index (ie: did
> it get corrupted when you copied it)
>
> what does CheckIndex say about hte index?
>
> : query on though the configuration is exactly the same. I can do the same
> : query using solrj (I have an app doing that) and it works fine.
>
> that seems highly bizzare ... are you certain it's the exact same query?
> what does the tomcat log say about hte two requests?
>
>
>
> -Hoss
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Impossible Boost Query?

2010-03-23 Thread Lance Norskog
Also, there is a 'random' type which generates random numbers. This
might help you also.

On Tue, Mar 23, 2010 at 7:18 PM, Lance Norskog  wrote:
> At this point (and for almost 3 years :) field collapsing is a source
> patch. You have to check out the Solr trunk from the Apache subversion
> server, apply the patch with the 'patch' command, and build the new
> Solr with 'ant'.
>
> On Tue, Mar 23, 2010 at 4:13 PM, blargy  wrote:
>>
>> Thanks but Im not quite show on how to apply the patch. I just use the
>> packaged solr-1.4.0.war in my deployment (no compiling, etc). Is there a way
>> I can patch the war file?
>>
>> Any instructions would be greatly appreciated. Thanks
>>
>>
>> Otis Gospodnetic wrote:
>>>
>>> You'd likely want to get the latest patch and trunk and try applying.
>>>
>>> Otis
>>> 
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Hadoop ecosystem search :: http://search-hadoop.com/
>>>
>>>
>>>
>>> - Original Message 
>>>> From: blargy 
>>>> To: solr-user@lucene.apache.org
>>>> Sent: Tue, March 23, 2010 6:10:22 PM
>>>> Subject: Re: Impossible Boost Query?
>>>>
>>>>
>>> Maybe a better question is... how can I install this and will it work
>>>> with
>>> 1.4?
>>>
>>> Thanks
>>>
>>>
>>> blargy wrote:
>>>>
>>>> Possibly.
>>>> How can I install this as a contrib or do I need to actually
>>>> perform the
>>>> patch?
>>>>
>>>>
>>>> Otis Gospodnetic wrote:
>>>>>
>>>>
>>>>> Would Field Collapsing from SOLR-236 do the job for you?
>>>>>
>>>>> Otis
>>>>> 
>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>>> Hadoop ecosystem search :: http://search-hadoop.com/
>>>>>
>>>>> - Original Message 
>>>>>> From: blargy 
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Sent: Tue, March 23, 2010 2:39:48 PM
>>>>>> Subject: Impossible Boost Query?
>>>>>>
>>>>>> I was wondering if this is even possible. I'll try to explain what I'm
>>>>>> trying to do to the best of my ability.
>>>>>>
>>>>>> Ok, so our site has a bunch of products that are sold by any number of
>>>>>> sellers. Currently when I search for some product I get back all products
>>>>>> matching that search term but the problem is there may be multiple products
>>>>>> sold by the same seller that are all closely related, therefore their
>>>>>> scores are related. So basically the search ends up with results that are
>>>>>> all closely clumped together by the same seller but I would much rather
>>>>>> prefer to distribute these results across sellers (given each seller a
>>>>>> fair shot to sell their goods).
>>>>>>
>>>>>> Is there any way to add some boost query for example that will start
>>>>>> weighing products lower when their seller has already been listed a few
>>>>>> times. For example, right now I have
>>>>>>
>>>>>> Product foo by Seller A
>>>>>> Product foo by Seller A
>>>>>> Product foo by Seller A

Re: Impossible Boost Query?

2010-03-23 Thread Lance Norskog
At this point (and for almost 3 years :) field collapsing is a source
patch. You have to check out the Solr trunk from the Apache subversion
server, apply the patch with the 'patch' command, and build the new
Solr with 'ant'.

On Tue, Mar 23, 2010 at 4:13 PM, blargy  wrote:
>
> Thanks but Im not quite show on how to apply the patch. I just use the
> packaged solr-1.4.0.war in my deployment (no compiling, etc). Is there a way
> I can patch the war file?
>
> Any instructions would be greatly appreciated. Thanks
>
>
> Otis Gospodnetic wrote:
>>
>> You'd likely want to get the latest patch and trunk and try applying.
>>
>> Otis
>> 
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Hadoop ecosystem search :: http://search-hadoop.com/
>>
>>
>>
>> - Original Message 
>>> From: blargy 
>>> To: solr-user@lucene.apache.org
>>> Sent: Tue, March 23, 2010 6:10:22 PM
>>> Subject: Re: Impossible Boost Query?
>>>
>>>
>> Maybe a better question is... how can I install this and will it work
>>> with
>> 1.4?
>>
>> Thanks
>>
>>
>> blargy wrote:
>>>
>>> Possibly.
>>> How can I install this as a contrib or do I need to actually
>>> perform the
>>> patch?
>>>
>>>
>>> Otis Gospodnetic wrote:
>>>>
>>>
>>>> Would Field Collapsing from SOLR-236 do the job for you?
>>>>
>>>> Otis
>>>> 
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>> Hadoop ecosystem search :: http://search-hadoop.com/
>>>>
>>>> - Original Message 
>>>>> From: blargy 
>>>>> To: solr-user@lucene.apache.org
>>>>> Sent: Tue, March 23, 2010 2:39:48 PM
>>>>> Subject: Impossible Boost Query?
>>>>>
>>>>> I was wondering if this is even possible. I'll try to explain what I'm
>>>>> trying to do to the best of my ability.
>>>>>
>>>>> Ok, so our site has a bunch of products that are sold by any number of
>>>>> sellers. Currently when I search for some product I get back all products
>>>>> matching that search term but the problem is there may be multiple products
>>>>> sold by the same seller that are all closely related, therefore their
>>>>> scores are related. So basically the search ends up with results that are
>>>>> all closely clumped together by the same seller but I would much rather
>>>>> prefer to distribute these results across sellers (given each seller a
>>>>> fair shot to sell their goods).
>>>>>
>>>>> Is there any way to add some boost query for example that will start
>>>>> weighing products lower when their seller has already been listed a few
>>>>> times. For example, right now I have
>>>>>
>>>>> Product foo by Seller A
>>>>> Product foo by Seller A
>>>>> Product foo by Seller A
>>>>> Product foo by Seller B
>>>>> Product foo by Seller B
>>>>> Product foo by Seller B
>>>>> Product foo by Seller C
>>>>> Product foo by Seller C
>>>>> Product foo by Seller C
>>>>>
>>>>> where each result is very close in score. I would like something like this
>>>>>
>>>>> Product foo by Seller A
>>>>> Product foo by Seller B
>>>>> Product foo by Seller C
>>>>> Product foo by Seller A
>>>>> Product foo by Seller B
>>>>> Product foo by Seller C
>>>>>
>>>>> basically distributing the results over the sellers. Is something like this
>>>>> possible? I don't care if the solution involves a boost query or not. I
>>>>> just want some way to distribute closely related documents.
>>>>>
>>>>> Thanks!!!
>>>>> --
>>>>> View this message in context:
>>>>> http://old.nabble.com/Impossible-Boost-Query--tp28005354p28005354.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Impossible-Boost-Query--tp28005354p28007880.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/Impossible-Boost-Query--tp28005354p28008495.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-23 Thread Lance Norskog
You need 'ant' to do builds.  At the top level, do:
ant clean
ant example

These will build everything and set up the example/ directory. After that, run:
ant test-core

to run all of the unit tests and make sure that the build works. If
the autosuggest patch has a test, this will check that the patch went
in correctly.

Lance

On Tue, Mar 23, 2010 at 7:42 AM, stocki  wrote:
>
> okay,
> i do this..
>
> but one file are not right updatet 
> Index: trunk/src/java/org/apache/solr/util/HighFrequencyDictionary.java
> (from the suggest.patch)
>
> i checkout it from eclipse, apply patch, make an new solr.war ... its the
> right way ??
> i thought that is making a war i didnt need to make an build.
>
> how do i make an build ?
>
>
>
>
> Alexey-34 wrote:
>>
>>> Error loading class 'org.apache.solr.spelling.suggest.Suggester'
>> Are you sure you applied the patch correctly?
>> See http://wiki.apache.org/solr/HowToContribute#Working_With_Patches
>>
>> Checkout Solr trunk source code (
>> http://svn.apache.org/repos/asf/lucene/solr/trunk ), apply patch,
>> verify that everything went smoothly, build solr and use built version
>> for your tests.
>>
>> On Mon, Mar 22, 2010 at 9:42 PM, stocki  wrote:
>>>
>>> i patch an nightly build from solr.
>>> patch runs, classes are in the correct folder, but when i replace
>>> spellcheck
>>> with this spellchecl like in the comments, solr cannot find the classes
>>> =(
>>>
>>> <searchComponent name="suggest" class="solr.SpellCheckComponent">
>>>    <lst name="spellchecker">
>>>      <str name="name">suggest</str>
>>>      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>>>      <str name="lookupImpl">org.apache.solr.spelling.suggest.jaspell.JaspellLookup</str>
>>>      <str name="field">text</str>
>>>      <str name="sourceLocation">american-english</str>
>>>    </lst>
>>>  </searchComponent>
>>>
>>>
>>> --> SCHWERWIEGEND: org.apache.solr.common.SolrException: Error loading
>>> class
>>> 'org.ap
>>> ache.solr.spelling.suggest.Suggester'
>>>
>>>
>>> why is it so ??  i think no one has so many trouble to run a patch
>>> like
>>> me =( :D
>>>
>>>
>>> Andrzej Bialecki wrote:
>>>>
>>>> On 2010-03-19 13:03, stocki wrote:
>>>>>
>>>>> hello..
>>>>>
>>>>> i try to implement autosuggest component from these link:
>>>>> http://issues.apache.org/jira/browse/SOLR-1316
>>>>>
>>>>> but i have no idea how to do this !?? can anyone get me some tipps ?
>>>>
>>>> Please follow the instructions outlined in the JIRA issue, in the
>>>> comment that shows fragments of XML config files.
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Andrzej Bialecki     <><
>>>>   ___. ___ ___ ___ _ _   __
>>>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>>>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>>>> http://www.sigram.com  Contact: info at sigram dot com
>>>>
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/SOLR-1316-How-To-Implement-this-autosuggest-component-tp27950949p27990809.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/SOLR-1316-How-To-Implement-this-patch-autoComplete-tp27950949p28001938.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: synonyms problem

2010-03-22 Thread Lance Norskog
How large is the document, and how often does 'aberrant' appear in it?
Are the other words also in the document?

What is the full analysis stack? There might be interactions between
the SynonymFilter and other filters.

What does the admin/analysis.jsp page show? Does it throw OutOfMemory also?

Does stemming turn two of the terms into the same term?

On Mon, Mar 22, 2010 at 7:48 AM, Armando Ota  wrote:
> Have you tried increasing memory size ?
>
> we had some out of memory problems when we used default memory size ..
>
> Kind regards
>
> Armando
>
> michaelnazaruk wrote:
>>
>> Hi all! I have a little problem with synonyms:
>> when I set my synonyms.txt file such as:
>>
>> aberrant=>abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
>> it's all right! But if I set this file such as
>>
>> aberrant,abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
>> I get exception that not enough memory
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Features not present in Solr

2010-03-22 Thread Lance Norskog
About Text Analysis: "Natural Language Processing" is the more usual
term. Finding parts of speech, isolating people's names, etc.

On Mon, Mar 22, 2010 at 12:27 PM, Israel Ekpo  wrote:
> On Mon, Mar 22, 2010 at 3:16 PM, Lance Norskog  wrote:
>
>> Web crawling.
>
>
> I don't think Solr was designed with Web Crawling in mind. Nutch would be
> better suited for that, I believe.
>
>
>> Text analysis.
>>
>
> This is a bit vague.
>
> Please elaborate further. There is a lot of analysis (stemming, stop-word
> removal, character transformation etc) that takes place already though
> implicitly based on what fields you define and use in the schema.
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
>
>> Distributed index management.
>> A fanatical devotion to the Pope.
>>
>> There are probably a lot of features already available in Solr out of the box
> that most of those other "enterprise level" applications do not have yet.
>
> You would also be surprised to learn that a lot of them use Lucene under the
> covers and are actually trying to re-implement what is already available in
> Solr.
>
>
>> On Sun, Mar 21, 2010 at 11:19 PM, MitchK  wrote:
>> >
>> > Srikanth,
>> >
>> > I don't know anything about Endeca, so I can't compare Solr to it.
>> > However, I know Solr is powerful. Very powerful.
>> > So, maybe you should tell us more about your needs to get a good answer.
>> >
>> > As a response to your second question: You should not expect that Solr is
>> > a database. It is an index server. A database keeps your data safe. If
>> > something goes wrong - which is always possible - Solr gives no
>> > guarantees.
>> > Maybe someone else can tell you more about this topic.
>> >
>> > - Mitch
>> >
>> >
>> > Srikanth B wrote:
>> >>
>> >> Hello
>> >>
>> >> We are in the process of researching on Solr features. I am looking for
>> >> two
>> >> things
>> >>         1. Features not available in Solr but present in other products
>> >> like
>> >> Endeca
>> >>         2. What one shouldn't expect from Solr
>> >>
>> >> Any thoughts ?
>> >>
>> >> Thanks in advance
>> >> Srikanth
>> >>
>> >>
>> >
>> > --
>> > View this message in context:
>> http://old.nabble.com/Features-not-present-in-Solr-tp27966315p27982734.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>
>
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
>



-- 
Lance Norskog
goks...@gmail.com


Re: Query interface

2010-03-22 Thread Lance Norskog
There are several response formats available for Solr:

http://wiki.apache.org/solr/QueryResponseWriter

Also, XSLT and Velocity templates are available for transforming
the response into other output formats.
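
For example, the same query can come back as JSON or be pushed through a
stylesheet on the server (host/port and stylesheet name are placeholders;
the .xsl file lives in conf/xslt/ of the core):

http://localhost:8983/solr/select?q=something&wt=json
http://localhost:8983/solr/select?q=something&wt=xslt&tr=example.xsl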

On Mon, Mar 22, 2010 at 9:00 AM, Armando Ota  wrote:
> Hey ...
>
> Thank you very much .. been struggling with this for hours now :(
>
> Will have to change the feature .. somehow :D
>
> Kind regards
>
> Armando
>
> Abdelhamid ABID wrote:
>>
>> Hi,
>> I think there isn't anything better than using XSLT as a means to query Solr
>> and render results.
>> Within an xslt file you would combine the search form with the search results
>> in one place; this way you free the server from the heavy-duty task of XSLT
>> transformation and let the client - which in most cases is a browser - do
>> the work.
>>
>> On 3/22/10, Gora Mohanty  wrote:
>>
>>>
>>> On Mon, 22 Mar 2010 15:26:41 +0100
>>> Sebastian Funk  wrote:
>>>
>>>
>>>>
>>>> hey there,
>>>>
>>>> i've been using solr for some time now and set everything up the
>>>> way it's supposed to..
>>>> now for the user interface: simply writing a javascript (or
>>>> something else) website that passes the query-URL to solr and
>>>> interprets the XML given as a result. is that the easiest way?
>>>> i've noticed some problems with umlauts etc.. when using jetty or
>>>> tomcat as a server..
>>>>
>>>> is there another way to query solr and retrieve the results?
>>>>
>>>
>>> [...]
>>>
>>> Many modern frameworks (I certainly know of Ruby on Rails, and
>>> Django), have Solr integrated via an application. I really like
>>> Django Haystack for how it offers an easy way to get started with
>>> various search back-ends, with a very Django-ish feel to the
>>> interface: http://haystacksearch.org/
>>>
>>> Regards,
>>>
>>> Gora
>>>
>>>
>>
>>
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: DIH - Categories not indexed ????

2010-03-22 Thread Lance Norskog
Whoops, yes it is in the wiki. A link from the admin page would be welcome.

On Mon, Mar 22, 2010 at 12:37 PM, Lance Norskog  wrote:
> There is a very cool debugger for the DataImportHandler:
>
> http://www.lucidimagination.com/search/document/CDRG_ch06_6.4.9?q=dataimport
> debug jsp
>
> It is not mentioned on the wiki, nor are there any links to it in the
> Solr admin console.
>
> On Mon, Mar 22, 2010 at 8:36 AM, stocki  wrote:
>>
>> Helloo.
>>
>> i have the same database like in this example:
>> http://wiki.apache.org/solr/DataImportHandler?highlight=(dih)#Full_Import_Example
>>
>> this is my data-config.xml
>>
>> 
>>    >            query="select id, shop_id, is_active, order_index,
>> shop_item_number, manufacturer, name, ean, isbn, modified from shop_items">
>>       
>>       
>>       
>>       
>>       
>>       
>>       
>>       
>>           
>>           > dateTimeFormat="-MM-'hh:mm:ss'Z'" />
>>
>>                
>>                        > name="shop_category_id" />
>>
>>                        
>>                                
>>                        
>>                
>>
>>    
>>  
>>
>>
>> i have absolute no idea why solr didnt index the category name and
>> category_id...
>>
>> one product can have more than one values.
>>
>> please help meee someone .. ^^ ;)
>>
>> --
>> View this message in context: 
>> http://old.nabble.com/DIH---Categories-not-indexed--tp27988126p27988126.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: DIH - Categories not indexed ????

2010-03-22 Thread Lance Norskog
There is a very cool debugger for the DataImportHandler:

http://www.lucidimagination.com/search/document/CDRG_ch06_6.4.9?q=dataimport
debug jsp

It is not mentioned on the wiki, nor are there any links to it in the
Solr admin console.

On Mon, Mar 22, 2010 at 8:36 AM, stocki  wrote:
>
> Helloo.
>
> i have the same database like in this example:
> http://wiki.apache.org/solr/DataImportHandler?highlight=(dih)#Full_Import_Example
>
> this is my data-config.xml
>
> 
>                query="select id, shop_id, is_active, order_index,
> shop_item_number, manufacturer, name, ean, isbn, modified from shop_items">
>       
>       
>       
>       
>       
>       
>       
>       
>           
>            dateTimeFormat="-MM-'hh:mm:ss'Z'" />
>
>                
>                         name="shop_category_id" />
>
>                        
>                                
>                        
>                
>
>    
>  
>
>
> i have absolute no idea why solr didnt index the category name and
> category_id...
>
> one product can have more than one values.
>
> please help meee someone .. ^^ ;)
>
> --
> View this message in context: 
> http://old.nabble.com/DIH---Categories-not-indexed--tp27988126p27988126.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Features not present in Solr

2010-03-22 Thread Lance Norskog
Web crawling.
Text analysis.
Distributed index management.
A fanatical devotion to the Pope.

On Sun, Mar 21, 2010 at 11:19 PM, MitchK  wrote:
>
> Srikanth,
>
> I don't know anything about Endeca, so I can't compare Solr to it.
> However, I know Solr is powerful. Very powerful.
> So, maybe you should tell us more about your needs to get a good answer.
>
> As a response to your second question: You should not expect that Solr is
> a database. It is an index server. A database keeps your data safe. If
> something goes wrong - which is always possible - Solr gives no guarantees.
> Maybe someone else can tell you more about this topic.
>
> - Mitch
>
>
> Srikanth B wrote:
>>
>> Hello
>>
>> We are in the process of researching on Solr features. I am looking for
>> two
>> things
>>         1. Features not available in Solr but present in other products
>> like
>> Endeca
>>         2. What one shouldn't expect from Solr
>>
>> Any thoughts ?
>>
>> Thanks in advance
>> Srikanth
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/Features-not-present-in-Solr-tp27966315p27982734.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Chunked streaming upload to Solr

2010-03-21 Thread Lance Norskog
I would like to upload data to Solr for indexing, in chunks, in one
HTTP POST request. Is this possible? What exactly should I set as the
client socket parameters?

What I'm getting is that with the default parameters, the first write
adds a Content-Length matching the size of the first chunk. Solr reads
that as the entire upload. Apparently the right way to handle this is
with the HTTP request header "Transfer-Encoding" set to "chunked".
(I don't know the total size of the upload.) This results in the HTTP
parser blowing up. Here is the stack trace:

Mar 21, 2010 8:35:18 PM sun.reflect.NativeMethodAccessorImpl invoke0
WARNING: handle failed
java.io.IOException: bad chunk char: 115
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:687)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
log4j:WARN Detected problem with connection: java.net.SocketException:
Unexpected end of file from server
log4j:WARN Detected problem with connection: java.net.SocketException:
Unexpected end of file from server
log4j:WARN Detected problem with connection: java.net.SocketException:
Unexpected end of file from server


Has anyone made this work?
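
For concreteness, this is roughly the client-side shape I mean - a minimal
sketch with plain java.net.HttpURLConnection; the URL, chunk size and the
document are placeholders:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ChunkedUpload {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:8983/solr/update");   // placeholder URL
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setDoOutput(true);
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
    // Send Transfer-Encoding: chunked instead of a Content-Length header.
    conn.setChunkedStreamingMode(8192);
    OutputStream out = conn.getOutputStream();
    // Each write is streamed as chunks; the total size is never declared.
    out.write("<add><doc><field name=\"id\">1</field></doc></add>".getBytes("UTF-8"));
    out.close();
    System.out.println("HTTP status: " + conn.getResponseCode());
  }
}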

Thanks,

-- 
Lance Norskog
goks...@gmail.com


Re: Indexing CLOB Column in Oracle

2010-03-17 Thread Lance Norskog
This could be the problem: the "text" field in the example schema is
indexed, but not stored. If you query the index with "text:monkeys" it
will find records with "monkeys", but the text field will not appear
in the returned XML because it was not stored.
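
The relevant line in the example schema.xml looks roughly like this; changing
stored to "true" (and reindexing) makes the field come back in query results:

  <!-- example schema default: indexed but not stored -->
  <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
  <!-- to get the field contents back in responses -->
  <field name="text" type="text" indexed="true" stored="true" multiValued="true"/>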

On Wed, Mar 17, 2010 at 11:17 AM, Neil Chaudhuri
 wrote:
> For those who might encounter a similar issue, merging what I had into a 
> single entity and using getClobVal() did the trick.
>
> In other words:
>
> 
>        
>            
>            
>            
>        
> 
>
> Thanks.
>
>
>
> -Original Message-
> From: Craig Christman [mailto:cchrist...@caci.com]
> Sent: Wednesday, March 17, 2010 11:23 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Indexing CLOB Column in Oracle
>
> To convert an XMLTYPE to CLOB use the getClobVal() method like this:
>
> SELECT d.XML.getClobVal() FROM DOC d WHERE d.ARCHIVE_ID = '${doc.ARCHIVE_ID}'
>
>
> -Original Message-
> From: Shawn Heisey [mailto:s...@elyograg.org]
> Sent: Tuesday, March 16, 2010 7:37 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Indexing CLOB Column in Oracle
>
> Disclaimer:  My Oracle experience is miniscule at best.  I am also a
> beginner at Solr, so grab yourself the proverbial grain of salt.
>
> I googled a bit on CLOB.  One page I found mentioned setting up a view
> to return the data type you want.  Can you use the functions described
> on these pages in either the Solr query or a view?
>
> http://www.oradev.com/dbms_lob.jsp
> http://www.dba-oracle.com/t_dbms_lob.htm
> http://www.praetoriate.com/dbms_packages/ddp_dbms_lob.htm
>
> I also was trying to find a way to convert from xmltype directly to a
> string in a query, but that quickly got way over my level of
> understanding.  I saw hints that it is possible, though.
>
> Shawn
>
> On 3/16/2010 4:59 PM, Neil Chaudhuri wrote:
>> Since my original thread was straying to a new topic, I thought it made 
>> sense to create a new thread of discussion.
>>
>> I am using the DataImportHandler to index 3 fields in a table: an id, a 
>> date, and the text of a document. This is an Oracle database, and the 
>> document is an XML document stored as Oracle's xmltype data type, which is 
>> an instance of oracle.sql.OPAQUE. Still, it is nothing more than a fancy 
>> clob.
>>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Solr Performance Issues

2010-03-17 Thread Lance Norskog
allocated 4GB to Solr, so the rest of the 4GB is free for the OS
>> > > disk
>> > > > caching.
>> > > >
>> > > > I think that at any point of time, there can be a maximum of
>> > > > <number of threads> concurrent requests, which happens to make sense btw (does
>> > it?).
>> > > >
>> > > > As I increase the number of threads, the load average shown by top
>> goes
>> > > up
>> > > > to as high as 80%. But if I keep the number of threads low (~10), the
>> > > load
>> > > > average never goes beyond ~8). So probably thats the number of
>> requests
>> > I
>> > > > can expect Solr to serve concurrently on this index size with this
>> > > > hardware.
>> > > >
>> > > > Can anyone give a general opinion as to how much hardware should be
>> > > > sufficient for a Solr deployment with an index size of ~43GB,
>> > containing
>> > > > around 2.5 million documents? I'm expecting it to serve at least 20
>> > > > requests
>> > > > per second. Any experiences?
>> > > >
>> > > > Thanks
>> > > >
>> > > > On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West <
>> > tburtonw...@gmail.com
>> > > > >wrote:
>> > > >
>> > > > >
>> > > > > How much of your memory are you allocating to the JVM and how much
>> > are
>> > > > you
>> > > > > leaving free?
>> > > > >
>> > > > > If you don't leave enough free memory for the OS, the OS won't have
>> a
>> > > > large
>> > > > > enough disk cache, and you will be hitting the disk for lots of
>> > > queries.
>> > > > >
>> > > > > You might want to monitor your Disk I/O using iostat and look at
>> the
>> > > > > iowait.
>> > > > >
>> > > > > If you are doing phrase queries and your *prx file is significantly
>> > > > larger
>> > > > > than the available memory then when a slow phrase query hits Solr,
>> > the
>> > > > > contention for disk I/O with other queries could be slowing
>> > everything
>> > > > > down.
>> > > > > You might also want to look at the 90th and 99th percentile query
>> > times
>> > > > in
>> > > > > addition to the average. For our large indexes, we found at least
>> an
>> > > > order
>> > > > > of magnitude difference between the average and 99th percentile
>> > > queries.
>> > > > > Again, if Solr gets hit with a few of those 99th percentile slow
>> > > queries
>> > > > > and
>> > > > > you're not hitting your caches, chances are you will see serious
>> > > contention
>> > > > > for disk I/O..
>> > > > >
>> > > > > Of course if you don't see any waiting on i/o, then your bottleneck
>> > is
>> > > > > probably somewhere else:)
>> > > > >
>> > > > > See
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
>> > > > > for more background on our experience.
>> > > > >
>> > > > > Tom Burton-West
>> > > > > University of Michigan Library
>> > > > > www.hathitrust.org
>> > > > >
>> > > > >
>> > > > >
>> > > > > >
>> > > > > > On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel <
>> > > siddhantg...@gmail.com
>> > > > > > >wrote:
>> > > > > >
>> > > > > > > Hi everyone,
>> > > > > > >
>> > > > > > > I have an index corresponding to ~2.5 million documents. The
>> > index
>> > > > size
>> > > > > > is
>> > > > > > > 43GB. The configuration of the machine which is running Solr is
>> -
>> > > > Dual
>> > > > > > > Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB
>> > > > cache,
>> > > > > > 8GB
>> > > > > > > RAM, and 250 GB HDD.
>> > > > > > >
>> > > > > > > I'm observing a strange trend in the queries that I send to
>> Solr.
>> > > The
>> > > > > > query
>> > > > > > > times for queries that I send earlier is much lesser than the
>> > > queries
>> > > > I
>> > > > > > > send
>> > > > > > > afterwards. For instance, if I write a script to query solr
>> 5000
>> > > > times
>> > > > > > > (with
>> > > > > > > 5000 distinct queries, most of them containing not more than
>> 3-5
>> > > > words)
>> > > > > > > with
>> > > > > > > 10 threads running in parallel, the average times for queries
>> > goes
>> > > > from
>> > > > > > > ~50ms in the beginning to ~6000ms. Is this expected or is there
>> > > > > > something
>> > > > > > > wrong with my configuration. Currently I've configured the
>> > > > > > queryResultCache
>> > > > > > > and the documentCache to contain 2048 entries (hit ratios for
>> > both
>> > > is
>> > > > > > close
>> > > > > > > to 50%).
>> > > > > > >
>> > > > > > > Apart from this, a general question that I want to ask is that
>> is
>> > > > such
>> > > > > a
>> > > > > > > hardware enough for this scenario? I'm aiming at achieving
>> around
>> > > 20
>> > > > > > > queries
>> > > > > > > per second with the hardware mentioned above.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > >
>> > > > > > > Regards,
>> > > > > > >
>> > > > > > > --
>> > > > > > > - Siddhant
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > - Siddhant
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > View this message in context:
>> > > > >
>> > http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
>> > > > > Sent from the Solr - User mailing list archive at Nabble.com.
>> > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > - Siddhant
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > - Siddhant
>> >
>>
>
>
>
> --
> - Siddhant
>



-- 
Lance Norskog
goks...@gmail.com


Re: Replication failed due to HTTP PROXY?

2010-03-17 Thread Lance Norskog
; )
>        at
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
> 67)
>        at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
> a:216)
>        at
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
> 81)
>        at
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
> 12)
>        at
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>
>        at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268)
>        at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
>        at
> org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:43
> 1)
>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>        at
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487
> )
>        at
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
> 67)
>        at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
> a:216)
>        at
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
> 81)
>        at
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
> 12)
>        at
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>
>        at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268)
>        at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
>        at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilte
> r.java:264)
>        at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(Servlet
> Handler.java:1089)
>        at
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
> 65)
>        at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
> a:216)
>        at
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
> 81)
>        at
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
> 12)
>        at
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>
>        at
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHand
> lerCollection.java:211)
>        at
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.
> java:114)
>        at
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:1
> 39)
>        at org.mortbay.jetty.Server.handle(Server.java:285)
>        at
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:50
> 2)
>        at
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpCo
> nnection.java:821)
>        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
>        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
>        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>        at
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.
> java:226)
>        at
> org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool
> .java:442)
> Caused by: java.net.SocketTimeoutException: connect timed out
>        at java.net.PlainSocketImpl.socketConnect(Native Method)
>        at java.net.PlainSocketImpl.doConnect(Unknown Source)
>        at java.net.PlainSocketImpl.connectToAddress(Unknown Source)
>        at java.net.PlainSocketImpl.connect(Unknown Source)
>        at java.net.SocksSocketImpl.connect(Unknown Source)
>        at java.net.Socket.connect(Unknown Source)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>        at java.lang.reflect.Method.invoke(Unknown Source)
>        at
> org.apache.commons.httpclient.protocol.ReflectionSocketFactory.create
> Socket(ReflectionSocketFactory.java:140)
>        ... 58 more
> Mar 17, 2010 8:39:17 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=null path=null params={command=details} status=0 QTime=5016
> Mar 17, 2010 8:42:04 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
>
>
> Thanks,
> Barani
> --
> View this message in context: 
> http://old.nabble.com/Replication-failed-due-to-HTTP-PROXY--tp27933577p27933577.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Exception encountered during replication on slave....Any clues?

2010-03-17 Thread Lance Norskog
The localhost URLs have no port numbers.

Is there a more complete error in the logs?

On Wed, Mar 17, 2010 at 9:15 AM, JavaGuy84  wrote:
>
> Hi William,
>
> We are facing the same issue as yourself.. just thought of checking if you
> had already resolve this issue?
>
> Thanks,
> Barani
>
>
> William Pierce-3 wrote:
>>
>> Folks:
>>
>> I am seeing this exception in my logs that is causing my replication to
>> fail.    I start with  a clean slate (empty data directory).  I index the
>> data on the postingsmaster using the dataimport handler and it succeeds.
>> When the replication slave attempts to replicate it encounters this error.
>>
>> Dec 7, 2009 9:20:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
>> SEVERE: Master at: http://localhost/postingsmaster/replication is not
>> available. Index fetch failed. Exception: Invalid version or the data in
>> not in 'javabin' format
>>
>> Any clues as to what I should look for to debug this further?
>>
>> Replication is enabled as follows:
>>
>> The postingsmaster solrconfig.xml looks as follows:
>>
>> 
>>     
>>       
>>       commit
>>       
>>       
>>     
>>   
>>
>> The postings slave solrconfig.xml looks as follows:
>>
>> 
>>     
>>         
>>         > name="masterUrl">http://localhost/postingsmaster/replication
>>         
>>         00:05:00
>>      
>>   
>>
>>
>> Thanks,
>>
>> - Bill
>>
>>
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/Exception-encountered-during-replication-on-slaveAny-clues--tp26684769p27933575.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: XML data in solr field

2010-03-17 Thread Lance Norskog
You can use dynamic fields (wildcard field names) to add any and all
element names. You would have to add a suffix to every element name in
your preparation, but you will not have to add all of the element
names to your schema.
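
As a rough sketch, with a catch-all dynamic field such as the *_t rule from
the example schema, the preparation step only has to rename each element
before posting (the id is a placeholder; the venue value comes from the
example below):

  <dynamicField name="*_t" type="text" indexed="true" stored="true"/>

  <add>
    <doc>
      <field name="id">event-1</field>
      <field name="Venue_t">Radio City Music Hall</field>
    </doc>
  </add>

A search on Venue_t:"Radio City Music Hall" then matches without a schema
change per element name.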

On Wed, Mar 17, 2010 at 7:04 AM, Walter Underwood  wrote:
> Have you considered an XML database? Because this is exactly what they are 
> designed to do.
>
> eXist is open source, or you can use Mark Logic (my employer), which is much 
> faster and more scalable. We do give out free academic and community licenses 
> for Mark Logic.
>
> wunder
>
> On Mar 16, 2010, at 11:04 PM, Nair, Manas wrote:
>
>> Thankyou Tommy. But the real problem here is that the xml is dynamic and the 
>> element names will be different in different docs which means that there 
>> will be a lot of field names to be added in schema if I were to index those 
>> xml nodes separately.
>> Is it possible to have nested indexing (xml within xml) in solr without the 
>> overhead of adding all those inner xml nodes as actual fields in solr schema?
>>
>> Manas
>>
>> 
>>
>> From: Tommy Chheng [mailto:tommy.chh...@gmail.com]
>> Sent: Tue 3/16/2010 5:05 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: XML data in solr field
>>
>>
>>
>>
>>  Do you have the option of just importing each xml node as a
>> field/value when you add the document?
>>
>> That'll let you do the search easily. If you need to store the raw XML,
>> you can use an extra field.
>>
>> Tommy Chheng
>> Programmer and UC Irvine Graduate Student
>> Twitter @tommychheng
>> http://tommy.chheng.com <http://tommy.chheng.com/>
>>
>>
>> On 3/16/10 12:59 PM, Nair, Manas wrote:
>>> Hello Experts,
>>>
>>> I need help on this issue of mine. I am unsure if this scenario is possible.
>>> I have a field in my solr document named inputxml, the value of which is a
>>> xml string as below. This xml structure is within the inputxml field value. 
>>> I needed help on searching this xml structure i.e. if I search  for Venue, 
>>> I should get "Radio City Music Hall" as the result and not the complete tag 
>>> like. Is this supported in solr?? If 
>>> it is, how can this be implemented??
>>>
>>> 
>>> 
>>> http://bit.ly/Rndab"; />
>>> 
>>> 
>>> 
>>>
>>> Any help is appreciated. I donot need the tag name in the result, instead I 
>>> need the tag value.
>>>
>>> Thanks in advance,
>>> Manas Nair
>>>
>
>
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Will Solr fit our needs?

2010-03-17 Thread Lance Norskog
Another option is the ExternalFileField:

http://www.lucidimagination.com/search/document/CDRG_ch04_4.4.4?q=ExternalFileField

This lets you store the current prices for all items in a separate
file. You can only use it in a function query, that is. But it does
allow you to maintain one Solr index, which is very very worthwhile.
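
A rough sketch of what that looks like - the field and key names here are
made up for the auction example:

  <fieldType name="priceFile" class="solr.ExternalFileField"
             keyField="auctionid" defVal="0" valType="pfloat"
             indexed="false" stored="false"/>
  <field name="current_price" type="priceFile"/>

The values live in a plain text file named external_current_price next to
the index, one "key=value" line per document:

  12345=17.50
  12346=2.00

New values are picked up when a new searcher opens (i.e. on commit), and the
field can then be used in function queries for boosting or sorting.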

On Wed, Mar 17, 2010 at 4:19 AM, Geert-Jan Brits  wrote:
> If you dont' plan on filtering/ sorting and/or faceting on fast-changing
> fields it would be better to store them outside of solr/lucene in my
> opinion.
>
> If you must: for indexing-performance reasons you will probably end up with
> maintaining seperate indices (1 for slow-changing/static fields and 1 for
> fast-changing-fields) .
> You frequently commit the fast-changing -index to incorporate the changes
> in current_price. Afterwards you have 2 options I believe:
>
> 1. use parallelreader to query the seperate indices directly. Afaik, this is
> not (completely) integrated in Solr... I wouldn't recommend it.
> 2. after you commit the fast-changing-index, merge with the static-index.
> You're left with 1 fresh index, which you can push to your slave-servers.
> (all this in regular interverals)
>
> Disadvantages:
> - In any way, you must be very careful with maintaining multiple parallel
> indexes with the purpose of treating them as one. For instance document
> inserts must be done exactly in the same order, otherwise the indices go
> 'out-of-sync' and are unusable.
> - higher maintenance
> - there is always a time-window in which the current_price values are stale.
> If that's within reqs that's ok.
>
> The other path, which I recommend, would be to store the current_price
> outside of solr (like you're currently doing) but instead of using a
> relational db, try looking into persistent key-value stores. Many of them
> exist and a lot of progress has been made in the last couple of years. For
> simple key-lookups (what you need as far as I can tell) they really blow
> every relational db out of the water (considering the same hardware of
> course)
>
> We're currently using Tokyo Cabinet with the server-frontend Tokyo Tyrant
> and seeing almost a 5x increased in lookup performance compared to our
> previous kv-store memcachedDB which is based on BerkelyDB. Memcachedb was
> already several times faster than our mysql-setup (although not optimally
> tuned) .
>
> to sum things up: use the best tools for what they were meant to do.
>
> - index/search --> solr/ lucene without a doubt.
>
> - kv-lookup --> consensus is still forming, and a lot of players (with a lot
> of different types of functionality) but if all you need is simple
> key-value-lookup, I would go for Tokyo Cabinet (TC) / Tyrant at the moment.
>  Please note that TC and competitors aren't just some code/ hobby projects
> but are usually born out of a real need at huge websites / social networks
> such as TC which is born from mixi  (big social network in Japan) . So at
> least you're in good company..
>
> for kv-stores I would suggest to begin your research at:
> http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/
> (beginning
> 2009)
> http://randomfoo.net/2009/04/20/some-notes-on-distributed-key-stores (half
> 2009)
> and get a feel of the kv-playing field.
>
> Hope this (pretty long) post helps,
> Geert-Jan
>
>
> 2010/3/17 Krzysztof Grodzicki 
>
>> Hi Mortiz,
>>
>> You can take a look on the project ZOIE -
>> http://code.google.com/p/zoie/. I think it's that what are you looking
>> for.
>>
>> br
>> Krzysztof
>>
>> On Wed, Mar 17, 2010 at 9:49 AM, Moritz Mädler 
>> wrote:
>> > Hi List,
>> >
>> > we are running a marketplace which has about a comparable functionality
>> like ebay (auctions, fixed-price items etc).
>> > The items are placed on the market by users who want to sell their goods.
>> >
>> > Currently we are using Sphinx as an indexing engine, but, as Sphinx
>> returns only document ids we have to make a
>> > database-query to fetch the data to display. This massively decreases
>> performance as we have to do two requests to
>> > display data.
>> >
>> > I heard that Solr is able to return a complete dataset and we hope a
>> switch to Solr can boost perfomance.
>> > A critical question is left and i was not able to find a solution for it
>> in the docs: Is it possible to update attributes directly in the
>> > index?
>> > An example for better illustration:
>> > We have an index which holds all the auctions (containing auctionid,
>> auction title) with its current prices(field: current_price). When a user
>> places a new bid,
>> > is it possible to update the attribute 'current_price' directly in the
>> index so that we can fetch the current_price from Solr and not from the
>> database?
>> >
>> > I hope you understood my problem. It would be kind if someone can point
>> me to the right direction.
>> >
>> > Thanks alot!
>> >
>> > Moritz
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: XPath Processing Applied to Clob

2010-03-17 Thread Lance Norskog
The XPath parser in the DIH is a limited implementation. The unit test
program is the only enumeration (that I can find) of what it handles:

http://svn.apache.org/repos/asf/lucene/solr/trunk/contrib/dataimporthandler/src/test/java/org/apache/solr/handler/dataimport/TestXPathRecordReader.java

//BODY in fact is not allowed, and should throw an Exception. Or at
least some kind of error message. Perhaps there is one in the logs?
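
If the goal is just the BODY element, an absolute path should work where
//BODY does not - something along these lines, reusing the forEach from the
earlier messages:

  <field column="text" xpath="/MESSAGE/BODY" />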


On Wed, Mar 17, 2010 at 2:45 PM, Neil Chaudhuri
 wrote:
> Incidentally, I tried adding this:
>
> 
> 
>         dataField="d.text" forEach="/MESSAGE">
>                  
>        
> 
>
> But this didn't seem to change anything.
>
> Any insight is appreciated.
>
> Thanks.
>
>
>
> From: Neil Chaudhuri
> Sent: Wednesday, March 17, 2010 3:24 PM
> To: solr-user@lucene.apache.org
> Subject: XPath Processing Applied to Clob
>
> I am using the DataImportHandler to index 3 fields in a table: an id, a date, 
> and the text of a document. This is an Oracle database, and the document is 
> an XML document stored as Oracle's xmltype data type. Since this is nothing 
> more than a fancy CLOB, I am using the ClobTransformer to extract the actual 
> XML. However, I don't want to index/store all the XML but instead just the 
> XML within a set of tags. The XPath itself is trivial, but it seems like the 
> XPathEntityProcessor only works for XML file content rather than the output 
> of a Transformer.
>
> Here is what I currently have that fails:
>
>
> 
>
>        
>
>            
>
>            
>
>            
>             forEach="/MESSAGE" url="${doc.text}">
>                
>
>            
>
>        
>
> 
>
>
> Is there an easy way to do this without writing my own custom transformer?
>
> Thanks.
>



-- 
Lance Norskog
goks...@gmail.com


Re: field length normalization

2010-03-16 Thread Lance Norskog
You need to change your similarity object to be more sensitive at the
short end. Here is a patch that shows how to do this:

http://issues.apache.org/jira/browse/LUCENE-2187

It involves Lucene coding.
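
As a rough sketch of the kind of change involved (against the Lucene 2.9
Similarity API that Solr 1.4 uses; the flattening curve is only an
illustration):

import org.apache.lucene.search.DefaultSimilarity;

public class ShortTitleSimilarity extends DefaultSimilarity {
  // Use a steeper curve than 1/sqrt(numTerms) so that 3-term and 4-term
  // titles no longer collapse onto the same encoded norm byte.
  @Override
  public float lengthNorm(String fieldName, int numTerms) {
    return (float) (1.0 / numTerms);
  }
}

It would be registered with a <similarity class="..."/> element near the
bottom of schema.xml, and the index has to be rebuilt for the new norms to
take effect.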

On Fri, Mar 12, 2010 at 3:19 AM, muneeb  wrote:
>
>  Ah I see.
> Thanks very much Jay for your explanation, it really helped a lot.
>
> I guess I have to deal with this in some other way, since I am working with
> short titles and I really want short titles to appear at top. Can you
> suggest anything to bring titles with length 3 to appear before titles with
> length 4 (given they have similar scores)?
>
> Thanks,
>
>
> Jay Hill wrote:
>>
>> The fieldNorm is computed like this: fieldNorm = lengthNorm *
>> documentBoost
>> * documentFieldBoosts
>>
>> and the lengthNorm is: lengthNorm  =  1/(numTermsInField)**.5
>> [note that the value is encoded as a single byte, so there is some
>> precision
>> loss]
>>
>> So the values are not pre-set for the lengthNorm, but for some counts the
>> lengthNorm value winds up being the same because of the precision loss.
>> Here
>> is a list of lengthNorm values for 1 to 10 term fields:
>>
>> # of terms    lengthNorm
>>    1          1.0
>>    2         .625
>>    3         .5
>>    4         .5
>>    5         .4375
>>    6         .375
>>    7         .375
>>    8         .3125
>>    9         .3125
>>   10         .3125
>>
>> That's why, in your example, the lengthNorm for 3 and 4 is the same.
>>
>> -Jay
>> http://www.lucidimagination.com
>>
>>
>>
>>
>>
>> On Thu, Mar 11, 2010 at 9:50 AM, muneeb  wrote:
>>
>>>
>>>
>>> :
>>> : Did you reindex after setting omitNorms to false? I'm not sure whether
>>> or
>>> : not it is needed, but it makes sense.
>>>
>>> Yes i deleted the old index and reindexed it.
>>> Just to add another fact, that the titlles length is less than 10. I am
>>> not
>>> sure if solr has pre-set values for length normalizations, because for
>>> titles with 3 as well as 4 terms the fieldNorm is coming up as 0.5 (in
>>> the
>>> debugQuery section).
>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/field-length-normalization-tp27862618p27867025.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/field-length-normalization-tp27862618p27874123.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: problem during benchmarking solr query

2010-03-16 Thread Lance Norskog
Use a + sign or %20 for the space. The URL standard uses a plus to mean a space.
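
Something along these lines, with the whole URI single-quoted for the shell
and the space, double quotes and caret percent-encoded:

  --uri1 '/solr/select/?q=body:hotel+AND+_val_:%22recip(hsin(0.7113258,-1.291311553,lat_rad,lng_rad,30),1,1,0)%22%5E100'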

On Tue, Mar 16, 2010 at 6:06 PM, KshamaPai  wrote:
>
> Hi,
> Am using autobench to benchmark solr with the query
> http://localhost:8983/solr/select/?q=body:hotel AND
> _val_:"recip(hsin(0.7113258,-1.291311553,lat_rad,lng_rad,30),1,1,0)"^100
>
> But if i specify the same in the autobench command as
> autobench --file bar1.tsv --high_rate 100 --low_rate 20 --rate_step 20
> --host1 localhost --single_host --port1 8983 --num_conn 10 --num_call 10
> --uri1 /solr/select/?q=body:hotel AND
> _val_:"recip(hsin(0.7113258,-1.291311553,lat_rad,lng_rad,30),1,1,0)"^100
>
> it is taking body:hotel as uri but not _val_ part ,which i think is because
> of the space after hotel. Even if i try  escaping  this in autobench using
> '\' it ll give parse error in solr.
>
> Can anyone suggest how I should handle this, so that the entire query is
> considered as the uri and Solr responds with an appropriate reply?
> Thank you.
>
>
> --
> View this message in context: 
> http://old.nabble.com/problem-during-benchmarking-solr-query-tp27926801p27926801.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: APR setup

2010-03-16 Thread Lance Norskog
That would be a Tomcat question :)

On Tue, Mar 16, 2010 at 8:36 PM, blargy  wrote:
>
> [java] INFO: The APR based Apache Tomcat Native library which allows optimal
> performance in production environments was not found on the
> java.library.path:
> .:/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java
>
> What the heck is this and why is it recommended for production settings?
> Anyone?
>
> --
> View this message in context: 
> http://old.nabble.com/APR-setup-tp27927553p27927553.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Trouble Implementing Extracting Request Handler

2010-03-16 Thread Lance Norskog
org/apache/solr/util/plugin/SolrCoreAware in the stack trace refers to
an interface in the main Solr jar.

I think this means that putting all of the libs in
apache-tomcat-6.0.20/lib is a mistake: the classloader finds
ExtractingRequestHandler in
apache-tomcat-6.0.20/lib/apache-solr-cell-1.4.1-dev.jar, but it then
needs the above interface, and the main Solr jar is not available somehow.
Since the solr-cell jar is in multiple places, we don't know exactly
how Tomcat finds it.

I suggest that you go back to a clean, empty Tomcat, and the original
Solr distribution. Copy the solr war file to the right directory in
Tomcat. Get Solr talking to your solr/ directory
(-Dsolr.solr.home=path). Now, check that the <lib> directives in
solrconfig.xml are right.
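
Something like this in solrconfig.xml, with the paths adjusted to wherever
the jars actually live (these locations are only examples, relative to the
instance directory):

  <lib dir="../../contrib/extracting/lib" />   <!-- Tika and friends -->
  <lib dir="../../dist" />                     <!-- apache-solr-cell jar -->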



On Tue, Mar 16, 2010 at 4:19 PM, Steve Reichgut  wrote:
> Lance,
>
> I tried that but no luck. Just in case the relative paths were causing a
> problem, I also tried using absolute paths but neither seemed to help.
> First, I tried adding ** as the full
> directory so it would hopefully include everything. When that didn't work, I
> tried adding paths directly to the two Tika jar files in the Lib directory
> like this:
> * *and
> **
>
> Am I including them incorrectly somehow?
>
> Steve
>
> On 3/16/2010 3:38 PM, Lance Norskog wrote:
>>
>> NoClassDefFoundError usually means that the class was found, but it
>> needs other classes and those were not found. That is, Solr finds the
>> ExtractingRequestHandler jar but cannot find the Tika jars.
>>
>> In example/solr/conf/slrconfig.xml, there are several '> dir="path"/>' elements. These give classpath directories and jar files
>> to include when loading classes (and resource files). Try adding the
>> paths for your Tika jars as  directives.
>>
>> On Mon, Mar 15, 2010 at 9:02 PM, Steve Reichgut
>>  wrote:
>>
>>>
>>> Sure. I've attached two docs that have the stack trace and the full list
>>> of
>>> .jar files.
>>>
>>> On 3/15/2010 8:34 PM, Lance Norskog wrote:
>>>
>>>>
>>>> Please post the complete stack trace. Also, it will help if you make a
>>>> full listing of all .jar files in the example/ directory.
>>>>
>>>> On Mon, Mar 15, 2010 at 7:12 PM, Steve Reichgut
>>>>  wrote:
>>>>
>>>>
>>>>>
>>>>> Thanks Lance. That helped ( we are using Solr-1.4). We've run into a
>>>>> follow-on error though. It is giving the following error:
>>>>> ClassNotFoundException: org.apache.solr.util.plugin.SolrCoreAware
>>>>>
>>>>> Did we miss something else in the setup?
>>>>>
>>>>> Steve
>>>>>
>>>>> Is there something else we haven't copied
>>>>>
>>>>> On 3/15/2010 6:12 PM, Lance Norskog wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> This assumes you use the Solr-1.4 release or the Solr-1.5-dev trunk.
>>>>>>
>>>>>> The ExtractingRequestHandler libraries are in contrib/extracting/lib
>>>>>>
>>>>>> You need to make a directory example/solr/lib and copy into it the
>>>>>> apache-solr-cell jar from dist/ and all of the libraries from
>>>>>> contrib/extracting/lib. The Wiki page has not been updated for the
>>>>>> Solr 1.4 release. I just added a TODO to this effect.
>>>>>>
>>>>>> On 3/12/10, Steve Reichgut      wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Hi Grant,
>>>>>>> Thanks for the feedback. In reading the Wiki, it recommended that you
>>>>>>> copy everything from example/solr/libs directory into a /libs
>>>>>>> directory
>>>>>>> in your instance. I went into my example/solr directory and only see
>>>>>>> two
>>>>>>> directories - "bin" and "conf". There is no "libs" directory. Where
>>>>>>> else
>>>>>>> can I get the contents of what should be in "libs"?
>>>>>>>
>>>>>>> Steve
>>>>>>>
>>>>>>> On 3/12/2010 2:15 PM, Grant Ingersoll wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> On Mar 12, 2010, at 2:20 PM, Steve Reichgut wrote:
&

Re: Moving From Oracle Text Search To Solr

2010-03-16 Thread Lance Norskog
The DataImportHandler has tools for this. It will fetch rows from
Oracle and allow you to unpack columns as XML with  Xpaths.

http://wiki.apache.org/solr/DataImportHandler
http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS
http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor

On Tue, Mar 16, 2010 at 2:25 PM, Neil Chaudhuri
 wrote:
> That is a great article, David.
>
> For the moment, I am trying an all-Solr approach, but I have run into a small 
> problem. The documents are stored as XML CLOB's using Oracle's OPAQUE object. 
> Is there any facility to unpack this into the actual text? Or must I execute 
> that in the SQL query?
>
> Thanks.
>
>
> -Original Message-
> From: Smiley, David W. [mailto:dsmi...@mitre.org]
> Sent: Tuesday, March 16, 2010 4:45 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Moving From Oracle Text Search To Solr
>
> If you do stay with Oracle, please report back to the list how that went.  In 
> order to get decent filtering and faceting performance, I believe you will 
> need to use "bitmapped indexes" which Oracle and some other databases support.
>
> You may want to check out my article on this subject: 
> http://www.packtpub.com/article/text-search-your-database-or-solr
>
> ~ David Smiley
> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
>
>
> On Mar 16, 2010, at 4:13 PM, Neil Chaudhuri wrote:
>
>> Certainly I could use some basic SQL count(*) queries to achieve faceted 
>> results, but I am not sure of the flexibility, extensibility, or scalability 
>> of that approach. And from what I have read, Oracle Text doesn't do faceting 
>> out of the box.
>>
>> Each document is a few MB, and there will be millions of them. I suppose it 
>> depends on how I index them. I am pretty sure my current approach of using 
>> Hibernate to load all rows, constructing Solr POJO's from them, and then 
>> passing the POJO's to the embedded server would lead to a OOM error. I 
>> should probably look into the other options.
>>
>> Thanks.
>>
>>
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Tuesday, March 16, 2010 3:58 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Moving From Oracle Text Search To Solr
>>
>> Why do you think you'd hit OOM errors? How big is "very large"? I've
>> indexed, as a single document, a 26 volume encyclopedia of civil war
>> records..
>>
>> Although as much as I like the technology, if I could get away without using
>> two technologies, I would. Are you completely sure you can't get what you
>> want with clever Oracle querying?
>>
>> Best
>> Erick
>>
>> On Tue, Mar 16, 2010 at 3:20 PM, Neil Chaudhuri <
>> nchaudh...@potomacfusion.com> wrote:
>>
>>> I am working on an application that currently hits a database containing
>>> millions of very large documents. I use Oracle Text Search at the moment,
>>> and things work fine. However, there is a request for faceting capability,
>>> and Solr seems like a technology I should look at. Suffice to say I am new
>>> to Solr, but at the moment I see two approaches-each with drawbacks:
>>>
>>>
>>> 1)      Have Solr index document metadata (id, subject, date). Then Use
>>> Oracle Text to do a content search based on criteria. Finally, query the
>>> Solr index for all documents whose id's match the set of id's returned by
>>> Oracle Text. That strikes me as an unmanageable Boolean query.  (e.g.
>>> id:4 OR id:33432323 OR ...).
>>>
>>> 2)      Remove Oracle Text from the equation and use Solr to query document
>>> content based on search criteria. The indexing process though will almost
>>> certainly encounter an OutOfMemoryError given the number and size of
>>> documents.
>>>
>>>
>>>
>>> I am using the embedded server and Solr Java APIs to do the indexing and
>>> querying.
>>>
>>>
>>>
>>> I would welcome your thoughts on the best way to approach this situation.
>>> Please let me know if I should provide additional information.
>>>
>>>
>>>
>>> Thanks.
>>>
>
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: DIH request parameters

2010-03-16 Thread Lance Norskog
They are a namespace like other namespaces and are usable in
attributes, just like in the DB query string examples.

As to defaults, you can declare those in the <requestHandler>
declaration for the DataImportHandler in solrconfig.xml. There are
examples of this (search for "defaults") on the wiki page.

On Tue, Mar 16, 2010 at 7:05 AM, Lukas Kahwe Smith  wrote:
> Hi,
>
> According to the wiki its possible to pass parameters to the DIH:
> http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
>
> I assume they are just being replaced via simple string replacements, which 
> is exactly what I need. Can they also be in all places, even attributes (for 
> example to pass in the password)?
>
> Furthermore is there some way to define default values for these request 
> parameters in case no value is passed in?
>
> regards,
> Lukas Kahwe Smith
> m...@pooteeweet.org
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Trouble Implementing Extracting Request Handler

2010-03-16 Thread Lance Norskog
NoClassDefFoundError usually means that the class was found, but it
needs other classes and those were not found. That is, Solr finds the
ExtractingRequestHandler jar but cannot find the Tika jars.

In example/solr/conf/solrconfig.xml, there are several '<lib dir="path"/>'
elements. These give classpath directories and jar files to include when
loading classes (and resource files). Try adding the paths for your Tika
jars as <lib> directives.

On Mon, Mar 15, 2010 at 9:02 PM, Steve Reichgut  wrote:
> Sure. I've attached two docs that have the stack trace and the full list of
> .jar files.
>
> On 3/15/2010 8:34 PM, Lance Norskog wrote:
>>
>> Please post the complete stack trace. Also, it will help if you make a
>> full listing of all .jar files in the example/ directory.
>>
>> On Mon, Mar 15, 2010 at 7:12 PM, Steve Reichgut
>>  wrote:
>>
>>>
>>> Thanks Lance. That helped ( we are using Solr-1.4). We've run into a
>>> follow-on error though. It is giving the following error:
>>> ClassNotFoundException: org.apache.solr.util.plugin.SolrCoreAware
>>>
>>> Did we miss something else in the setup?
>>>
>>> Steve
>>>
>>> Is there something else we haven't copied
>>>
>>> On 3/15/2010 6:12 PM, Lance Norskog wrote:
>>>
>>>>
>>>> This assumes you use the Solr-1.4 release or the Solr-1.5-dev trunk.
>>>>
>>>> The ExtractingRequestHandler libraries are in contrib/extracting/lib
>>>>
>>>> You need to make a directory example/solr/lib and copy into it the
>>>> apache-solr-cell jar from dist/ and all of the libraries from
>>>> contrib/extracting/lib. The Wiki page has not been updated for the
>>>> Solr 1.4 release. I just added a TODO to this effect.
>>>>
>>>> On 3/12/10, Steve Reichgut    wrote:
>>>>
>>>>
>>>>>
>>>>> Hi Grant,
>>>>> Thanks for the feedback. In reading the Wiki, it recommended that you
>>>>> copy everything from example/solr/libs directory into a /libs directory
>>>>> in your instance. I went into my example/solr directory and only see
>>>>> two
>>>>> directories - "bin" and "conf". There is no "libs" directory. Where
>>>>> else
>>>>> can I get the contents of what should be in "libs"?
>>>>>
>>>>> Steve
>>>>>
>>>>> On 3/12/2010 2:15 PM, Grant Ingersoll wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> On Mar 12, 2010, at 2:20 PM, Steve Reichgut wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Now that I have configured my Solr instance for standard indexing, I
>>>>>>> wanted to start indexing PDF's, MS Doc's, etc. When I tried to test
>>>>>>> it
>>>>>>> with a simple PDF file, I got the following error:
>>>>>>>
>>>>>>>    org.apache.solr.common.SolrException: lazy loading error
>>>>>>>    Caused by: org.apache.solr.common.SolrException: Error loading
>>>>>>> class
>>>>>>>    'org.apache.solr.handler.extraction.ExtractingRequestHandler'
>>>>>>>
>>>>>>> Based on the error, it appeared that the problem is caused by certain
>>>>>>> components not being installed or installed correctly. Since I am not
>>>>>>> a
>>>>>>> Java guy, I had my Java person try to install the
>>>>>>> ExtractingRequestHandler to no avail. He had said that he was having
>>>>>>> real
>>>>>>> trouble finding good documentation on how to install and enable this
>>>>>>> handler.
>>>>>>>
>>>>>>> Could anyone point me to good documentation on how to
>>>>>>> install/troubleshoot this?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> http://wiki.apache.org/solr/ExtractingRequestHandler
>>>>>>
>>>>>> Essentially, you need to make sure the ERH stuff is in Solr/lib before
>>>>>> starting.
>>>>>>
>>>>>> -Grant
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Trouble Implementing Extracting Request Handler

2010-03-15 Thread Lance Norskog
Please post the complete stack trace. Also, it will help if you make a
full listing of all .jar files in the example/ directory.

On Mon, Mar 15, 2010 at 7:12 PM, Steve Reichgut  wrote:
> Thanks Lance. That helped ( we are using Solr-1.4). We've run into a
> follow-on error though. It is giving the following error:
> ClassNotFoundException: org.apache.solr.util.plugin.SolrCoreAware
>
> Did we miss something else in the setup?
>
> Steve
>
> Is there something else we haven't copied
>
> On 3/15/2010 6:12 PM, Lance Norskog wrote:
>>
>> This assumes you use the Solr-1.4 release or the Solr-1.5-dev trunk.
>>
>> The ExtractingRequestHandler libraries are in contrib/extracting/lib
>>
>> You need to make a directory example/solr/lib and copy into it the
>> apache-solr-cell jar from dist/ and all of the libraries from
>> contrib/extracting/lib. The Wiki page has not been updated for the
>> Solr 1.4 release. I just added a TODO to this effect.
>>
>> On 3/12/10, Steve Reichgut  wrote:
>>
>>>
>>> Hi Grant,
>>> Thanks for the feedback. In reading the Wiki, it recommended that you
>>> copy everything from example/solr/libs directory into a /libs directory
>>> in your instance. I went into my example/solr directory and only see two
>>> directories - "bin" and "conf". There is no "libs" directory. Where else
>>> can I get the contents of what should be in "libs"?
>>>
>>> Steve
>>>
>>> On 3/12/2010 2:15 PM, Grant Ingersoll wrote:
>>>
>>>>
>>>> On Mar 12, 2010, at 2:20 PM, Steve Reichgut wrote:
>>>>
>>>>
>>>>
>>>>>
>>>>> Now that I have configured my Solr instance for standard indexing, I
>>>>> wanted to start indexing PDF's, MS Doc's, etc. When I tried to test it
>>>>> with a simple PDF file, I got the following error:
>>>>>
>>>>>    org.apache.solr.common.SolrException: lazy loading error
>>>>>    Caused by: org.apache.solr.common.SolrException: Error loading class
>>>>>    'org.apache.solr.handler.extraction.ExtractingRequestHandler'
>>>>>
>>>>> Based on the error, it appeared that the problem is caused by certain
>>>>> components not being installed or installed correctly. Since I am not a
>>>>> Java guy, I had my Java person try to install the
>>>>> ExtractingRequestHandler to no avail. He had said that he was having
>>>>> real
>>>>> trouble finding good documentation on how to install and enable this
>>>>> handler.
>>>>>
>>>>> Could anyone point me to good documentation on how to
>>>>> install/troubleshoot this?
>>>>>
>>>>>
>>>>
>>>> http://wiki.apache.org/solr/ExtractingRequestHandler
>>>>
>>>> Essentially, you need to make sure the ERH stuff is in Solr/lib before
>>>> starting.
>>>>
>>>> -Grant
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Trouble Implementing Extracting Request Handler

2010-03-15 Thread Lance Norskog
This assumes you use the Solr-1.4 release or the Solr-1.5-dev trunk.

The ExtractingRequestHandler libraries are in contrib/extracting/lib

You need to make a directory example/solr/lib and copy into it the
apache-solr-cell jar from dist/ and all of the libraries from
contrib/extracting/lib. The Wiki page has not been updated for the
Solr 1.4 release. I just added a TODO to this effect.

On 3/12/10, Steve Reichgut  wrote:
> Hi Grant,
> Thanks for the feedback. In reading the Wiki, it recommended that you
> copy everything from example/solr/libs directory into a /libs directory
> in your instance. I went into my example/solr directory and only see two
> directories - "bin" and "conf". There is no "libs" directory. Where else
> can I get the contents of what should be in "libs"?
>
> Steve
>
> On 3/12/2010 2:15 PM, Grant Ingersoll wrote:
>> On Mar 12, 2010, at 2:20 PM, Steve Reichgut wrote:
>>
>>
>>> Now that I have configured my Solr instance for standard indexing, I
>>> wanted to start indexing PDF's, MS Doc's, etc. When I tried to test it
>>> with a simple PDF file, I got the following error:
>>>
>>>org.apache.solr.common.SolrException: lazy loading error
>>>Caused by: org.apache.solr.common.SolrException: Error loading class
>>>'org.apache.solr.handler.extraction.ExtractingRequestHandler'
>>>
>>> Based on the error, it appeared that the problem is caused by certain
>>> components not being installed or installed correctly. Since I am not a
>>> Java guy, I had my Java person try to install the
>>> ExtractingRequestHandler to no avail. He had said that he was having real
>>> trouble finding good documentation on how to install and enable this
>>> handler.
>>>
>>> Could anyone point me to good documentation on how to
>>> install/troubleshoot this?
>>>
>> http://wiki.apache.org/solr/ExtractingRequestHandler
>>
>> Essentially, you need to make sure the ERH stuff is in Solr/lib before
>> starting.
>>
>> -Grant
>>
>>
>>
>
>


-- 
Lance Norskog
goks...@gmail.com


Re: Multi valued fields

2010-03-14 Thread Lance Norskog
This could be done with a function query, except that the function I
would use does not exist.  There is no function that returns the
number of values that exist for a field. If there were, you could say:

-field:A OR (field:A and function() > 1)

I don't know the Lucene data structures well, but I suspect this would
be incredibly expensive to calculate.

On 3/11/10, Jean-Sebastien Vachon  wrote:
> Hi All,
>
> I'd like to know if it is possible to do the following on a multi-value
> field:
>
> Given the following data:
>
> document A:  field1   = [ A B C D]
> document B:  field 1  = [A B]
> document C:  field 1  = [A]
>
> Can I build a query such as :
>
>   -field: A
>
> which will return all documents that do not have "exclusive" A in their
> field's values. By exclusive I mean that I don't want documents that only
> have A in their list of values. In my sample case, the query would return
> doc A and B.
> Because they both have other values in field1.
>
> Is this kind of query possible with Solr/Lucene?
>
> Thanks
>
>
>
>


-- 
Lance Norskog
goks...@gmail.com


Re: Warning : no lockType configured for...

2010-03-14 Thread Lance Norskog
Doing an exhaustive scan of this problem, I did find this one hole:

This constructor is not deprecated, but it uses a super() call that is
deprecated. Also, this constructor is not used anywhere. I nominate it
for deprecation as well.

SolrIndexWriter.java, around line 170
  /**
   *
   */
  public SolrIndexWriter(String name, String path, DirectoryFactory dirFactory,
                         boolean create, IndexSchema schema) throws IOException {
    super(getDirectory(path, dirFactory, null), false, schema.getAnalyzer(), create);
    init(name, schema, null);
  }


On 3/9/10, Chris Hostetter  wrote:
>
> : Ok I think I know where the problem is
>   ...
> : It's  the constructor used by SolrCore  in r772051
>
> Ughhh... so to be clear: you haven't been using Solr 1.4 at any point in
> this thread?
>
> that explains why no one else could recreate the problem you were
> describing.
>
> For future refrence: if you aren't using the most recently
> released version of Solr when you post a question about a possible bug,
> please make that very clear right up at the top of your message, and if
> you think you've found a bug, pelase make sure to test against the most
> recently released version to see if it's already been fixed.
>
> : PS : should I fill some kind of bug report even if everything is ok now ?
> (I'm
> : asking because I didn't see anything related to this problem in JIRA, so
> maybe
> : if you want to keep a trace...)
>
> If you can recreate the problem using Solr 1.3, then feel free to file a
> bug, noting that it was only a problem in 1.3, but has already been fixed
> in 1.4 ... but we don't usually bother tracking bugs against arbitrary
> unlreased points from the trunk (unless they are current).  I'm sure there
> are lots of bugs that existed only transiently as features were being
> fleshed out.
>
>
> -Hoss
>
>


-- 
Lance Norskog
goks...@gmail.com


Re: some hyphenated words not found

2010-03-14 Thread Lance Norskog
Look at the terms in the index with the analysis.jsp file, or with Luke.

The difference here is that love-lorn is a separate phrase, but
life-long has a comma after it. Try inserting a space before the
comma.

On 3/14/10, george young  wrote:
> I have a nearly generic out-of-box installation of solr.  When I
> search on a short text document containing a few hyphenated words, I
> get hits on *some* of the words, but not all.  I'm quite puzzled as to
> why.  I've checked that the text is only plain ascii.  How can I find
> out what's wrong?  In the file below, solr finds life-long, but not
> love-lorn.
>
> Here's the file:
> This is a small sample document just to insure that a type *.doc can
> be accessed by X Documentation.
> It is sung to the moon by a love-lorn loon,
> who fled from the mocking throng O!
> It’s the song of a merryman, moping mum,
> whose soul was sad and whose glance was glum. Misery me — lack-a-day-dee!
> He sipped no sup, and he craved no crumb,
> As he sighed for the love of a ladye!
> Who sipped no sup, and who craved no crumb,
> As he sighed for the love of a ladye.
> Heighdy! heighdy! Misery me — lack-a-day-dee!
> He sipped no sup, and he craved no crumb,
> As he sighed for the love of a ladye!
>
> I have a song to sing, O!
> Sing me your song, O!
>
> It is sung with the ring
> Of the songs maids sing
> Who love with a love life-long, O!
> It's the song of a merrymaid, peerly proud,
> Who loved a lord, and who laughed aloud
> At the moan of the merryman, moping mum,
> Whose soul was sad, and whose glance was glum,
> Who sipped no sup, and who craved no crumb,
> As he sighed for the love of a ladye!
> Heighdy! heighdy!
> Misery me — lack-a-day-dee!
> He sipped no sup, and he craved no crumb,
> As he sighed for the love of a ladye!
>
>
> --
> georgeryo...@gmail.com
>


-- 
Lance Norskog
goks...@gmail.com


Re: Index an entire Phrase and not it's constituent parts?

2010-03-13 Thread Lance Norskog
CommonGrams is a tool for this. It makes "is a" into a token, but then
"is" and "a" are still removed as stopwords.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory
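
A sketch of how it is wired into an analyzer; the words file name is only an
example, and the same file can feed both filters:

  <fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

So "is a" survives as the single token is_a while the bare stopwords are
still dropped.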

On 3/13/10, Christopher Ball  wrote:
> Thank you for the idea Mitch, but it just doesn't seem right that I should
> have to revert to Scoring when what I really need seems so fundamental.
>
> Logically, what I want is a "phrase filter factory" that would match on
> phrases listed in a file, like stopwords, but in this case index the match
> and then discard the words of the phrase from the stream before passing it
> on to the next filter given the phrases are imbedded in paragraphs which
> have other valid index material.
>
> So an analyzer would look something like:
>
>   
> 
> 
> 
> 
> 
>   
>
> Of course, one riddle that this leaves us how to match a tokenized stream. .
> . so maybe I need to also write my own tokenizer. Just seems like this would
> have been a previously desired and solved problem.
>
> Or may be I should try solr.KeepWordFilterFactory if it can deal with
> phrases . . ?
>
> I'm stumped =(
>
> -Original Message-
> From: MitchK [mailto:mitc...@web.de]
> Sent: Saturday, March 13, 2010 8:12 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Index an entire Phrase and not it's constituent parts?
>
>
> Christopher,
>
> maybe the SynonymFilter can help you to solve your problem.
>
> Let me try to explain:
> If you create an extra field in the index for your use-case, you can boost
> matches of them in a special way.
>
> The next step is creating an extra synonym-file.
> as much as => SpecialPhrase1
> in amount of => SpecialPhrase2
> ... and so on...
>
> If an user wants to query for something like "as much as I love you" you can
> do some boosting on matches from the SpecialPhrase-field and you are able to
> response results from both: the normal StopWordFiltered data and the
> SpecialPhrase-data.
>
> If this fits your needs, please let me know.
>
> Kind regards
> - Mitch
> --
> View this message in context:
> http://old.nabble.com/Index-an-entire-Phrase-and-not-it%27s-constituent-part
> s--tp27785521p27887564.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>
>


-- 
Lance Norskog
goks...@gmail.com


Re: Cant commit on 125 GB index

2010-03-13 Thread Lance Norskog
Commit actions are in the jetty log. I don't have a script to pull
them out in a spread-sheet-able form, but that would be useful.

On 3/13/10, Frederico Azeiteiro  wrote:
> Yes, the http request is timing out even when using values of 10m.
>
> Normally the commit takes about 10s. I did an optimize (it took 6h) and it
> looks good for now...
>
> 59m? well i didn't wait that long, i restarted the solr instance and tried
> again.
>
> I'll try to use autocommit on a near future.
>
> Using autocommit how can i check how many commits are happening at the
> moment, when they started to? Is there a way to control and know what is
> happening behind the scenes in "real time"?
>
> I'm using solr 1.4 with jetty.
>
> 
>
> De: Lance Norskog [mailto:goks...@gmail.com]
> Enviada: sáb 13-03-2010 23:31
> Para: solr-user@lucene.apache.org
> Assunto: Re: Cant commit on 125 GB index
>
>
>
> What is timing out? The external HTTP request? Commit times are a
> sawtooth and slowly increase. My record is 59 minutes, but I was doing
> benchmarking.
>
> On Thu, Mar 11, 2010 at 1:46 AM, Frederico Azeiteiro
>  wrote:
>> Hi,
>>
>> I'm having timeouts commiting on a 125 GB index with about 2200
>> docs.
>>
>>
>>
>> I'm inserting new docs every 5m and commiting after that.
>>
>>
>>
>> I would like to try the autocommit option and see if I can get better
>> results. I need the docs indexed available for searching in about 10
>> minutes after the insert.
>>
>>
>>
>> I was thinking of using something like
>>
>>
>>
>> 
>>
>>  5000
>>
>>  86000
>>
>>
>>
>>
>>
>> I update about 4000 docs every 15m.
>>
>>
>>
>> Can you share your thoughts on this config?
>>
>> Do you think this will solve my commits timeout problem?
>>
>>
>>
>> Thanks,
>>
>> Frederico
>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
>
>


-- 
Lance Norskog
goks...@gmail.com


Re: HTMLStripTransformer not working with data importer

2010-03-13 Thread Lance Norskog
DIH has special handling for upper & lower case field names. It is
possible your config is running afoul of this.

Try using different names for the Solr fields than the database fields.
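
For example, something along these lines keeps the Solr field names distinct
from the database column names while still stripping the markup (the names
here are only illustrative, based on the quoted config below):

  <entity name="post" transformer="HTMLStripTransformer"
          query="SELECT id, post_content, post_title FROM elinstmkting_posts">
    <field column="post_content" name="content" stripHTML="true"/>
    <field column="post_title"   name="title"   stripHTML="false"/>
  </entity>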


On 3/11/10, James Ostheimer  wrote:
> Hi-
>
> I can't seem to make any of the transfomers work, I am using the
> DataImporter to pull in data from a wordpress instance (see below).  Neither
> REGEX or HTMLStrip seems to do anything to my content.
>
> Do I have to include a separate jar with the transformers?  Are the
> transformers in 1.4 (particularly the HTMLStrip)?
>
> James
>
> On Wed, Mar 10, 2010 at 10:47 PM, James Ostheimer > wrote:
>
>> HI-
>>
>> I am working a contract to index some wordpress data.  For the posts I of
>> course have html in the content of the column, I'd like to strip it out.
>>  Here is my data importer config
>>
>> 
>> > url="jdbc:mysql://localhost:3306/econetsm" user="***"
>> password="***"
>> />
>> 
>> > query="SELECT id, post_content, post_title FROM elinstmkting_posts e"
>> onError="abort"
>> deltaQuery="SELECT * FROM elinstmkting_posts e where
>> post_modified_gmt > '${dataimporter.last_index_time}'">
>>> stripHTML="false"/>
>> > stripHTML="true"  />
>> 
>> 
>> 
>>
>> Looks perfect according to the wiki docs, but the html is found when I
>> search for "strong" ( tag) and html is returned in the field.
>>
>> I assume I am doing something stupid wrong, I am using the latest stable
>> solr (1.4.0).
>>
>> Does it matter that the post data is not a complete html document (it
>> doesn't have an <html> start tag or a <body> tag)?
>>
>> James
>>
>


-- 
Lance Norskog
goks...@gmail.com


Re: Cant commit on 125 GB index

2010-03-13 Thread Lance Norskog
What is timing out? The external HTTP request? Commit times are a
sawtooth and slowly increase. My record is 59 minutes, but I was doing
benchmarking.

On Thu, Mar 11, 2010 at 1:46 AM, Frederico Azeiteiro
 wrote:
> Hi,
>
> I'm having timeouts committing on a 125 GB index with about 2200
> docs.
>
>
>
> I'm inserting new docs every 5m and committing after that.
>
>
>
> I would like to try the autocommit option and see if I can get better
> results. I need the docs indexed available for searching in about 10
> minutes after the insert.
>
>
>
> I was thinking of using something like
>
>
>
> 
>
>      5000
>
>      86000
>
>    
>
>
>
> I update about 4000 docs every 15m.
>
>
>
> Can you share your thoughts on this config?
>
> Do you think this will solve my commits timeout problem?
>
>
>
> Thanks,
>
> Frederico
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Updating FAQ for International Characters?

2010-03-13 Thread Lance Norskog
You might also try using CDATA blocks to wrap your Unicode text. It is
usually much easier to view the text while debugging these problems.

On Thu, Mar 11, 2010 at 12:13 AM, Eric Pugh
 wrote:
> So I am using Sunspot to post over, which means an extra layer of
> indirection between me and my XML!  I will look tomorrow.
>
>
> On Mar 10, 2010, at 7:21 PM, Chris Hostetter wrote:
>
>>
>> : Any time a character like that was indexed, Solr threw an unknown entity
>> error.
>> : But if converted to À or À then everything works great.
>> :
>> : I tried out using Tomcat versus Jetty and got the same results.  Before
>> I edit
>>
>> Uh, you mean like the characters in exampledocs/utf8-example.xml ?
>>
>> it contains literal utf8 characters, and it works fine.
>>
>> Based on your "À" comment I assume you are posting XML ... are you
>> sure you are using the utf8 charset?
>>
>> -Hoss
>>
>
> -
> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com
> Co-Author: Solr 1.4 Enterprise Search Server available from
> http://www.packtpub.com/solr-1-4-enterprise-search-server
> Free/Busy: http://tinyurl.com/eric-cal
>
>
>
>
>
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: SOLR Search Query : Exception : Software caused connection abort: recv failed

2010-03-13 Thread Lance Norskog
It is usually a limitation in the servlet container. You could try
using embedded Solr or using an HTTP POST instead of an HTTP GET.
However, in this case it is probably not possible.

If these long filter queries never change, you could embed these in
the solrconfig.xml declaration for a request handler. That way, they
don't get parsed by the HTTP parser.
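
A sketch of both options (not tested against your setup; the handler name and
field values below are only placeholders):

  // SolrJ: send the same query as an HTTP POST, so the long filter query
  // travels in the request body instead of the URL
  // (needs: import org.apache.solr.client.solrj.SolrRequest;)
  QueryResponse res = solrServer.query(q, SolrRequest.METHOD.POST);

  <!-- solrconfig.xml: a handler that always appends the big filter query -->
  <requestHandler name="/bigfilter" class="solr.SearchHandler">
    <lst name="appends">
      <str name="fq">fieldname:(934051 OR 934052 OR 934053)</str>
    </lst>
  </requestHandler>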

2010/3/10 Kranti™ K K Parisa :
> Hi,
>
> I am trying to test the SOLR search with very big query.. sample code
> snippet is as follows. when I try that its throwing exceptions.
>
> Does a SOLR query have any limitations on size or length, etc.?
>
> =
>    solrServer = SolrUtils.getSolrServerTest("http://localhost:8080/solr-tag
> ");
>        StringBuffer strFq = new StringBuffer("&fq=(");
>        SolrQuery q = new SolrQuery("*:*");
>        int intStart = 934051;
>        for(int m=intStart;m<(intStart+500);m++){
>            strFq.append("fieldname:" + m + " OR ");
>        }
>        strFq.replace(strFq.lastIndexOf(" OR "), strFq.length() - 1, "");
>        strFq.append(")");
>        q.addFilterQuery(strFq.toString());
>        QueryResponse res = solrServer.query(q);
>        System.out.println("response=="+res.getResponse());
> ==
>
> but getting the following exception
> Exception in thread "main" org.apache.solr.client.solrj.SolrServerException:
> java.net.SocketException: Software caused connection abort: recv failed
>        at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
>        at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
>        at
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
>        at
> org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
>        at com.getset.inre.test.IndexTest.testTagIndex(IndexTest.java:311)
>        at com.getset.inre.test.IndexTest.main(IndexTest.java:330)
> Caused by: java.net.SocketException: Software caused connection abort: recv
> failed
>        at java.net.SocketInputStream.socketRead0(Native Method)
>        at java.net.SocketInputStream.read(SocketInputStream.java:129)
>        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>        at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
>        at
> org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
>        at
> org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
>        at
> org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
>        at
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
>        at
> org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
>        at
> org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
>        at
> org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
>        at
> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
>        at
> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
>        at
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
>        at
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
>        at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:416)
>        ... 5 more
> =
>
>
> Best Regards,
> Kranti K K Parisa
>



-- 
Lance Norskog
goks...@gmail.com


Re: Boundary match as part of query language?

2010-03-13 Thread Lance Norskog
One way is to add magic 'beginning' and 'end' terms, then do phrase
searches with those terms.
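
A sketch of the idea, using the titles below (the sentinel tokens are made up;
pick strings that can never occur in real data):

  indexed value:  xxbeginxx New York Yankees xxendxx

  starts-with:    title:"xxbeginxx New York"
  ends-with:      title:"York Yankees xxendxx"
  equals:         title:"xxbeginxx New York Yankees xxendxx"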

On Wed, Mar 10, 2010 at 7:51 AM, Jan Høydahl / Cominvent
 wrote:
> Hi,
>
> Sometimes you need to anchor your search to start/end of field.
>
> Example:
> 1. title=New York Yankees
> 2. title=New York
> 3. title=York
>
> If I search title:"New York", or title:"York" I would get a match, but I'd 
> like to anchor my search to beginning and/or end of the field, e.g. with 
> regex syntax, title:"^New York$"
>
> Now, I know how to work-around this, by appending some unique character 
> sequence at each end of the field and then include this in my search in the 
> front end. However, I wonder if any of you have been planning a patch to add 
> a native boundary match feature to Solr that would automagically add tokens 
> (also for multi-value fields!), and expand the query language to allow 
> querying for starts-with(), ends-with() and equals()
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: HTML encode extracted docs - Problems with solr.HTMLStripCharFilter

2010-03-13 Thread Lance Norskog
HTMLStripCharFilter is only in the analyzer: it creates searchable
terms from the HTML input. The raw HTML is stored and fetched.

There are some bugs in term positions and highlighting. An
EntityProcessor wrapping the HTMLStripCharFilter would be really
useful.

On Tue, Mar 9, 2010 at 5:31 AM, Mark Roberts  wrote:
> Sounds like "solr.HTMLStripCharFilter" may work... except, I'm getting a 
> couple of problems:
>
> 1) HTML still seems to be getting into my content field
>
> All I did was add <charFilter class="solr.HTMLStripCharFilterFactory"/> to
> the index analyzer for my "text" fieldType.
>
>
> 2) Somehow, it seems to have broken my highlighting; I get this error:
>
> 'org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token wrong 
> exceeds length of provided text sized 3862'
>
>
>
> Any ideas how I can fix this?
>
>
>
>
>
> -Original Message-
> From: Lance Norskog [mailto:goks...@gmail.com]
> Sent: 09 March 2010 04:36
> To: solr-user@lucene.apache.org
> Subject: Re: HTML encode extracted docs
>
> A Tika integration with the DataImportHandler is in the Solr trunk.
> With this, you can copy the raw HTML into different fields and process
> one copy with Tika.
>
> If it's just straight HTML, would the HTMLStripCharFilter be good enough?
>
> http://www.lucidimagination.com/search/document/CDRG_ch05_5.7.2
>
> On Mon, Mar 8, 2010 at 5:50 AM, Mark Roberts  
> wrote:
>> I'm uploading .htm files to be extracted - some of these files are "include" 
>> files that have snippets of HTML rather than fully formed html documents.
>>
>> solr-cell stores the raw HTML for these items, rather than extracting the 
>> text. Is there any way I can get solr to encode this content prior to 
>> storing it?
>>
>> At the moment, I have the problem that when the highlighted snippets are  
>> retrieved via search, I need to parse the snippet and HTML encode the bits 
>> of HTML that were indexed, whilst *not* encoding the bits that were added
>> by the highlighter, which is messy and time consuming.
>>
>> Thanks! Mark,
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: PDF extraction leads to reversed words

2010-03-08 Thread Lance Norskog
Is this a mistake in the Tika library collection in the Solr trunk?

On Mon, Mar 8, 2010 at 5:15 PM, Robert Muir  wrote:
> I think the problem is that Solr does not include the ICU4J jar, so it
> won't work with Arabic PDF files.
>
> Try putting ICU4J 3.8 (http://site.icu-project.org/download) in your 
> classpath.
>
> On Mon, Mar 8, 2010 at 6:30 PM, Abdelhamid  ABID  wrote:
>> Hi,
>> Posting Arabic PDF files to Solr using a web form (to solr/update/extract),
>> the extracted text comes back with each word displayed in reverse direction
>> (instead of right to left).
>> When I search against these texts with the -always- reversed keywords, I
>> get results, but reversed.
>> This problem doesn't occur when posting MsWord document.
>> I think the problem come from Tika !
>>
>> Any clue ?
>>
>> --
>> elsadek
>> Software Engineer- J2EE / WEB / ESB MULE
>>
>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: More contextual information in analyser

2010-03-08 Thread Lance Norskog
Yes, payloads should do this.

On Mon, Mar 8, 2010 at 8:29 PM, Jon Baer  wrote:
> Isn't this what Lucene/Solr payloads are theoretically for?
>
> ie: 
> http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
>
> - Jon
>
> On Mar 8, 2010, at 11:15 PM, Lance Norskog wrote:
>
>> This is an interesting idea. There are other projects to make the
>> analyzer/filter chain more "porous", or open to outside interaction.
>>
>> A big problem is that queries are analyzed, too. If you want to give
>> the same metadata to the analyzer when doing a query against the
>> field, things get tough. You would need a special query parser to
>> implement your own syntax to do that. However, the analyzer chain in
>> the query phase does not receive the parsed query, so you have to in
>> some way change this.
>>
>> On Mon, Mar 8, 2010 at 2:14 AM, dbejean  wrote:
>>>
>>> Hello,
>>>
> If I write a custom analyser that accepts a specific attribute in the
>>> constructor
>>>
>>> public MyCustomAnalyzer(String myAttribute);
>>>
>>> Is there a way to dynamically send a value for this attribute from Solr at
>>> index time in the XML Message ?
>>>
>>> 
>>>  
>>>    .
>>>
>>>
>>> Obviously, in Solr schema.xml, the "content" field is associated with my custom
>>> Analyser.
>>>
>>> Thank you.
>>>
>>> Dominique
>>>
>>> --
>>> View this message in context: 
>>> http://old.nabble.com/More-contextual-information-in-analyser-tp27819298p27819298.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: [ANN] Zoie Solr Plugin - Zoie Solr Plugin enables real-time update functionality for Apache Solr 1.4+

2010-03-08 Thread Lance Norskog
Solr unique ids can be any type. The QueryElevationComponent complains
if the unique id is not a string, but you can comment out the QEC.  I
have one benchmark test with 2 billion documents with an integer id.
Works great.
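
For example, a schema sketch with a numeric key (this assumes the stock 'long'
field type from the 1.4 example schema):

  <field name="id" type="long" indexed="true" stored="true" required="true"/>
  <uniqueKey>id</uniqueKey>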

On Mon, Mar 8, 2010 at 5:06 PM, Don Werve  wrote:
> Too bad it requires integer (long) primary keys... :/
>
> 2010/3/8 Ian Holsman 
>
>>
>> I just saw this on twitter, and thought you guys would be interested.. I
>> haven't tried it, but it looks interesting.
>>
>> http://snaprojects.jira.com/wiki/display/ZOIE/Zoie+Solr+Plugin
>>
>> Thanks for the RT Shalin!
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: HTML encode extracted docs

2010-03-08 Thread Lance Norskog
A Tika integration with the DataImportHandler is in the Solr trunk.
With this, you can copy the raw HTML into different fields and process
one copy with Tika.

If it's just straight HTML, would the HTMLStripCharFilter be good enough?

http://www.lucidimagination.com/search/document/CDRG_ch05_5.7.2
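
For reference, a minimal sketch of a field type that strips markup at index
time (Solr 1.4 syntax; the type name and tokenizer choice are only examples):

  <fieldType name="html_text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Note that this only changes the indexed terms; the stored value, which is what
gets returned and highlighted, is still the raw HTML.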

On Mon, Mar 8, 2010 at 5:50 AM, Mark Roberts  wrote:
> I'm uploading .htm files to be extracted - some of these files are "include" 
> files that have snippets of HTML rather than fully formed html documents.
>
> solr-cell stores the raw HTML for these items, rather than extracting the 
> text. Is there any way I can get solr to encode this content prior to storing 
> it?
>
> At the moment, I have the problem that when the highlighted snippets are  
> retrieved via search, I need to parse the snippet and HTML encode the bits of 
> HTML that were indexed, whilst *not* encoding the bits that were added by
> the highlighter, which is messy and time consuming.
>
> Thanks! Mark,
>



-- 
Lance Norskog
goks...@gmail.com


Re: More contextual information in analyser

2010-03-08 Thread Lance Norskog
This is an interesting idea. There are other projects to make the
analyzer/filter chain more "porous", or open to outside interaction.

A big problem is that queries are analyzed, too. If you want to give
the same metadata to the analyzer when doing a query against the
field, things get tough. You would need a special query parser to
implement your own syntax to do that. However, the analyzer chain in
the query phase does not receive the parsed query, so you have to in
some way change this.

On Mon, Mar 8, 2010 at 2:14 AM, dbejean  wrote:
>
> Hello,
>
> If I write a custom analyser that accepts a specific attribute in the
> constructor
>
> public MyCustomAnalyzer(String myAttribute);
>
> Is there a way to dynamically send a value for this attribute from Solr at
> index time in the XML Message ?
>
> 
>  
>    .
>
>
> Obviously, in Solr schema.xml, the "content" field is associated with my custom
> Analyser.
>
> Thank you.
>
> Dominique
>
> --
> View this message in context: 
> http://old.nabble.com/More-contextual-information-in-analyser-tp27819298p27819298.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Can't delete from curl

2010-03-08 Thread Lance Norskog
... curl http://xen1.xcski.com:8080/solrChunk/nutch/select

that should be /update, not /select
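
In other words, something along these lines (the delete-by-query and commit
bodies shown are the standard update XML messages):

  curl http://xen1.xcski.com:8080/solrChunk/nutch/update -H 'Content-Type: text/xml' \
       --data-binary '<delete><query>category:Banks</query></delete>'
  curl http://xen1.xcski.com:8080/solrChunk/nutch/update -H 'Content-Type: text/xml' \
       --data-binary '<commit/>'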

On Sun, Mar 7, 2010 at 4:32 PM, Paul Tomblin  wrote:
> On Tue, Mar 2, 2010 at 1:22 AM, Lance Norskog  wrote:
>
>> On Mon, Mar 1, 2010 at 4:02 PM, Paul Tomblin  wrote:
>> > I have a schema with a field name "category" (> > type="string" stored="true" indexed="true"/>).  I'm trying to delete
>> > everything with a certain value of category with curl:...
>> >
>> > I send:
>> >
>> > curl http://localhost:8080/solrChunk/nutch/update -H "Content-Type:
>> > text/xml" --data-binary 'category:Banks'
>> >
>> > Response is:
>> >
>> > 
>> > 
>> > 0> > name="QTime">23
>> > 
>> >
>> > I send
>> >
>> > curl http://localhost:8080/solrChunk/nutch/update -H "Content-Type:
>> > text/xml" --data-binary ''
>> >
>> > Response is:
>> >
>> > 
>> > 
>> > 0> > name="QTime">1914
>> > 
>> >
>> > but when I go back and query, it shows all the same results as before.
>> >
>> > Why isn't it deleting?
>>
>> Do you query with curl also? If you use a web browser, Solr by default
>> uses http caching, so your browser will show you the old result of the
>> query.
>>
>>
> I think you're right about that.  I tried using curl, and it did go to zero.
>  But now I've got a different problem: sometimes when I try to commit, I get
> a NullPointerException:
>
>
> curl http://xen1.xcski.com:8080/solrChunk/nutch/select -H "Content-Type:
> text/xml" --data-binary ''Apache Tomcat/6.0.20 -
> Error report<!--H1
> {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;}
> H2
> {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;}
> H3
> {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;}
> BODY
> {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B
> {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;}
> P
> {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A
> {color : black;}A.name {color : black;}HR {color : #525D76;}-->
> HTTP Status 500 - null
>
> java.lang.NullPointerException
> at java.io.StringReader.<init>(StringReader.java:33)
> at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:173)
> at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78)
> at org.apache.solr.search.QParser.getQuery(QParser.java:131)
> at
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
> at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
> at java.lang.Thread.run(Thread.java:619)
> type Status
> reportmessage null
>
> java.lang.NullPointerException
> at java.io.StringReader.<init>(StringReader.java:33)
> at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:173)
> at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78)
> at org.apache.solr.search.QParser.getQuery(QParser.java:131)
> at
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>
>
> --
> http://www.linkedin.com/in/paultomblin
> http://careers.stackoverflow.com/ptomblin
>



-- 
Lance Norskog
goks...@gmail.com


Re: SolrJ commit options

2010-03-08 Thread Lance Norskog
waitFlush=true means that the commit HTTP call waits until everything
is sent to disk before it returns.
waitSearcher=true means that the commit HTTP call waits until Solr has
reloaded the index and is ready to search against it. (For more, study
Solr warming up.)

Both of these mean that the HTTP call (or curl program or Solrj
program) that started the commit, waits until it is done. Other
processes doing searches against the index are not blocked. However,
the commit may have so much disk activity that the other searches do
not proceed very fast. They are not completely blocked.

The commit will take as long as it takes, and your results will appear
after that. If you want to time that, use
waitFlush=true&waitSearcher=true.
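
In SolrJ these map onto the two boolean arguments of commit(); a minimal
sketch, where 'server' is your existing CommonsHttpSolrServer instance:

  // blocks until the index is flushed and the new searcher is registered
  server.commit(true, true);    // waitFlush, waitSearcher

  // returns as soon as the commit has been handed off
  server.commit(false, false);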

On Fri, Mar 5, 2010 at 9:39 PM, gunjan_versata  wrote:
>
> But can anyone explain the use of these parameters? I have read up on it;
> what I could not understand was: if I set both params to false,
> after how much time will my changes start to be reflected?
>
> --
> View this message in context: 
> http://old.nabble.com/SolrJ-commit-options-tp27714405p27802041.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: SolrJ commit options

2010-03-05 Thread Lance Norskog
One technique to control commit times is to do automatic commits: you
can configure a core to commit every N seconds (really milliseconds,
but less than 5 minutes becomes difficult) and/or every N documents.
This promotes a more fixed amount of work per commit.

Also, the maxMergeDocs parameter lets you force a maximum segment size
(in documents). This may cap the longest possible commit times.

http://www.lucidimagination.com/search/document/CDRG_ch08_8.1.2.3?q=maxMergeDocs
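
A sketch of how both pieces look in solrconfig.xml (Solr 1.4 element names; the
numbers are placeholders to tune for your load):

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs>  <!-- commit after this many added documents -->
      <maxTime>60000</maxTime>  <!-- ...or after this many milliseconds -->
    </autoCommit>
  </updateHandler>

  <mainIndex>
    <mergeFactor>10</mergeFactor>
    <maxMergeDocs>2147483647</maxMergeDocs>  <!-- lower this to cap segment size -->
  </mainIndex>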

On Fri, Mar 5, 2010 at 2:57 PM, Otis Gospodnetic
 wrote:
> Jerry,
>
> This is why people often do index modifications on one server (master) and 
> replicate the read-only index to 1+ different servers (slaves).
> If you do that, does the problem go away?
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>
>
> - Original Message 
>> From: Jerome L Quinn 
>> To: solr-user@lucene.apache.org
>> Sent: Fri, March 5, 2010 10:13:03 AM
>> Subject: Re: SolrJ commit options
>>
>> Shalin Shekhar Mangar wrote on 02/25/2010 07:38:39
>> AM:
>>
>> > On Thu, Feb 25, 2010 at 5:34 PM, gunjan_versata
>> wrote:
>> >
>> > >
>> > > We are using SolrJ to handle commits to our solr server.. All runs
>> fine..
>> > > But whenever the commit happens, the server becomes slow and stops
>> > > responding.. therby resulting in TimeOut errors on our production. We
>> are
>> > > using the default commit with waitFlush = true, waitSearcher = true...
>> > >
>> > > Can I change these values so that the requests coming to Solr don't
>> block on
>> > > recent commit?? Also, what will be the impact of changing these
>> values??
>> > >
>> >
>> > Solr does not block reads during a commit/optimize. Write operations are
>> > queued up but they are still accepted. Are you using the same Solr server
>> > for reads as well as writes?
>>
>> I've seen similar things with Solr 1.3 (not using SolrJ).  If I try to
>> optimize the
>> index, queries will take much longer - easily a minute or more, resulting
>> in timeouts.
>>
>> Jerry
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: facet on null value

2010-03-05 Thread Lance Norskog
(I don't know where filter queries came in.)

If you get a result with
- 
 
- 
- 
 40
 60
 20
 2
 
 
 

and you want to get facets of '000' and Null, this query will include
documents that match those facets:

&q=features:000 OR -features:[* TO *]

On Thu, Mar 4, 2010 at 8:16 PM, Andy  wrote:
> My understanding is that 2 means there are 2 documents missing a 
> facet value.
>
> But how does adding  fq=-fieldName:[* TO *] enable users to click on that 
> value to filter? There was no value, only the count (2) was returned.
>
> --- On Thu, 3/4/10, Lance Norskog  wrote:
>
> From: Lance Norskog 
> Subject: Re: facet on null value
> To: solr-user@lucene.apache.org
> Date: Thursday, March 4, 2010, 10:33 PM
>
> I have added facet.limit=5 to the above to make this easier. Here is
> the  part of the response:
>
>
> - 
>   
> - 
> - 
>   0
>   0
>   0
>   0
>   0
>   2
>   
>   
>   
>   
>
> (What is the 2?)
>
> On Thu, Mar 4, 2010 at 7:30 PM, Lance Norskog  wrote:
>> Set up the out-of-the-box example Solr. Index the documents in
>> example/exampledocs.
>>
>> Run this query:
>>
>> http://localhost:8983/solr/select/?q=*:*&fq=-features:[* TO
>> *]&version=2.2&start=0&rows=10&indent=on&facet=true&facet.field=features&facet.missing=on
>>
>> Now, change facet.missing=on to =off. There is no change. You get all
>> of the 0-valued facets anyway.
>>
>> What exactly is facet.missing supposed to do with this query?
>>
>> On Thu, Mar 4, 2010 at 6:39 PM, Andy  wrote:
>>> What would the response look like with this query?
>>>
>>> Can you give an example?
>>>
>>> --- On Thu, 3/4/10, Chris Hostetter  wrote:
>>>
>>> From: Chris Hostetter 
>>> Subject: Re: facet on null value
>>> To: solr-user@lucene.apache.org
>>> Date: Thursday, March 4, 2010, 8:40 PM
>>>
>>>
>>> : > I want to find a way to let users to find those documents. One way is to
>>> : > make Null an option the users can choose, something like:
>>>
>>> : Isn't it facet.missing=on?
>>> : http://wiki.apache.org/solr/SimpleFacetParameters#facet.missing
>>>
>>> that will get you the count, but if you then want to let them click on
>>> that value to filter your query you need:  fq=-fieldName:[* TO *]
>>>
>>>
>>>
>>> -Hoss
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: facet on null value

2010-03-04 Thread Lance Norskog
I have added facet.limit=5 to the above to make this easier. Here is
the  part of the response:


- 
  
- 
- 
  0
  0
  0
  0
  0
  2
  
  
  
  

(What is the 2?)

On Thu, Mar 4, 2010 at 7:30 PM, Lance Norskog  wrote:
> Set up the out-of-the-box example Solr. Index the documents in
> example/exampledocs.
>
> Run this query:
>
> http://localhost:8983/solr/select/?q=*:*&fq=-features:[* TO
> *]&version=2.2&start=0&rows=10&indent=on&facet=true&facet.field=features&facet.missing=on
>
> Now, change facet.missing=on to =off. There is no change. You get all
> of the 0-valued facets anyway.
>
> What exactly is facet.missing supposed to do with this query?
>
> On Thu, Mar 4, 2010 at 6:39 PM, Andy  wrote:
>> What would the response look like with this query?
>>
>> Can you give an example?
>>
>> --- On Thu, 3/4/10, Chris Hostetter  wrote:
>>
>> From: Chris Hostetter 
>> Subject: Re: facet on null value
>> To: solr-user@lucene.apache.org
>> Date: Thursday, March 4, 2010, 8:40 PM
>>
>>
>> : > I want to find a way to let users to find those documents. One way is to
>> : > make Null an option the users can choose, something like:
>>
>> : Isn't it facet.missing=on?
>> : http://wiki.apache.org/solr/SimpleFacetParameters#facet.missing
>>
>> that will get you the count, but if you then want to let them click on
>> that value to filter your query you need:  fq=-fieldName:[* TO *]
>>
>>
>>
>> -Hoss
>>
>>
>>
>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: facet on null value

2010-03-04 Thread Lance Norskog
Set up the out-of-the-box example Solr. Index the documents in
example/exampledocs.

Run this query:

http://localhost:8983/solr/select/?q=*:*&fq=-features:[* TO
*]&version=2.2&start=0&rows=10&indent=on&facet=true&facet.field=features&facet.missing=on

Now, change facet.missing=on to =off. There is no change. You get all
of the 0-valued facets anyway.

What exactly is facet.missing supposed to do with this query?

On Thu, Mar 4, 2010 at 6:39 PM, Andy  wrote:
> What would the response look like with this query?
>
> Can you give an example?
>
> --- On Thu, 3/4/10, Chris Hostetter  wrote:
>
> From: Chris Hostetter 
> Subject: Re: facet on null value
> To: solr-user@lucene.apache.org
> Date: Thursday, March 4, 2010, 8:40 PM
>
>
> : > I want to find a way to let users to find those documents. One way is to
> : > make Null an option the users can choose, something like:
>
> : Isn't it facet.missing=on?
> : http://wiki.apache.org/solr/SimpleFacetParameters#facet.missing
>
> that will get you the count, but if you then want to let them click on
> that value to filter your query you need:  fq=-fieldName:[* TO *]
>
>
>
> -Hoss
>
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: merge indexes command

2010-03-04 Thread Lance Norskog
Add quotes around the URL string:

curl 
'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/opt/solr/core1/data/index&indexDir=/opt/solr/core2/data/index'

On Thu, Mar 4, 2010 at 5:24 PM, Mark Fletcher
 wrote:
> Hi,
>
> Can someone pls suggest how to use this command as a part of linux script:
>
> *
> http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/opt/solr/core1/data/index&indexDir=/opt/solr/core2/data/index
> *
> Will just adding *curl* at  the beginning help..
>
> I tried this but it gives err:-
> *Missing required parameter: core*
> **
> Any help is deeply appreciated.
>
> Thanks and Rgds,
> Mark.
>



-- 
Lance Norskog
goks...@gmail.com


Re: facet on null value

2010-03-04 Thread Lance Norskog
Ah!  I did not know this one.

On Thu, Mar 4, 2010 at 5:01 PM, Andy  wrote:
> Yes. Thank you.
>
> --- On Thu, 3/4/10, Koji Sekiguchi  wrote:
>
> From: Koji Sekiguchi 
> Subject: Re: facet on null value
> To: solr-user@lucene.apache.org
> Date: Thursday, March 4, 2010, 7:21 PM
>
> Andy wrote:
>> There's a field "A" I want to facet on.
>>
>> Some documents have no value for field "A". So they wouldn't show up in the 
>> list of facet value options.
>>
>> I want to find a way to let users to find those documents. One way is to 
>> make Null an option the users can choose, something like:
>>
>> value1 (4558)
>> value2 (1345)
>> Null (156)
>> value3 (85)
>>
>>
> Isn't it facet.missing=on?
>
> http://wiki.apache.org/solr/SimpleFacetParameters#facet.missing
>
> Koji
>
> --
> http://www.rondhuit.com/en/
>
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: facet on null value

2010-03-04 Thread Lance Norskog
This query will find them: *:* AND -A:[* TO *]

The '*:* AND' is to get around a weird quirk of Lucene. "Minus field
range star TO star" is the trick.

On Thu, Mar 4, 2010 at 3:06 PM, Andy  wrote:
> There's a field "A" I want to facet on.
>
> Some documents have no value for field "A". So they wouldn't show up in the 
> list of facet value options.
>
> I want to find a way to let users to find those documents. One way is to make 
> Null an option the users can choose, something like:
>
> value1 (4558)
> value2 (1345)
> Null (156)
> value3 (85)
>
> Is that something that Solr support?
>
>
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Can I used .XML files instead of .OSM files

2010-03-04 Thread Lance Norskog
Is the 'body' field a text type? If it is a string, searching for
words will not work.

Does search for 'id:1' work?

On Thu, Mar 4, 2010 at 3:44 AM, mamathahl  wrote:
>
> I forgot to mention that I have been working on the geo-spatial examples
> downloaded from http://www.ibm.com/developerworks/java/library/j-spatial/.
> I have replaced the OSM files(data) which initially existed, with my data
> (i.e XML file with OSM extension).  My XML file has many data records.  The
> 1st record is shown below.
>  I use the following commands to index and retrieve the data:
> ant index
> ant start-solr
> and then hit the url http://localhost:8983/solr/admin
> But when a keyword that exists in the data file is given, I get the
> following
> −
> 
> −
> 
> 0
> 0
> −
> 
> on
> 0
> DRI
> 2.2
> 10
> 
> 
> 
> 
> Since there is no error message being displayed, I'm unable to figure out
> what is going wrong.  Kindly help me by providing an appropriate solution.
>
> mamathahl wrote:
>>
>> I'm very new to Solr.  I downloaded apache-solr-1.5-dev and was trying out
>> the example in order to first figure out how Solr is working.  I found out
>> that the data directory consisted of .OSM files.  But I have an XML file
>> consisting of latitude, longitude and relevant news for that location.
>> Can I just use the XML file to index the data or is it necessary for me to
>> convert this file to .OSM file using some tool and then proceed further?
>> Also the attribute value from the .OSM file is being considered in that
>> example.  Since there are no attributes for the tags in my XML file, how
>> can I extract only the contents of my tags?Any help in this direction will
>> be appreciated.  Thanks in advance.
>>
>
> --
> View this message in context: 
> http://old.nabble.com/Can-I-used-.XML-files-instead-of-.OSM-files-tp27769082p27779694.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Too many .cfs files

2010-03-04 Thread Lance Norskog
It's not that it uses 10 files, it's that when 10 files of size X
exist it merges all of them into a file of size Y.

If you run an optimize everything will be merged back into one large .cfs file.
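
For reference, an optimize can be triggered by posting the standard update
message (URL shown for the example Jetty setup; a maxSegments attribute can be
added for a partial optimize):

  curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
       --data-binary '<optimize/>'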

On Wed, Mar 3, 2010 at 11:09 PM, mklprasad  wrote:
>
> Hi All,
> I set up my 'mergeFactor' as 10.
> I have loaded 1 million docs into Solr; after that I am able to see 14 .cfs
> files in my data/index folder.
> Won't mergeFactor merge after the 11th record comes?
>
> Please clarify?
>
> Thanks,
> Prasad
>
> --
> View this message in context: 
> http://old.nabble.com/Too-many-.cfs-files-tp2508p2508.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Confused with Shards multicore search results

2010-03-03 Thread Lance Norskog
"different unique id for each schema.xml file."

All cores should have the same schema file with the same unique id
field and type.

Did you mean that the documents in both cores have a different value
for the unique id field?

On Wed, Mar 3, 2010 at 6:45 PM, JavaGuy84  wrote:
>
> Hi,
>
> I finally got shards work with multicore but now I am facing a different
> issue.
>
> I have 2 seperate schema / data config files for each core. I also have
> different unique id for each schema.xml file.
>
> I indexed both the cores and I was able to successfully search independently
> on each core, but when I used Shards, I didn't get what I expected. For example:
>
> http://localhost:8990/solr/core0/select?q=1565 returned 1 row
> http://localhost:8990/solr/core1/select?q=1565 returned 1 row
>
> When I tried this
> http://localhost:8990/solr/core0/select/?q=1565&shards=localhost:8990/solr/core0,localhost:8990/solr/core1
>
> It again returned just one row, but I would think that it should return 2
> rows if I have a different unique id for each document.
>
> Is there any configuration I need to do in order to make it searchable
> across multiple indexes? Any primary / slave configuration? Any help would
> be of great help to me.
>
> Thanks a lot in advance.
>
> Thanks,
> Barani
> --
> View this message in context: 
> http://old.nabble.com/Confused-with-Shards-multicore-search-results-tp27776478p27776478.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: weighted search and index

2010-03-03 Thread Lance Norskog
Boosting by convention is "flat" at 1.0. Usually people boost with
numbers like 3 or 5 or 20.
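
For example, a query-time sketch (the field names are only placeholders):

  title:sports^20 OR title:golf^5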

On Wed, Mar 3, 2010 at 6:34 PM, Jianbin Dai  wrote:
> Hi Erick,
>
> Each doc contains some keywords that are indexed. However each keyword is
> associated with a weight to represent its importance. In my example,
> D1: fruit 0.8, apple 0.4, banana 0.2
>
> The keyword fruit is the most important keyword, which means I really really
> want it to be matched in a search result, but banana is less important (It
> would be good to be matched though).
>
> Hope that explains.
>
> Thanks.
>
> JB
>
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, March 03, 2010 6:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: weighted search and index
>
> Then I'm totally lost as to what you're trying to accomplish. Perhaps
> a higher-level statement of the problem would help.
>
> Because no matter how often I look at your point <2>, I don't see
> what relevance the numbers have if you're not using them to
> boost at index time. Why are they even there?
>
> Erick
>
> On Wed, Mar 3, 2010 at 8:54 PM, Jianbin Dai  wrote:
>
>> Thank you very much Erick!
>>
>> 1. I used boost in search, but I don't know exactly what's the best way to
>> boost, for such as Sports 0.8, golf 0.5 in my example, would it be
>> sports^0.8 AND golf^0.5 ?
>>
>>
>> 2. I cannot use boost in indexing. Because the weight of the value
> changes,
>> not the field, look at this example again,
>>
>> C1: fruit 0.8, apple 0.4, banana 0.2
>> C2: music 0.9, pop song 0.6, Britney Spears 0.4
>>
>> There is no good way to boost it during indexing.
>>
>> Thanks.
>>
>> JB
>>
>>
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Wednesday, March 03, 2010 5:45 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: weighted search and index
>>
>> You have to provide some more details to get meaningful help.
>>
>> You say "I was trying to use boosting". How? At index time?
>> Search time? Both? Can you provide some code snippets?
>> What does your schema look like for the relevant field(s)?
>>
>> You say "but seems not working right". What does that mean? No hits?
>> Hits not ordered as you expect? Have you tried putting "&debugQuery=on" on
>> your URL and examined the return values?
>>
>> Have you looked at your index with the admin page and/or Luke to see if
>> the data in the index is as you expect?
>>
>> As far as I know, boosts are multiplicative. So boosting by a value less
>> than
>> 1 will actually decrease the ranking. But see the Lucene scoring, See:
>>
>>
>> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
>>
>> And remember, that boosting will *tend* to move a hit up or down in the
>> ranking, not position it absolutely.
>>
>> HTH
>> Erick
>>
>> On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai  wrote:
>>
>> > Hi,
>> >
>> > I am trying to use solr for a content match application.
>> >
>> > A content is described by a set of keywords with weights associated,
> eg.,
>> >
>> > C1: fruit 0.8, apple 0.4, banana 0.2
>> > C2: music 0.9, pop song 0.6, Britney Spears 0.4
>> >
>> > Those contents would be indexed in solr.
>> > In the search, I also have a set of keywords with weights:
>> >
>> > Query: Sports 0.8, golf 0.5
>> >
>> > I am trying to find the closest matching contents for this query.
>> >
>> > My question is how to index the contents with weighted scores, and how
> to
>> > write search query. I was trying to use boosting, but seems not working
>> > right.
>> >
>> > Thanks.
>> >
>> > Jianbin
>> >
>> >
>> >
>>
>>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Escaping options for tika/solr cell extract-only output

2010-03-03 Thread Lance Norskog
You can return it with any of the other writers, like JSON or PHP.

The alternative design decision for the XML output writer would be to
emit using CDATA instead of escaping.
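
For example, a sketch of an extract-only request returned through the JSON
writer (the file name is illustrative):

  curl 'http://localhost:8983/solr/update/extract?extractOnly=true&wt=json&indent=on' \
       -F "myfile=@include.htm"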

On Wed, Mar 3, 2010 at 12:54 PM, Dan Hertz (Insight 49, LLC)
 wrote:
> Looking at http://wiki.apache.org/solr/ExtractingRequestHandler:
>
> Extract Only
> "the output includes XML generated by Tika (and is hence further escaped by
> Solr's XML)"
>
> ...is there an option to NOT have the resulting TIKA output escaped?
>
> so <head> would come back as 
>
> If no, what would need to be done to enable this option? Looked into
> SOLR-1274.patch, but didn't see a parameter for such a thing.
>
> Thanks,
>
> Dan
>



-- 
Lance Norskog
goks...@gmail.com


Re: 2 Cores, 1 Table, 2 DataImporter --> Import at the same time ?

2010-03-03 Thread Lance Norskog
No, a "core" is a lucene index. Two DataImportHandler sessions to the
same core will run on the same index.

You should use lockType of simple or native. 'single' should only be
used on a read-only index.

From the stack trace it looks like you're only using one index in
solr/core. You have to configure two separate cores with separate core
directories. Check out the example/multicore directory for how that
works.
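
As a sketch, the solr.xml there looks roughly like this (core names and paths
are illustrative):

  <solr persistent="false">
    <cores adminPath="/admin/cores">
      <core name="search"  instanceDir="search" />
      <core name="suggest" instanceDir="suggest" />
    </cores>
  </solr>

Each instanceDir gets its own conf/ and data/ directories, so the two DIH
configs no longer share an index.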

On Wed, Mar 3, 2010 at 6:39 AM, stocki  wrote:
>
>
> Okay, I changed the "lockType" to "single", but with no good effect.
>
> So I think now that my two DIHs are using the same data folder. Why is that?
> I thought that each DIH used its own index ... ?!
>
> I think it is not possible to import from one table in parallel with more than
> one DIH ?!
>
>
> myexception:
>
> java.io.FileNotFoundException:
> /var/lib/tomcat5.5/temp/solr/data/index/_5d.fnm (No such file or directory)
>        at java.io.RandomAccessFile.open(Native Method)
>        at java.io.RandomAccessFile.(RandomAccessFile.java:212)
>        at
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.(SimpleFSDirectory.java:78)
>        at
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.(SimpleFSDirectory.java:108)
>        at
> org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.(NIOFSDirectory.java:94)
>        at
> org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:70)
>        at
> org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:691)
>        at org.apache.lucene.index.FieldInfos.(FieldInfos.java:68)
>        at
> org.apache.lucene.index.SegmentReader$CoreReaders.(SegmentReader.java:116)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608)
>        at
> org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686)
>        at
> org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:662)
>        at
> org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:954)
>        at
> org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:5190)
>        at
> org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:4354)
>        at
> org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:4192)
>        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4183)
>        at
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2647)
>        at
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2601)
>        at
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
>        at
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
>        at
> org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
>        at
> org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
>        at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392)
>        at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
>        at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
>        at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
>        at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
>        at
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
>
>
>
>
>
> Erik Hatcher-4 wrote:
>>
>> what's the error you're getting?
>>
>> is DIH keeping some static that prevents it from running across two
>> cores separately?  if so, that'd be a bug.
>>
>>       Erik
>>
>> On Mar 3, 2010, at 4:12 AM, stocki wrote:
>>
>>>
>>> pleeease help me somebody =( :P
>>>
>>>
>>>
>>>
>>> stocki wrote:
>>>>
>>>> Hello again ;)
>>>>
>>>> I installed Tomcat 5.5 on my Debian server ...
>>>>
>>>> I use 2 cores and two different DIHs with separate indexes, one for the
>>>> normal search feature and the other core for the suggest feature.
>>>>
>>>> But I cannot start both DIHs with an import command at the same time. How
>>>> is this possible?
>>>>
>>>>
>>>> thx
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/2-Cores%2C-1-Table%2C-2-DataImporter---%3E-Import-at-the-same-time---tp27756255p27765825.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/SEVERE%3A-SolrIndexWriter-was-not-closed-prior-to-finalize%28%29%2C-indicates-a-bugPOSSIBLE-RESOURCE-LEAK%21%21%21-tp27756255p27768997.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: How can I get Solr-Cell to extract to multi-valued fields?

2010-03-02 Thread Lance Norskog
It is a bug. I just filed this. It is just a unit test that displays
the behavior.

http://issues.apache.org/jira/browse/SOLR-1803

On Tue, Mar 2, 2010 at 9:07 AM, Mark Roberts  wrote:
> Hi,
>
> I have a schema with a multivalued field like so:
>
>  multiValued="true"/>
>
> I am uploading html documents to the Solr extraction handler which contain 
> meta in the head, like so:
>
> 
> 
> 
>
> I want the extraction handler to map each of these pieces of meta onto the 
> product field, however, there seems to be a problem - only the last item 
> "andanotherproduct" is mapped, the first seem to be ignored.
>
> It does work, however, if I pass the values as literals in the query string 
> (e.g. 
> literal.product=firstproduct&literal.product=anotherproduct&literal.product=andanotherproduct)
>
> I've tried the release version 1.4 of solr and a recent nightly build of 1.5 
> and neither work.
>
> Is this a bug in Solr-cell or am I doing something wrong?
>
> Many thanks,
> Mark.
>



-- 
Lance Norskog
goks...@gmail.com


Re: Simultaneous Writes to Index

2010-03-02 Thread Lance Norskog
Locking is at a lower level than indexing and queries. Solr
coordinates multi-threaded indexing and query operations in memory and
a separate thread writes data to disk. There are no performance
problems with multiple searches and index updates happening at the same
time.

2010/3/2 Kranti™ K K Parisa :
> And also, what about when two update requests come at the same time? Then
> whichever request comes first will be updating the index while the other
> requests wait, up to the lock timeout that we have configured?
>
>
> Best Regards,
> Kranti K K Parisa
>
>
>
> 2010/3/2 Kranti™ K K Parisa 
>
>> Hi Ron,
>>
>> Thanks for the reply. So does this mean that the writer lock has nothing to do
>> with concurrent writes?
>>
>> Best Regards,
>> Kranti K K Parisa
>>
>>
>>
>> On Tue, Mar 2, 2010 at 4:19 PM, Ron Chan  wrote:
>>
>>> as long as the document id is unique, concurrent writes is fine
>>>
>>> if for same reason the same doc id is used then it is overwritten, so last
>>> in will be the one that is in the index
>>>
>>> Ron
>>>
>>> - Original Message -
>>> From: "Kranti™ K K Parisa" 
>>> To: solr-user@lucene.apache.org
>>> Sent: Tuesday, 2 March, 2010 10:40:37 AM
>>> Subject: Simultaneous Writes to Index
>>>
>>> Hi,
>>>
>>> I am planning to develop an application on which users could update
>>> their account data after login; this is on top of the search facility
>>> users
>>> have. The basic workflow is:
>>> 1) user logs in
>>> 2) searches for some data
>>> 3) gets the results from solr index
>>> 4) save some of the search results into their repository
>>> 5) later on they may view their repository
>>>
>>> for this, at step4 I am planning to write that into a separate solr index
>>> as
>>> user may search within his repository and get the results, facets..etc.
>>> So thinking to write such data/info to a separate solr index.
>>>
>>> in this plan, how simultaneous writes to the user history index works.
>>> what
>>> are the best practices in such scenarios of updating index at a time by
>>> different users.
>>>
>>> The other alternative is to store such user info in a DB and schedule an
>>> indexing process at regular intervals. But that won't make the system live
>>> with user actions, as there would be some delay; users can't see the data
>>> they saved in their repository until it's indexed.
>>>
>>> That is the reason I am planning to use a Solr XML post request to update the
>>> index silently, but what about multiple users writing to the same index?
>>>
>>> Best Regards,
>>> Kranti K K Parisa
>>>
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: question regarding coord() value

2010-03-02 Thread Lance Norskog
The first 2 queries say 'electORnics' instead of 'electROnics'.

The third query shows the situation. The first clause has 1 out of 2
matches, and the second has 1 out of 3 matches. Look for the two
'coord' entries. They are 1/2 and 1/3.

  
0.61808145 = (MATCH) sum of:
  0.16856766 = (MATCH) product of:
0.33713531 = (MATCH) sum of:
  0.33713531 = (MATCH) weight(name:samsung in 0), product of:
0.39687544 = queryWeight(name:samsung), product of:
  3.3978953 = idf(docFreq=1, maxDocs=22)
  0.116800375 = queryNorm
0.84947383 = (MATCH) fieldWeight(name:samsung in 0), product of:
  1.0 = tf(termFreq(name:samsung)=1)
  3.3978953 = idf(docFreq=1, maxDocs=22)
  0.25 = fieldNorm(field=name, doc=0)
0.5 = coord(1/2)
   0.44951376 = (MATCH) product of:
1.3485413 = (MATCH) sum of:
  1.3485413 = (MATCH) weight(manu:electronics in 0), product of:
0.39687544 = queryWeight(manu:electronics), product of:
  3.3978953 = idf(docFreq=1, maxDocs=22)
  0.116800375 = queryNorm
3.3978953 = (MATCH) fieldWeight(manu:electronics in 0), product of:
  1.0 = tf(termFreq(manu:electronics)=1)
  3.3978953 = idf(docFreq=1, maxDocs=22)
  1.0 = fieldNorm(field=manu, doc=0)
0.3334 = coord(1/3)


On Tue, Mar 2, 2010 at 3:35 AM, Smith G  wrote:
> Hello ,
>         I have been trying to find out what exactly coord-value is .
> I have executed different queries where I have observed strange
> behaviour.
> Leave the numerator-value in coord fraction at the moment as I am
> really confused what exactly the denominator is.
> Here are the examples .
>
> Query 1)
>
>  (+text:samsung +text:electron +name:samsung) (+manu:samsung
> +features:samsung (+manu:electronics +name:electronics))
> manu:electornics name:one name:two
>
> coord value is : 1/5 [consider only denominator], I guess as there are
> 5 clauses (combinations) it could be five.
> 
> Query 2)
>
> ((+text:samsung +(text:electron name:samsung)) (+manu:samsung
> +features:samsung (+manu:electronics +name:electronics)))
> (manu:electornics name:one) name:two
>
> coord value is :1/3 . Same logic works here [for the denominator value-3]
> 
> Query 3)
>
> (name:samsung features:abc) (features:name name:electronics manu:electronics)
>
> But here, coord value is : 1/3 . I have been trying to reckon how it
> could be "3", but I could not.
> -
>
> I have tried to correlate the info present in the Java Documentation,
> but I was not successful again.
> Please clarify.
>
> Thanks.
>



-- 
Lance Norskog
goks...@gmail.com


Re: Boost a document score via query using MoreLikeThisHandler

2010-03-01 Thread Lance Norskog
If I remove the space before !query, this is the error:
Cannot parse ')}': Encountered " ")" ") "" at line 1, column 0.

Perhaps someone knows how parentheses and curlies combine here?

Also: *.yahoo.com will not work. Wildcards do not work at the
beginning of a word. To make this search work, you should reverse the
order of the site name parts: 'com.yahoo.wahoo'.
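
For what it's worth, one form that avoids the inline quoting problem is to move
the nested query into its own request parameter and dereference it (the
parameter name is arbitrary, and the site value assumes the reversed-domain
form just mentioned):

  q={!boost b=query($sitebq)}(title:barack OR title:obama)&sitebq=site:com.yahoo.*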

On Mon, Mar 1, 2010 at 7:35 PM, Christopher Bottaro
 wrote:
> On Mon, Mar 1, 2010 at 7:36 PM, Christopher Bottaro
>  wrote:
>> Hello,
>>
>> Is it possible to boost a document's score based on something like
>> fq=site(com.google*).  In other words, I want to boost the score of
>> documents who's "site" field starts with "com.google".
>>
>> I'm using the MoreLikeThisHandler.
>>
>> Thanks for the help,
>> -- Christopher
>>
>
> Ok, I think I need to do this with BoostQParserPlugin and nested
> queries, but I can't quite figure it out.
>
> So this works...
> q={!boost b=log(popularity)}(title:barack OR title:obama)
>
> But instead of boosting by popularity, I want to boost by site:
> q={!boost b=query({ !query q='site:*.yahoo.com' })}(title:barack OR 
> title:obama)
>
> This is the exception I get...
> org.apache.lucene.queryParser.ParseException: Expected identifier at
> pos 18 str='{!boost b=query({ !query q='site:*.yahoo.com'
> })}(title:barack OR title:obama)'
>
> But that doesn't work.  Any tips?  Thanks.
>



-- 
Lance Norskog
goks...@gmail.com


Re: Solr Cell and Deduplication - Get ID of doc

2010-03-01 Thread Lance Norskog
To quote from the wiki,
http://wiki.apache.org/solr/ExtractingRequestHandler

curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true'
-F "myfi...@tutorial.html"

This runs the extractor on your input file (in this case an HTML
file). It then stores the generated document with the id field (the
uniqueKey declared in schema.xml) set to 'doc1'. This way, you do not
rely on the ExtractingRequestHandler to create a unique key for you.
This command throws away that generated key.

On Mon, Mar 1, 2010 at 4:22 PM, Chris Hostetter
 wrote:
>
> : You could create your own unique ID and pass it in with the
> : literal.field=value feature.
>
> By which Lance means you could specify an unique value in a differnet
> field from yoru uniqueKey field, and then query on that field:value pair
> to get the doc after it's been added -- but that query will only work
> until some other version of the doc (with some other value) overwrites it.
> so you'd esentially have to query for the field:value to lookup the
> uniqueKey.
>
> it seems like it should definitely be feasible for the
> Update RequestHandlers to return the uniqueKeyField values for all the
> added docs (regardless of wether the key was included in the request, or
> added by an UpdateProcessor -- but i'm not sure how that would fit in with
> the SolrJ API.
>
> would you mind opening a feature request in Jira?
>
>
>
> -Hoss
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Can't delete from curl

2010-03-01 Thread Lance Norskog
On Mon, Mar 1, 2010 at 4:02 PM, Paul Tomblin  wrote:
> I have a schema with a field name "category" ( type="string" stored="true" indexed="true"/>).  I'm trying to delete
> everything with a certain value of category with curl:
>
> I send:
>
> curl http://localhost:8080/solrChunk/nutch/update -H "Content-Type:
> text/xml" --data-binary 'category:Banks'
>
> Response is:
>
> 
> 
> 0 name="QTime">23
> 
>
> I send
>
> curl http://localhost:8080/solrChunk/nutch/update -H "Content-Type:
> text/xml" --data-binary ''
>
> Response is:
>
> 
> 
> 0 name="QTime">1914
> 
>
> but when I go back and query, it shows all the same results as before.
>
> Why isn't it deleting?
>
> --
> http://www.linkedin.com/in/paultomblin
> http://careers.stackoverflow.com/ptomblin
>

Do you query with curl also? If you use a web browser, Solr by default
uses http caching, so your browser will show you the old result of the
query.

-- 
Lance Norskog
goks...@gmail.com


Re: Solr Cell and Deduplication - Get ID of doc

2010-02-26 Thread Lance Norskog
You could create your own unique ID and pass it in with the
literal.field=value feature.

http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters

On Fri, Feb 26, 2010 at 7:56 AM, Bill Engle  wrote:
> Any thoughts on this? I would like to get the id back in the response after
> indexing.  My initial thoughts were to do a search to get the docid based
> on the attr_stream_name after indexing but now that I reread my message I
> mentioned the attr_stream_name (file_name) may be different so that is
> unreliable.  My only option is to somehow return the id in the XML
> response.  Any guidance is greatly appreciated.
>
> -Bill
>
> On Wed, Feb 24, 2010 at 12:06 PM, Bill Engle  wrote:
>
>> Hi -
>>
>> New Solr user here.  I am using Solr Cell to index files (PDF, doc, docx,
>> txt, htm, etc.) and there is a good chance that a new file will have
>> duplicate content but not necessarily the same file name.  To avoid this I
>> am using the deduplication feature of Solr.
>>
>>   <updateRequestProcessorChain name="dedupe">
>>     <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
>>       <bool name="enabled">true</bool>
>>       <str name="signatureField">id</str>
>>       <bool name="overwriteDupes">true</bool>
>>       <str name="fields">attr_content</str>
>>       <str name="signatureClass">org.apache.solr.update.processor.[...]</str>
>>     </processor>
>>     <processor class="solr.LogUpdateProcessorFactory" />
>>     <processor class="solr.RunUpdateProcessorFactory" />
>>   </updateRequestProcessorChain>
>>
>> How do I get the "id" value post Solr processing?  Is there some way to
>> modify the curl response so that the id is returned?  I need this id because I
>> would like to rename the file to the id value.  I could probably do a Solr
>> search after the fact to get the id field based on the attr_stream_name but
>> I would like to do only one request.
>>
>> curl '
>> http://localhost:8080/solr/update/extract?uprefix=attr_&fmap.content=attr_content&commit=true'
>> -F "myfi...@myfile.pdf"
>>
>> Thanks,
>> Bill
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: If you could have one feature in Solr...

2010-02-25 Thread Lance Norskog
Error messages that make sense. I have to read the source far too
often when a simple change to error-handling would make some feature
easy to use. If I want to read Java I'll use Lucene!

Passive-aggressive error handling is a related problem: when I do
something nonsensical I too often get "0 results found" instead of
"what does that mean?".

On Thu, Feb 25, 2010 at 12:52 PM, Smiley, David W.  wrote:
> 1. Spatial search
> 2. Ease of managing a sharded index, multi-server Solr instance.
>
> I am aware these are in-progress, slated for Solr 1.5.
>
> I may find myself getting involved on these shortly because I'm working on a 
> very large scale search project requiring both.
>
> ~ David
>
> On Feb 24, 2010, at 8:42 AM, Grant Ingersoll wrote:
>
>> What would it be?
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Solr Cell RTF Woes

2010-02-25 Thread Lance Norskog
> javax.swing.text.DefaultStyledDocument.<init>(DefaultStyledDocument.java:95)
>    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:42)
>    at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
>    at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
>    at
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
>    at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>    at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>    at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>    at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>    at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>    at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>    at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>    at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>    at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>    at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>    at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>    at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>    at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>    at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>    at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>    at
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>    at java.lang.Thread.run(Thread.java:637)
> description The server encountered an internal error
> (Could not initialize class java.awt.EventQueue
>
> java.lang.NoClassDefFoundError: Could not initialize class
> java.awt.EventQueue
>    at
> javax.swing.SwingUtilities.isEventDispatchThread(SwingUtilities.java:1333)
>    at javax.swing.text.StyleContext.reclaim(StyleContext.java:437)
>    at javax.swing.text.StyleContext.addAttribute(StyleContext.java:294)
>    at
> javax.swing.text.StyleContext$NamedStyle.addAttribute(StyleContext.java:1488)
>    at
> javax.swing.text.StyleContext$NamedStyle.setName(StyleContext.java:1298)
>    at
> javax.swing.text.StyleContext$NamedStyle.<init>(StyleContext.java:1245)
>    at javax.swing.text.StyleContext.addStyle(StyleContext.java:90)
>    at javax.swing.text.StyleContext.<init>(StyleContext.java:70)
>    at
> javax.swing.text.DefaultStyledDocument.<init>(DefaultStyledDocument.java:95)
>    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:42)
>    at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
>    at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
>    at
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
>    at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>    at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>    at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>    at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>    at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>    at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>    at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>    at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>    at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>    at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>    at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>    at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>    at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>    at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>    at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>    at
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>    at java.lang.Thread.run(Thread.java:637)
> ) that prevented it from fulfilling this request. Apache Tomcat/6.0.18
>



-- 
Lance Norskog
goks...@gmail.com


Re: Delta Query - DIH

2010-02-25 Thread Lance Norskog
It may be easier to understand the problem if you create views for the
full- and delta-import queries.

On Thu, Feb 25, 2010 at 9:09 AM, JavaGuy84  wrote:
>
> Hi,
>
> My data config looks like below ([data-config XML lost in archiving; one
> entity query that survives is query="select * from z where id=x.id"]).
>
> I am able to successfully run the Full-Import query without any issue. I am
> not sure how I can implement a delta query, as each of the tables gets
> updated independently and I need the updates of that particular table to get
> reflected independently (in the Solr document).
>
> Thanks,
> Barani
> --
> View this message in context: 
> http://old.nabble.com/Delta-Query---DIH-tp27714480p27714480.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Using XSLT with DIH for a URLDataSource

2010-02-25 Thread Lance Norskog
There could be a common 'open a URL' utility method. This would help
make the DIH components consistent.

2010/2/24 Noble Paul നോബിള്‍  नोब्ळ् :
> you are right. The StreamSource class is not throwing the proper exception
>
> Do we really have to handle this?
>
> On Thu, Feb 25, 2010 at 9:06 AM, Lance Norskog  wrote:
>> [Taken off the list]
>>
>> The problem is that the XSLT code swallows the real exception and
>> does not return it as the "deeper" exception.  To show the right
>> error, the code would have to open a file name or a URL directly. The problem
>> is, the code has to catch an exception on a file or a URL, try the
>> other, and then decide what to do.
>>
>>       try {
>>          URL u = new URL(xslt);
>>          iStream = u.openStream();
>>        } catch (MalformedURLException e) {
>>          iStream = new FileInputStream(new File(xslt));
>>        }
>>        TransformerFactory transFact = TransformerFactory.newInstance();
>>        xslTransformer = transFact.newTransformer(new StreamSource(iStream));
>>
>>
>> On Mon, Feb 22, 2010 at 6:24 AM, Roland Villemoes  
>> wrote:
>>> You're right!
>>>
>>> I was as simple (stupid!) as that,
>>>
>>> Thanks a lot (for your time .. very appreciated)
>>>
>>> Roland
>>>
>>> -Original message-
>>> From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On behalf of Noble
>>> Paul നോബിള്‍ नोब्ळ्
>>> Sent: 22 February 2010 14:01
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Using XSLT with DIH for a URLDataSource
>>>
>>> The xslt file looks fine. Is the location of the file correct?
>>>
>>> On Mon, Feb 22, 2010 at 2:57 PM, Roland Villemoes  
>>> wrote:
>>>>
>>>> Hi
>>>>
>>>> (thanks a lot)
>>>>
>>>> Yes, The full stacktrace is this:
>>>>
>>>> 22-02-2010 08:37:00 org.apache.solr.handler.dataimport.DataImporter 
>>>> doFullImport
>>>> SEVERE: Full Import failed
>>>> org.apache.solr.handler.dataimport.DataImportHandlerException: Error 
>>>> initializing XSL  Processing Document # 1
>>>>        at 
>>>> org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:103)
>>>>        at 
>>>> org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:76)
>>>>        at 
>>>> org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
>>>>        at 
>>>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
>>>>        at 
>>>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
>>>>        at 
>>>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
>>>>        at 
>>>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
>>>>        at 
>>>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
>>>>        at 
>>>> org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203)
>>>>        at 
>>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>>>>        at 
>>>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>>>>        at 
>>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>>>>        at 
>>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>>        at 
>>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>        at 
>>>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>>        at 
>>>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>>        at 
>>>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>>>        at 
>>>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>>        at 
>&

Re: Highlighting inside a field with HTML contents

2010-02-24 Thread Lance Norskog
Yes, the raw HTML will have <em>word</em> inserted. This may put
markup where you did not intend.
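
For reference, a rough SolrJ sketch of the query Xavier describes below; it
shows where the highlighted fragments (markup included) come back in the
response. The field name and parameters are taken from the quoted mail; the
URL is a placeholder.

    import java.util.List;
    import java.util.Map;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class HighlightHtmlField {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("table");
        q.setHighlight(true);
        q.addHighlightField("htmlfield");
        q.setHighlightFragsize(0);   // 0 = return the whole field value

        QueryResponse rsp = server.query(q);
        // doc id -> (field name -> highlighted fragments, markers included)
        Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
        System.out.println(hl);
      }
    }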

On Mon, Feb 22, 2010 at 7:24 AM, Xavier Schepler
 wrote:
> Hello,
>
> this field would not be searched, but it would be used to display results.
>
> A query could be :
>
> q=table&hl=true&hl.fl=htmlfield&hl.fragsize=0
>
> It would be tokenized with the HTMLStripStandardTokenizerFactory, then
> analyzed the same way as the searchable fields.
>
> Could this result in highlighting inside HTML tags (I mean things like
> <table...table>) ?
>



-- 
Lance Norskog
goks...@gmail.com


Re: Using XSLT with DIH for a URLDataSource

2010-02-24 Thread Lance Norskog
 24 more
>> 22-02-2010 08:37:00 org.apache.solr.update.DirectUpdateHandler2 rollback
>>
>>
>> My import feed (for testing) is this:
>>
>> [XML feed lost in archiving; only scattered values survive: currency='SEK'
>> prices 310.70, 233.03 and 154.96, element ids 5, 6, 28 and 39, and the
>> timestamps 11-11-2008 15:10:31 and 18-08-2009 15:44:46]
>>
>> And my test.xslt (cut down to almost nothing just to move further and see 
>> that XSLT was working):
>>
>> [test.xslt lost in archiving; the only fragment that survives is the
>> stylesheet declaration xmlns:xsl='http://www.w3.org/1999/XSL/Transform']
>>
>>
>>
>>
>> -Original message-
>> From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
>> Sent: 22 February 2010 10:08
>> To: solr-user@lucene.apache.org
>> Subject: Re: Using XSLT with DIH for a URLDataSource
>>
>> On Mon, Feb 22, 2010 at 1:18 PM, Roland Villemoes 
>> wrote:
>>
>>> Hi,
>>>
>>> I have to load data for Solr from a UrlDataSource supplying me with a XML
>>> feed.
>>>
>>> In the simple case where I just do simple XSLT select this works just fine.
>>> Just as shown on the wiki (http://wiki.apache.org/solr/DataImportHandler)
>>>
>>> But I need to do some manipulation of the XML feed first, so I am trying to
>>> do a transform first using:
>>>
>>> [DIH entity configuration lost in archiving]
>>
>>
>>> But no matter what I do in my "test.xslt" - I get the same error:
>>>
>>> ...
>>> org.apache.solr.handler.dataimport.DataImportHandlerException: Error
>>> initializing XSL  Processing Document # 1
>>> ...
>>> Caused by: javax.xml.transform.TransformerConfigurationException: Could not
>>> compile stylesheet
>>>
>>>
>>> Anyone that can help me out here? Or has a running example using XSLT with
>>> DIH?
>>>
>>>
>> Can you send the complete stacktrace?
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>
>
> --
> -
> Noble Paul | Systems Architect| AOL | http://aol.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: parsing strings into phrase queries

2010-02-22 Thread Lance Norskog
Thanks Robert, that helped.

On Thu, Feb 18, 2010 at 5:48 AM, Robert Muir  wrote:
> i gave it a rough shot Lance, if there's a better way to explain it, please
> edit
>
> On Wed, Feb 17, 2010 at 10:23 PM, Lance Norskog  wrote:
>
>> That would be great. After reading this and the PositionFilter class I
>> still don't know how to use it.
>>
>> On Wed, Feb 17, 2010 at 12:38 PM, Robert Muir  wrote:
>> > i think we can improve the docs/wiki to show this example use case, i
>> > noticed the wiki explanation for this filter gives a more complex
>> shingles
>> > example, which is interesting, but this seems to be a common problem and
>> > maybe we should add this use case.
>> >
>> > On Wed, Feb 17, 2010 at 1:54 PM, Chris Hostetter
>> > wrote:
>> >
>> >>
>> >> : take a look at PositionFilter
>> >>
>> >> Right, there was another thread recently where almost the exact same
>> issue
>> >> was discussed...
>> >>
>> >> http://old.nabble.com/Re%3A-Tokenizer-question-p27120836.html
>> >>
>> >> ..except that i was ignorant of the existence of PositionFilter when i
>> >> wrote that message.
>> >>
>> >>
>> >>
>> >> -Hoss
>> >>
>> >>
>> >
>> >
>> > --
>> > Robert Muir
>> > rcm...@gmail.com
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: Deleting spelll checker index

2010-02-22 Thread Lance Norskog
More precisely, remnant terms from deleted documents slowly disappear
as you add new documents or when you optimize the index.
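
A rough SolrJ sketch of that clean-up, combining an optimize with the rebuild
command darniz quotes below. It assumes the spellcheck component is reachable
through a /spell handler; the handler name and URL are placeholders, so adjust
them to your own config.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class RebuildSpellIndex {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        server.optimize();   // merges segments and drops the terms of deleted docs

        SolrQuery q = new SolrQuery("accrd");      // any query will do
        q.set("qt", "/spell");                     // placeholder handler name
        q.set("spellcheck", true);
        q.set("spellcheck.build", true);           // rebuild from the (now clean) field
        q.set("spellcheck.dictionary", "default");
        server.query(q);
      }
    }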

On Thu, Feb 18, 2010 at 11:09 AM, darniz  wrote:
>
> Thanks
> If this is really the case, i declared a new field called mySpellTextDup and
> retired the original field.
> Now i have a new field which powers my dictionary with no words in it and
> now i am free to index whichever term i want.
>
> This is not the best solution but i cant think of a reasonable workaround
>
> Thanks
> darniz
>
>
> Lance Norskog-2 wrote:
>>
>> This is a quirk of Lucene - when you delete a document, the indexed
>> terms for the document are not deleted. That is, if 2 documents have
>> the word 'frampton' in an indexed field, the term dictionary contains
>> the entry 'frampton' and pointers to those two documents. When you
>> delete those two documents, the index contains the entry 'frampton'
>> with an empty list of pointers. So, the terms are still there even
>> when you delete all of the documents.
>>
>> Facets and the spellchecking dictionary build from this term
>> dictionary, not from the text string that are 'stored' and returned
>> when you search for the documents.
>>
>> The <optimize/> command throws away these remnant terms.
>>
>> http://www.lucidimagination.com/blog/2009/03/18/exploring-lucenes-indexing-code-part-2/
>>
>> On Wed, Feb 17, 2010 at 12:24 PM, darniz  wrote:
>>>
>>> Please bear with me on the limitted understanding.
>>> i deleted all documents and i made a rebuild of my spell checker  using
>>> the
>>> command
>>> spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default
>>>
>>> After this i went to the schema browser and i saw that mySpellText still
>>> has
>>> around 2000 values.
>>> How can i make sure that i clean up that field.
>>> We had the same issue with facets too, even though we delete all the
>>> documents, and if we do a facet on make we still see facets but we can
>>> filter out facets by saying facet.mincount>0.
>>>
>>> Again coming back to my question how can i make mySpellText fields get
>>> rid
>>> of all previous terms
>>>
>>> Thanks a lot
>>> darniz
>>>
>>>
>>>
>>> hossman wrote:
>>>>
>>>> : But still i cant stop thinking about this.
>>>> : i deleted my entire index and now i have 0 documents.
>>>> :
>>>> : Now if i make a query with accrd i still get a suggestion of accord
>>>> even
>>>> : though there are no document returned since i deleted my entire index.
>>>> i
>>>> : hope it also clear the spell check index field.
>>>>
>>>> there are two Lucene indexes when you use spell checking.
>>>>
>>>> there is the "main" index which is goverend by your schema.xml and is
>>>> what
>>>> you add your own documents to, and what searches are run agains for the
>>>> result section of solr responses.
>>>>
>>>> There is also the "spell" index which has only two fields and in
>>>> which each "document" corresponds to a "word" that might be returned as
>>>> a
>>>> spelling suggestion, and the other fields contain various
>>>> start/end/middle
>>>> ngrams that represent possible misspellings.
>>>>
>>>> When you use the spellchecker component it builds the "spell" index
>>>> making a document out of every word it finds in whatever field name you
>>>> configure it to use.
>>>>
>>>> deleting your entire "main" index won't automatically delete the "spell"
>>>> index (although you should be able to rebuild the "spell" index using the
>>>> *empty* "main" index, that should work).
>>>>
>>>> : i am copying both fields to a field called
>>>> : <copyField source="make" dest="mySpellText"/>
>>>> : <copyField source="model" dest="mySpellText"/>
>>>>
>>>> ..at this point your "main" index has a field named mySpellText, and for
>>>> every document it contains a copy of make and model.
>>>>
>>>> :         <lst name="spellchecker">
>>>> :             <str name="name">default</str>
>>>> :             <str name="field">mySpellText</str>
>>>> :             <str name="buildOnCommit">true</str>
>>>> :             <str name="buildOnOptimize">true</str>
>>>>
>>>> ...so whenever you commit or optimize your "main" index it will take
>>>> 

Re: Faceting

2010-02-22 Thread Lance Norskog
There are several component libraries for UIMA on the net:
http://incubator.apache.org/uima/external-resources.html

2010/2/18 José Moreira :
> have you used UIMA? i did a quick read on the docs and it seems to do what
> i'm looking for.
>
> 2010/2/11 Otis Gospodnetic 
>
>> Note that UIMA doesn't do NER itself (as far as I know), but instead
>> relies on GATE or OpenNLP or OpenCalais, AFAIK :)
>>
>> Those interested in UIMA and living close to New York should go to
>> http://www.meetup.com/NYC-Search-and-Discovery/calendar/12384559/
>>
>>
>> Otis
>> 
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Hadoop ecosystem search :: http://search-hadoop.com/
>>
>>
>>
>> - Original Message 
>> > From: Jan Høydahl / Cominvent 
>> > To: solr-user@lucene.apache.org
>> > Sent: Tue, February 9, 2010 9:57:26 AM
>> > Subject: Re: Faceting
>> >
>> > NOTE: Please start a new email thread for a new topic (See
>> > http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking)
>> >
>> > Your strategy could work. You might want to look into dedicated entity
>> > extraction frameworks like
>> > http://opennlp.sourceforge.net/
>> > http://nlp.stanford.edu/software/CRF-NER.shtml
>> > http://incubator.apache.org/uima/index.html
>> >
>> > Or if that is too much work, look at
>> > http://issues.apache.org/jira/browse/SOLR-1725 for a way to plug in your
>> entity
>> > extraction code into Solr itself using a scripting language.
>> >
>> > --
>> > Jan Høydahl  - search architect
>> > Cominvent AS - www.cominvent.com
>> >
>> > On 5. feb. 2010, at 20.10, José Moreira wrote:
>> >
>> > > Hello,
>> > >
>> > > I'm planning to index a 'content' field for search and from that
>> > > field's text content i would like to facet (probably) according to whether
>> > > the content has e-mails, urls and within urls, urls to pictures,
>> > > videos and others.
>> > >
>> > > As i'm a relatively new user to Solr, my plan was to regexp the
>> > > content in my application and add tags to a Solr field according to
>> > > the content, so for example the content "m...@email.com
>> > > http://www.site.com" would have the tags "email, link".
>> > >
>> > > If i follow this path can i then facet on "email" and/or "link" ? For
>> > > example combining facet field with facet value params?
>> > >
>> > > Best
>> > >
>> > > --
>> > > http://pt.linkedin.com/in/josemoreira
>> > > josemore...@irc.freenode.net
>> > > http://djangopeople.net/josemoreira/
>>
>>
>
>
> --
> josemore...@irc.freenode.net
> http://pt.linkedin.com/in/josemoreira
> http://djangopeople.net/josemoreira/
>



-- 
Lance Norskog
goks...@gmail.com


Re: some scores to 0 using omitNorns=false

2010-02-22 Thread Lance Norskog
http://wiki.apache.org/lucene-java/ConceptsAndDefinitions

On Thu, Feb 18, 2010 at 7:13 AM, Raimon Bosch  wrote:
>
>
> I am not an expert in the lucene scoring formula, but omitNorms=false makes the
> scoring formula a little bit more complex, taking into account boosting for
> fields and documents. If I'm not wrong (if I am, please correct me) I think
> that omitNorms=false takes into account the queryNorm(q) and norm(t,d)
> from the formula:
>
>   score(q,d) = coord(q,d) · queryNorm(q) · Σ_(t in q) ( tf(t in d) · idf(t)^2 · t.getBoost() · norm(t,d) )
>
> so the formula will be more complex.
>
> See
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html,
> and
> http://old.nabble.com/scores-are-the-same-for-many-diferent-documents-td27623039.html#a27623039
>
> multiValued option is used to create fields with multiple values.
>
> We use it in one of our indexes, modifying the schema.xml and adding a new field:
>
> ...
> <field name="s_similar_name" type="..." indexed="..." stored="true" multiValued="true"/>
> ...
>
> This field is processed in a specific UpdateRequestProcessorFactory (written
> by us) from a comma separated field called 's_similar_names':
> ...
> public void processAdd(AddUpdateCommand cmd) throws IOException {
>    SolrInputDocument doc = cmd.getSolrInputDocument();
>
>    String v = (String)doc.getFieldValue( "s_similar_names" );
>    if( v != null ) {
>      String s_similar_names[] = v.split(",");
>      for(String s_similar_name : s_similar_names){
>        if(!s_similar_name.equals(""))
>            doc.addField( "s_similar_name", s_similar_name );
>      }
>    }
>
>    // pass it up the chain
>    super.processAdd(cmd);
>  }
> ...
>
> A processor factory is specified in solrconfig.xml:
>
> ...
> # <updateRequestProcessorChain name="mychain">
> #   <processor class="org.apache.solr.update.processor.MyUpdateProcessorFactory"/>
> #   <processor class="solr.LogUpdateProcessorFactory" />
> #   <processor class="solr.RunUpdateProcessorFactory" />
> # </updateRequestProcessorChain>
> ...
>
> and adding this chain to XmlUpdateRequestHandler in solrconfig.xml:
>
> ...
> # <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
> #   <lst name="defaults">
> #     <str name="update.processor">mychain</str>
> #   </lst>
> # </requestHandler>
> ...
>
> termVector is used to save more info about terms of a document in the index
> and save computational time in functions like MoreLikeThis.
> http://wiki.apache.org/solr/TermVectorComponent. We don't use it.
>
>
> adeelmahmood wrote:
>>
>> I was gonna ask a question about this but you seem like you might have the
>> answer for me .. what exactly does the omitNorms option do (or is expected to
>> do) .. also if you could please help me understand what termVectors and
>> multiValued options do ??
>> Thanks for ur help
>>
>>
>> Raimon Bosch wrote:
>>>
>>>
>>> Hi,
>>>
>>> We did some tests with omitNorms=false. We have seen that on the last
>>> page of results we have some scores set to 0.0. These scores set to 0 are
>>> problematic for our sorters.
>>>
>>> Could it be some kind of bug?
>>>
>>> Regards,
>>> Raimon Bosch.
>>>
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/some-scores-to-0-using-omitNorns%3Dfalse-tp27637436p27637827.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Has anyone prepared a general purpose synonyms.txt for search engines

2010-02-17 Thread Lance Norskog
openthesaurus seems to cover European languages, not including English :)
Wordnet is a venerable thesaurus project:

http://wordnet.princeton.edu/

and lucene-contrib includes a set of tools for using it.

http://www.lucidimagination.com/search/?q=wordnet

On Fri, Feb 12, 2010 at 11:51 AM, Julian Hille  wrote:
> Hi,
>
> You're welcome. That's something Google came up with some weeks ago :)
>
>
> Am 12.02.2010 um 20:42 schrieb Emad Mushtaq:
>
>> Wow thanks!! You all are awesome! :D :D
>>
>> On Sat, Feb 13, 2010 at 12:32 AM, Julian Hille  wrote:
>>
>>> Hi,
>>>
>>> at openthesaurus.org or .com you can find a mysql version of synonyms; you
>>> just have to join it to fit the synonym schema of solr yourself.
>>>
>>>
>>> Am 12.02.2010 um 20:03 schrieb Emad Mushtaq:
>>>
>>>> Hi,
>>>>
>>>> I was wondering if anyone has prepared a synonyms.txt for general purpose
>>>> search engines,  that can be shared. If not could you refer me to places
>>>> where such a synonym list or thesaurus can be found. Synonyms for search
>>>> engines are different from the regular thesaurus. Any help would be
>>> highly
>>>> appreciated. Thanks.
>>>>
>>>> --
>>>> Muhammad Emad Mushtaq
>>>> http://www.emadmushtaq.com/
>>>
>>> Kind regards,
>>> Julian Hille
>>>
>>>
>>>
>>
>>
>> --
>> Muhammad Emad Mushtaq
>> http://www.emadmushtaq.com/
>
> Kind regards,
> Julian Hille
>
>
> ---
> NetImpact KG
> Altonaer Straße 8
> 20357 Hamburg
>
> Tel: 040 / 6738363 2
> Mail: jul...@netimpact.de
>
> Managing director: Tarek Müller
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: labeling facets and highlighting question

2010-02-17 Thread Lance Norskog
Here's the problem: the wiki page is confusing:

http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters

The line:
q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype

is standalone, but the later line:

facet.field={!ex=dt key=mylabel}doctype

means 'in the long query above, change {!ex=dt}doctype to {!ex=dt key=mylabel}doctype'.

'tag=dt' attaches a tag (a name) to a filter query, and 'ex=dt' means
'exclude the filter query tagged dt when computing this facet'.
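
Built from SolrJ, the same request looks roughly like this; the query, filter,
and field names are the ones from the wiki example above, and the URL is a
placeholder.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class TaggedFacets {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("mainquery");
        q.addFilterQuery("status:public");
        q.addFilterQuery("{!tag=dt}doctype:pdf");         // tag the filter
        q.setFacet(true);
        q.addFacetField("{!ex=dt key=mylabel}doctype");    // exclude it, relabel the facet

        QueryResponse rsp = server.query(q);
        // Because of the key local param, the facet comes back under "mylabel",
        // not under the field name "doctype".
        System.out.println(rsp.getFacetField("mylabel").getValues());
      }
    }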

On Wed, Feb 17, 2010 at 4:30 PM, adeelmahmood  wrote:
>
> simple question: I want to give a label to my facet queries instead of the
> name of facet field .. i found the documentation at solr site that I can do
> that by specifying the key local param .. syntax something like
> facet.field={!ex=dt%20key='By%20Owner'}owner
>
> I am just not sure what the ex=dt part does .. if i take it out .. it throws
> an error so it seems its important but what for ???
>
> also I tried turning on the highlighting and i can see that it adds the
> highlighting items list in the xml at the end .. but it only points out the
> ids of all the matching results .. it doesn't actually show the text data
> that it's making a match with // so i am getting something like this back
>
> [highlighting response XML lost in archiving; it listed only empty entries
> keyed by document id]
> ...
>
> instead of the actual text that's being matched .. isn't it supposed to do
> that and wrap the search terms in an em tag .. how come it's not doing that in
> my case
>
> here is my schema
> [schema field definitions lost in archiving]
>
> --
> View this message in context: 
> http://old.nabble.com/labeling-facets-and-highlighting-question-tp27632747p27632747.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: parsing strings into phrase queries

2010-02-17 Thread Lance Norskog
That would be great. After reading this and the PositionFilter class I
still don't know how to use it.

On Wed, Feb 17, 2010 at 12:38 PM, Robert Muir  wrote:
> i think we can improve the docs/wiki to show this example use case, i
> noticed the wiki explanation for this filter gives a more complex shingles
> example, which is interesting, but this seems to be a common problem and
> maybe we should add this use case.
>
> On Wed, Feb 17, 2010 at 1:54 PM, Chris Hostetter
> wrote:
>
>>
>> : take a look at PositionFilter
>>
>> Right, there was another thread recently where almost the exact same issue
>> was discussed...
>>
>> http://old.nabble.com/Re%3A-Tokenizer-question-p27120836.html
>>
>> ..except that i was ignorant of the existence of PositionFilter when i
>> wrote that message.
>>
>>
>>
>> -Hoss
>>
>>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com

