Re: Date Facet Giving Count more than actual

2009-11-05 Thread Aakash Dharmadhikari
Thanks Hoss, the problem is resolved.

The real problem was my query parameter. I was storing daysForFilter with
an offset of 1 sec, and the date in the query parameter "facet.date.start" had
the same offset. This caused the overlapping counts: because both ends of each
range are inclusive, the bucket starting at 2009-10-23T18:30:01 matched
documents with both the 2009-10-23T18:30:01 and the 2009-10-24T18:30:01 values.

just changing the query to
"q=&facet=true&facet.date=daysForFilter&facet.date.start=2009-10-23T18:30:00Z&facet.date.gap=%2B1DAY&facet.date.end=2009-10-28T18:30:00Z"
works.

thanks anyway.

regards,
aakash.

On Tue, Nov 3, 2009 at 9:43 PM, Chris Hostetter wrote:

>
> :
> q=&facet=true&facet.date=daysForFilter&facet.date.start=2009-10-23T18:30:01Z&facet.date.gap=%2B1DAY&facet.date.end=2009-10-28T18:30:01Z
>
> : For example I get total 18 documents for my query, and the facet count
> for
> : date 2009-10-23T18:30:01Z is 11; whereas there are only 5 documents
> : containing this field value. I have verified this in result. Also when I
> : query for daysForFilter:2009-10-23T18:30:01Z, it gives me 5 results.
>
> I think you are misunderstanding what date faceting does.  you have a
> facet.date.gap of +1DAY, which means the facet count is anything between
> 2009-10-23T18:30:01Z and 2009-10-24T18:30:01Z, both ends inclusive.  you can
> verify this using a range query (not a term query) ...
>
>  daysForFilter:[2009-10-23T18:30:01Z TO 2009-10-23T18:30:01Z+1DAY]
>
> if you only want to facet on a unique moment in time (not a range) then
> you can use facet.query ... or you can set the facet gap smaller.
>
> you should also take a look at facet.date.hardend...
> http://wiki.apache.org/solr/SimpleFacetParameters#facet.date.hardend
>
>
> -Hoss
>
>
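Hoss's facet.query suggestion above, for counting exact timestamps rather than ranges, would look something like this — the field name and dates are the ones from this thread, the rest is a sketch:

```text
q=*:*&facet=true
    &facet.query=daysForFilter:"2009-10-23T18:30:01Z"
    &facet.query=daysForFilter:"2009-10-24T18:30:01Z"
```

Each facet.query returns its own count, so one facet.query per day gives exact-moment counts with no bucket overlap.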


ERROR: multiple values encountered for non multiValued copy field

2009-11-05 Thread Christian López Espínola
Hi,

I'm using solr with solrj and when I specify a field to copy in my
schema it stops working with the exception:


org.apache.solr.client.solrj.SolrServerException:
org.apache.solr.client.solrj.SolrServerException:
org.apache.solr.common.SolrException: ERROR: multiple values
encountered for non multiValued copy field all: 
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:161)
at 
org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)
at org.apache.solr.client.solrj.SolrServer.addBean(SolrServer.java:67)

My field 'all' is defined as follows:

   



   




Those fields are:







If I remove the copyField declaration, everything works fine.


Any hint? TIA

-- 
Cheers,

Christian López Espínola 
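The usual cause of this exception is two or more copyField sources (or a multi-valued source) feeding a destination field that is declared single-valued. A hedged sketch of the schema.xml fix — the field names besides 'all' are illustrative:

```xml
<!-- the copy destination must be declared multiValued -->
<field name="all" type="text" indexed="true" stored="false" multiValued="true"/>

<copyField source="title" dest="all"/>
<copyField source="description" dest="all"/>
```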


Re: leading and trailing wildcard query

2009-11-05 Thread Otis Gospodnetic
> Please elaborate. What do you mean by *desrever* string?

Try reading in reverse ;).

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: A. Steven Anderson 
> To: solr-user@lucene.apache.org
> Sent: Thu, November 5, 2009 5:23:48 PM
> Subject: Re: leading and trailing wildcard query
> 
> >
> > The guilt trick is not the best thing to try on public mailing lists. :)
> >
> 
> Point taken, although not my intention.  I guess I have been spoiled by
> quick replies and was getting to think it was a stupid question.
> 
> Plus, I'm literally gonna get trash talk from my Oracle DBE if I can't make
> this work. ;-)
> 
> We've basically relegated Oracle to handling ingest, from which we index into
> Solr, which provides all search features.  I'd hate to have to succumb to
> using Oracle to service this one special query.
> 
> 
> > The first thing that popped to my mind is to use 2 fields, where the second
> > one contains the desrever string of the first one.
> >
> 
> Please elaborate. What do you mean by *desrever* string?
> 
> 
> > The second idea is to use n-grams (if it's OK to tokenize), more
> > specifically edge n-grams.
> >
> 
> Well, that's the problem.  The field may contain non-Latin text that has
> neither whitespace nor punctuation.
> 
> 
> -- 
> A. Steven Anderson



Re: CPU Max Utilization

2009-11-05 Thread Otis Gospodnetic
You may also want to share some sample queries and your field definitions, and
tell us how long a core remains 100% utilized.

 Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: ba ba 
> To: solr-user@lucene.apache.org
> Sent: Thu, November 5, 2009 9:20:13 PM
> Subject: CPU Max Utilization
> 
> Greetings,
> 
> I'm running a solr instance with 100 million documents in it. The index is
> 18 GB.
> 
> The strange behavior I'm seeing is that CPU utilization gets maxed out. I'm
> running on an 8 core machine with 32 GB of RAM. Every concurrent query I run
> uses up one of the cores. So, if I am running 1 concurrent query I'm
> using up the CPU of one core. If I have 8 concurrent queries I'm
> using up all of the cores.
>
> Is it normal to have such high CPU utilization? If not, what am I doing
> wrong here? The only thing I have modified is the schema.xml file, to
> correspond to the documents I want to store. Everything else is just using
> the default values for all the config files.
> 
> Thanks.



Re: solr query help alpha numeric and not

2009-11-05 Thread Joel Nylund
Avlesh, thanks, those worked. For some reason I never got your mail;
found it in one of the list archives though.


thanks again
Joel

On Nov 5, 2009, at 9:08 PM, Avlesh Singh wrote:


Didn't the queries in my reply work?

Cheers
Avlesh

On Fri, Nov 6, 2009 at 4:16 AM, Joel Nylund  wrote:

Hi, yes it's a string. In the case of a title it can be anything: a letter, a
number, a symbol, a multibyte char, etc.

Any ideas if I wanted a query for chars that are not a letter a-z or a number
0-9, given that it's a string?

thanks
Joel


On Nov 4, 2009, at 9:10 AM, Jonathan Hendler wrote:

Hi Joel,


The ID is sent back as a string (instead of as an integer) in your
example. Could this be the cause?

- Jonathan

On Nov 4, 2009, at 9:08 AM, Joel Nylund wrote:

Hi, I have a field called firstLetterTitle; this field has 1 char, and it can
be anything. I need help with a few queries on this char:

1.) I want all NON ALPHA and NON numeric chars, so any char that is not A-Z or
0-9

I tried:


http://localhost:8983/solr/select?q=NOT%20firstLetterTitle:0%20TO%209%20AND%20NOT%20firstLetterTitle:A%20TO%20Z

But I get back numeric results:


9
23946447



2.) I want only numerics:

http://localhost:8983/solr/select?q=firstLetterTitle:0%20TO%209

This seems to work, but I'm just checking if it's the right way.



3.) I want only English letters:

http://localhost:8983/solr/select?q=firstLetterTitle:A%20TO%20Z

This seems to work, but I'm just checking if it's the right way.


thanks
Joel
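For the record: Solr range queries need the bracket syntax, and a purely negative query needs a *:* (match-all) base to subtract from. Assuming the brackets in Joel's queries were eaten somewhere along the way, the three queries would be written:

```text
1) neither a letter nor a digit (note the *:* base query):
   q=*:* -firstLetterTitle:[0 TO 9] -firstLetterTitle:[A TO Z]

2) digits only:
   q=firstLetterTitle:[0 TO 9]

3) English letters only:
   q=firstLetterTitle:[A TO Z]
```

Without a positive clause such as *:*, a pure NOT query has nothing to subtract from, which would explain the odd results in case 1.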










Re: DIH timezone offset

2009-11-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
Can anyone add this to the FAQ here?
http://wiki.apache.org/solr/DataImportHandlerFaq

On Thu, Nov 5, 2009 at 8:35 PM,   wrote:
> """
> DIH relies on the driver to get the date. It does not do any automatic
> conversion. Is it possible for the driver to give the date with the
> right offset?
> """
>
> I have retried a full-import after setting the Java user.timezone property to 
> UTC and the dates import correctly. I've narrowed down the problem to the way 
> SQL server is returning dates. Converting it to ISO-8601 format resolves the 
> issue, but I had to append a 'Z' at the end of the conversion like so: 
> "select convert(varchar(30),datesentutc,126)+'Z' as date from table".
>
> Hope this is helpful to someone else. Thanks for the help.
>
> Mike
>
>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: specify multiple files in for DataImportHandler

2009-11-05 Thread Jay Hill
You can set up multiple request handlers each with their own configuration
file. For example, in addition to the config you listed you could add
something like this:



<requestHandler name="/dataimport-two"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-two-config.xml</str>
  </lst>
</requestHandler>



and so on with as many handlers as you need.

-Jay
http://www.lucidimagination.com


On Thu, Nov 5, 2009 at 8:57 AM, javaxmlsoapdev  wrote:

>
> <requestHandler name="/dataimport"
>     class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">data-config.xml</str>
>   </lst>
> </requestHandler>
>
> is there a way to list more than one config file in the <lst> above?
> I understand I can have multiple <requestHandler>s in the config, but I need
> to keep two data-config files separate and still use the same DIH to create
> one index.
> --
> View this message in context:
> http://old.nabble.com/specify-multiple-files-in-%3Clst%3E-for-DataImportHandler-tp26215805p26215805.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: CPU Max Utilization

2009-11-05 Thread Walter Underwood

Are you requesting results by relevance or are you sorting by a field?

How many results are you requesting?

Are you using real user queries (with repetition) or a flat
distribution of queries?


wunder

On Nov 5, 2009, at 6:20 PM, ba ba wrote:


Greetings,

I'm running a solr instance with 100 million documents in it. The index is
18 GB.

The strange behavior I'm seeing is that CPU utilization gets maxed out. I'm
running on an 8 core machine with 32 GB of RAM. Every concurrent query I run
uses up one of the cores. So, if I am running 1 concurrent query I'm
using up the CPU of one core. If I have 8 concurrent queries I'm
using up all of the cores.

Is it normal to have such high CPU utilization? If not, what am I doing
wrong here? The only thing I have modified is the schema.xml file, to
correspond to the documents I want to store. Everything else is just using
the default values for all the config files.

Thanks.




CPU Max Utilization

2009-11-05 Thread ba ba
Greetings,

I'm running a solr instance with 100 million documents in it. The index is
18 GB.

The strange behavior I'm seeing is that CPU utilization gets maxed out. I'm
running on an 8 core machine with 32 GB of RAM. Every concurrent query I run
uses up one of the cores. So, if I am running 1 concurrent query I'm
using up the CPU of one core. If I have 8 concurrent queries I'm
using up all of the cores.

Is it normal to have such high CPU utilization? If not, what am I doing
wrong here? The only thing I have modified is the schema.xml file, to
correspond to the documents I want to store. Everything else is just using
the default values for all the config files.

Thanks.


Re: solr query help alpha numeric and not

2009-11-05 Thread Avlesh Singh
Didn't the queries in my reply work?

Cheers
Avlesh

On Fri, Nov 6, 2009 at 4:16 AM, Joel Nylund  wrote:

> Hi, yes it's a string. In the case of a title it can be anything: a letter, a
> number, a symbol, a multibyte char, etc.
>
> Any ideas if I wanted a query for chars that are not a letter a-z or a number
> 0-9, given that it's a string?
>
> thanks
> Joel
>
>
> On Nov 4, 2009, at 9:10 AM, Jonathan Hendler wrote:
>
>  Hi Joel,
>>
>> The ID is sent back as a string (instead of as an integer) in your
>> example. Could this be the cause?
>>
>> - Jonathan
>>
>> On Nov 4, 2009, at 9:08 AM, Joel Nylund wrote:
>>
>>> Hi, I have a field called firstLetterTitle; this field has 1 char, and it can
>>> be anything. I need help with a few queries on this char:
>>>
>>> 1.) I want all NON ALPHA and NON numeric chars, so any char that is not A-Z
>>> or 0-9
>>>
>>> I tried:
>>>
>>>
>>> http://localhost:8983/solr/select?q=NOT%20firstLetterTitle:0%20TO%209%20AND%20NOT%20firstLetterTitle:A%20TO%20Z
>>>
>>> But I get back numeric results:
>>>
>>> 
>>> 9
>>> 23946447
>>> 
>>>
>>>
>>> 2.) I want only numerics:
>>>
>>> http://localhost:8983/solr/select?q=firstLetterTitle:0%20TO%209
>>>
>>> This seems to work, but I'm just checking if it's the right way.
>>>
>>>
>>>
>>> 3.) I want only English letters:
>>>
>>> http://localhost:8983/solr/select?q=firstLetterTitle:A%20TO%20Z
>>>
>>> This seems to work, but I'm just checking if it's the right way.
>>>
>>>
>>> thanks
>>> Joel
>>>
>>>
>>
>


Re: StreamingUpdateSolrServer - indexing process stops in a couple of hours

2009-11-05 Thread Yonik Seeley
Seems fixed.
https://issues.apache.org/jira/browse/SOLR-1543


-Yonik
http://www.lucidimagination.com



On Mon, Nov 2, 2009 at 6:05 AM, Shalin Shekhar Mangar
 wrote:
> I'm able to reproduce this issue consistently using JDK 1.6.0_16
>
> After an optimize is called, only one thread keeps adding documents and the
> rest wait on StreamingUpdateSolrServer line 196.
>
> On Sun, Oct 25, 2009 at 8:03 AM, Dadasheva, Olga > wrote:
>
>> I am using java 1.6.0_05
>>
>> To illustrate what is happening I wrote this test program that has 10
>> threads adding a collection of documents and one thread optimizing the index
>> every 10 sec.
>>
>> I am seeing that after the first optimize there is only one thread that
>> keeps adding documents. The other ones are locked.
>>
>> In the real code I ended up adding synchronized blocks around add and
>> optimize to avoid this.
>>
>> public static void main(String[] args) {
>>
>>     final JettySolrRunner jetty = new JettySolrRunner("/solr", 8983);
>>     try {
>>         jetty.start();
>>         // set up the server...
>>         String url = "http://localhost:8983/solr";
>>         final StreamingUpdateSolrServer server =
>>                 new StreamingUpdateSolrServer(url, 2, 5) {
>>             @Override
>>             public void handleError(Throwable ex) {
>>                 // do something...
>>             }
>>         };
>>         server.setConnectionTimeout(1000);
>>         server.setDefaultMaxConnectionsPerHost(100);
>>         server.setMaxTotalConnections(100);
>>         int i = 0;
>>         while (i++ < 10) {
>>             new Thread("add-thread" + i) {
>>                 public void run() {
>>                     int j = 0;
>>                     while (true) {
>>                         try {
>>                             List<SolrInputDocument> docs =
>>                                 new ArrayList<SolrInputDocument>();
>>                             for (int n = 0; n < 50; n++) {
>>                                 SolrInputDocument doc = new SolrInputDocument();
>>                                 String docID = this.getName() + "_doc_" + j++;
>>                                 doc.addField("id", docID);
>>                                 doc.addField("content", "document_" + docID);
>>                                 docs.add(doc);
>>                             }
>>                             server.add(docs);
>>                             System.out.println(this.getName() + " added "
>>                                 + docs.size() + " documents");
>>                             Thread.sleep(100);
>>                         } catch (Exception e) {
>>                             e.printStackTrace();
>>                             System.err.println(this.getName() + " "
>>                                 + e.getLocalizedMessage());
>>                             System.exit(0);
>>                         }
>>                     }
>>                 }
>>             }.start();
>>         }
>>
>>         new Thread("optimizer-thread") {
>>             public void run() {
>>                 while (true) {
>>                     try {
>>                         Thread.sleep(1);
>>                         server.optimize();
>>                         System.out.println(this.getName() + " optimized");
>>                     } catch (Exception e) {
>>                         e.printStackTrace();
>>                         System.err.println("optimizer "
>>                             + e.getLocalizedMessage());
>>                         System.exit(0);
>>                     }
>>                 }
>>             }
>>         }.start();
>>
>>     } catch (Exception e) {
>>         e.printStackTrace();
>>     }
>> }
>> -Original Message-
>> From: Lance Norskog [mailto:goks...@gmail.com]
>> Sent: Tuesday, October 13, 2009 8:59 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: StreamingUpdateSolrServer - indexing process stops in a couple
>> of hours
>>
>> Which Java release is this?  There are known thread-blocking problems in
>> Java 1.5.
>>
>> Also, what sockets are used during this time? Try 'netstat -an | fgrep 8983'
>> (or your Solr URL port #) and watch the active, TIME_WAIT, CLOSE_WAIT
>> sockets build up. This may give a hint.
>>
>> On Tue, Oct 13, 2009 at 8:47 AM, Dadasheva, Olga <
>> olga_dadash...@harvard.edu> wrote:
>> > Hi,
>> >
>> > I am indexing documents using StreamingUpdateSolrServer. My 'setup'
>> > code is almost a copy of the junit test of the Solr trunk.
>> >
>> >   

Re: leading and trailing wildcard query

2009-11-05 Thread Andrzej Bialecki

A. Steven Anderson wrote:

No thoughts on this? Really!?

I would hate to admit to my Oracle DBE that Solr can't be customized to do a
common query that a relational database can do. :-(


On Wed, Nov 4, 2009 at 6:01 PM, A. Steven Anderson <
a.steven.ander...@gmail.com> wrote:


I've scoured the archives and JIRA , but the answer to my question is just
not clear to me.

With all the new Solr 1.4 features, is there any way  to do a leading and
trailing wildcard query on an *untokenized* field?

e.g. q=myfield:*abc* would return a doc with myfield=xxxabcxxx

Yes, I know how expensive such a query would be, but we have the user
requirement, nonetheless.

If not, any suggestions on how to implement a custom solution using Solr?
Using an external data structure?


You can use ReversedWildcardFilterFactory, which creates additional tokens
(in your case, a single additional token :) ) that are reversed, _and_
also triggers setAllowLeadingWildcard in the QueryParser - it won't
help much with the performance though, due to the trailing wildcard in
your original query. Please see the discussion in SOLR-1321 (this will
be available in 1.4, but it should be easy to patch 1.3 to use it).


If you really need to support such queries efficiently, you should
implement full permuterm indexing, i.e. a token filter that rotates
tokens and adds all rotations (with a special marker for the
beginning of the word), and a query plugin that detects such query terms
and rotates the query term appropriately.



--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
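The permuterm scheme Andrzej sketches can be illustrated outside Solr. Below is a minimal, self-contained Java sketch — class and method names are mine, not a real Solr filter. Each token is indexed under every rotation of token+"$", and a contains-style query *abc* becomes the prefix query abc* against those rotations:

```java
import java.util.ArrayList;
import java.util.List;

public class PermutermDemo {

    static final char END_MARKER = '$'; // marks the original word boundary

    /** All rotations of token + END_MARKER; these would become index terms. */
    static List<String> rotations(String token) {
        String t = token + END_MARKER;
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < t.length(); i++) {
            out.add(t.substring(i) + t.substring(0, i));
        }
        return out;
    }

    /**
     * Simulates the query side: the substring query *infix* is rewritten to
     * the prefix query infix*, which matches any rotation starting with infix.
     */
    static boolean matchesContains(String token, String infix) {
        for (String rotation : rotations(token)) {
            if (rotation.startsWith(infix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // rotations of "abc$": [abc$, bc$a, c$ab, $abc]
        System.out.println(rotations("abc"));
        // *abc* against "xxxabcxxx" -> prefix "abc" hits rotation "abcxxx$xxx"
        System.out.println(matchesContains("xxxabcxxx", "abc"));
    }
}
```

In a real filter the rotations are emitted as extra tokens at index time, so a leading-and-trailing wildcard becomes a cheap prefix lookup at the cost of roughly (token length + 1) times more terms.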



Re: field queries seem slow

2009-11-05 Thread Lance Norskog
Restarting Solr clears out all caching.

Doing a commit used to drop all of the caches for new requests, but it
no longer does this.

On Linux you can clear the kernel's disk buffer cache with a special
hook. You echo '1' into a /proc/something and this tells the kernel to
drop its caches. Sorry, don't remember the exact command.
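For reference, the Linux hook Lance is thinking of is most likely /proc/sys/vm/drop_caches (available in kernel 2.6.16 and later); as root:

```shell
# flush dirty pages first so the drop is meaningful
sync
# 1 = page cache; 2 = dentries and inodes; 3 = both
echo 1 > /proc/sys/vm/drop_caches
```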

On Thu, Nov 5, 2009 at 10:09 AM, Otis Gospodnetic
 wrote:
> Hi,
>
> There is no way that I know to clear Solr's caches (query, document, filter 
> caches).
> FieldCache is a Lucene thing and it's also something you can't clear, as far
> as I know.
>
> Slowness on start could be due to:
>
>  * OS not cached the index yet (would be the case if your Solr was down for a 
> while and its index got displaced from the OS buffers)
>  * sort query run for the first time, FieldCache not populated yet
>  * expensive query run for the first time, its results and hits not cached in 
> Solr caches
>
>  * ...
>
> Otis
>
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
>> From: mike anderson 
>> To: solr-user@lucene.apache.org
>> Sent: Thu, November 5, 2009 11:34:59 AM
>> Subject: Re: field queries seem slow
>>
>> On production our servers are restarted very rarely (once a month). But this
>> raises a question, what does it take to clear the cache? On my benchmarking
>> platform I've been simply restarting the server as a method of starting
>> fresh. Is there a cache file I could delete to make sure I'm getting
>> unbiased results? Second of all, is there an internal cache for sort fields
>> separate from the cache for queries and filters which has settings found in
>> the solrconfig.xml file?
>>
>> I did a test as you suggested to determine if that type of query is always
>> slow or just when it starts up, it seems that it is only slow when it starts
>> up. However, it seems to be slow when it starts up with and without sorting.
>> (I'm still trying to figure out how to do good benchmarking with one
>> independent variable, so it's possible that this result is inconsistent)
>>
>> for reference, my query is looking like this (+/- sort field):
>>
>> http://10.0.20.174:8986/solr/select?mlt=false&rows=10&shards=localhost:8986/solr,localhost:8986/solr,localhost:8986/solr&q=abbrev_authors%3A%22Gallinger+S%22
>>
>> I like the suggestion on date resolution, we definitely don't need second
>> accuracy (which it is now), and in fact I think we'll just start stamping
>> documents with year/week and then sort by that.
>>
>>
>> thanks for all your help!
>>
>> Cheers,
>> Mike
>>
>>
>>
>> On Wed, Nov 4, 2009 at 2:07 PM, Erick Erickson wrote:
>>
>> > By readers, I meant your searchers. Perhaps you were shutting
>> > down your servers?
>> >
>> > The warming isn't to pre-load authors, it's to pre-populate, particularly,
>> > sort fields. Which are then kept in caches. There is considerable
>> > overhead in loading the sort field the first time you sort by it. So,
>> > my question was really based on the chance that "over the
>> > weekend" corresponded to "the first queries after the server
>> > restarted", or "the first query after the underlying index searchers
>> > were (re)opened.
>> >
>> > The real question comes down to whether the same form of query
>> > (i.e. searching for different values on the same fields with the
>> > same kind of sort) is slow all the time or just when things start up.
>> >
>> > How fine is the resolution for your dates? Assuming that the sorting
>> > is the issue, if you are storing dates in the millisecond range, that's
>> > probably 20M dates that have to be loaded to sort. You might
>> > want to think about a coarser resolution  if this has any relevance.
>> >
>> > HTH
>> > Erick
>> >
>> > On Wed, Nov 4, 2009 at 1:54 PM, mike anderson
>> > >wrote:
>> >
>> > > Erik, we are doing a sort by date first, and then by score. I'm not sure
>> > > what you mean by readers.
>> > >
>> > > Since we have nearly 6M authors attached to our 20M documents I'm not
>> > sure
>> > > that autowarming would help that much (especially since we have very
>> > little
>> > > overlap in what users are searching for). But maybe it would?
>> > >
>> > > Lance, I was just being a bit lazy. thanks though.
>> > >
>> > > -mike
>> > >
>> > >
>> > > On Mon, Nov 2, 2009 at 10:27 PM, Lance Norskog
>> > wrote:
>> > >
>> > > > This searches author:albert and (default text field): einstein. This
>> > > > may not be what you expect?
>> > > >
>> > > > On Mon, Nov 2, 2009 at 2:30 PM, Erick Erickson <
>> > erickerick...@gmail.com>
>> > > > wrote:
>> > > > > Hmmm, are you sorting? And have your readers been reopened? Is the
>> > > > > second query of that sort also slow? If the answer to this last
>> > > question
>> > > > is
>> > > > > "no",
>> > > > > have you tried some autowarming queries?
>> > > > >
>> > > > > Best
>> > > > > Erick
>> > > > >
>> > > > > On Mon, Nov 2, 2009 at 4:34 PM, mike anderson <
>>

Re: Newb Question about the TemplateTransformer

2009-11-05 Thread Lance Norskog
I think you need custom code for this. You can write plugins in Java,
or (in Java 1.6) any of the Java-based scripting languages like
JavaScript.

http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer

On Thu, Nov 5, 2009 at 8:54 AM, Mark Ellul  wrote:
> Hi Noble,
>
> Thanks for the response...
>
> My data config is below... Basically I have a List table and a Tweeter
> table...
>
> In the document I want a field called list_members which is a csv string of
> all the rows where tweeter has the particular list id.
>
> Do you understand what I mean?
>
> Regards
>
> Mark
>
> 
>                 url="jdbc:postgresql://api.tlists.com:5432/tlists"
>         user="tlists_dev"
>         password="foocarrot4"
>        readOnly="true" autoCommit="false"
> transactionIsolation="TRANSACTION_READ_COMMITTED"
> holdability="CLOSE_CURSORS_AT_COMMIT"
>         />
>        
>              transformer="TemplateTransformer">
>                 
>                
>                
>                 query=" select id from api_tweeter where " template=""
> -->
> 
> 
> 
> ~
>
>
> 2009/11/5 Noble Paul നോബിള്‍ नोब्ळ् 
>
>> there is no parent document or child document there is only one.
>> maybe you can paste your data-config
>>
>> On Thu, Nov 5, 2009 at 5:45 PM, Mark Ellul  wrote:
>> > Hi,
>> >
>> > I have read on the wiki that its possibile to concatenate values using a
>> > TemplateTransformer.
>> >
>> > Basically I have a Parent Table, and Child Table, I need to create a
>> > children field (in my Parent Document) which has all the ids of the
>> Parent's
>> > child rows in a comma separated string.
>> >
>> > Is this possible with the TemplateTransformer?
>> >
>> > if so can you please give me an snippet?
>> >
>> > Thanks and Regards
>> >
>> > Mark
>> >
>>
>>
>>
>> --
>> -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: DIH full-import with fetchSize(Integer.MIN_VALUE) taking long time to start processing rows

2009-11-05 Thread Lance Norskog
Right, a view will not help here. It is just an SQL query embedded as
a virtual table, and is used to lift SQL syntax out of the DIH.

InnoDB is row-level except for auto-increment operations. Ow. You
could drop the indexes on the table. Each insert batch has to
recalculate all indexes, so this will cut the amount of database
contention.

On Thu, Nov 5, 2009 at 6:47 AM, Marc Sturlese  wrote:
>
> Hey there,
> I need this functionality because I have an indexer continuously updating my
> index with delta-import from a table. This table is fed by another process
> that is constantly running too.
> With delta-import there's no problem, but sometimes I need to execute
> full-import.
> I don't see the benefits of using a view. I think probably the best solution
> would be to create a master-slave mysql structure. The process that inserts
> would attack the master and the query would attack the slave. This would
> probably speed up the row processing, wouldn't it?
>
> Avlesh Singh wrote:
>>
>>>
>>> In parallel I have another process which is doing lots of inserts to that
>>> table (I also had it before, but with a smaller number of inserts). Could
>>> this be causing some blocking that makes the query take that long? If not,
>>> any advice on what could make it take so long until I start to see rows
>>> being processed?
>>>
>> Sounds scary! With the innodb engine you are causing a table-level lock with
>> each insert (assuming your table has an auto-increment column). With
>> frequent inserts you are of course delaying the read time.
>> Why would you want to do this kind of an operation in the very first
>> place?
>> Can't you use views for indexing?
>>
>> Cheers
>> Avlesh
>>
>> On Thu, Nov 5, 2009 at 6:18 PM, Marc Sturlese
>> wrote:
>>
>>>
>>> I have been using fetchSize(Integer.MIN_VALUE) for a long time and it was
>>> working perfectly until now. I use MySQL, java 1.6,
>>> mysql-connector-java-5.1.7-bin.jar and InnoDB tables.
>>> Since a month ago, when the query is executed it takes a long time until
>>> it starts processing the results from the resultSet. The query matches
>>> about 2M rows. It used to take 10 min until row processing started. Now
>>> it's taking about 2 hours.
>>> In parallel I have another process which is doing lots of inserts to that
>>> table (I also had it before, but with a smaller number of inserts). Could
>>> this be causing some blocking that makes the query take that long? If not,
>>> any advice on what could make it take so long until I start to see rows
>>> being processed?
>>> Thanks in advance.
>>> --
>>> View this message in context:
>>> http://old.nabble.com/DIH-full-import-with-fetchSize%28Integer.MIN_VALUE%29-taking-long-time-to-start-processing-rows-tp26213642p26213642.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/DIH-full-import-with-fetchSize%28Integer.MIN_VALUE%29-taking-long-time-to-start-processing-rows-tp26213642p26215730.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Regarding to ramBufferSizeMB and mergeFactor

2009-11-05 Thread Attachot Tuangphon
Hi, Jeff Newburn

Thank you for you good explanations. That helps me a lot.

Attachot Tuangphon

On 09/11/06 0:36, "Jeff Newburn"  wrote:

> If I am correct the two are related but not dependent on each other.  Merge
> factor is used to determine how many segment files exist on disk where as
> the ram buffer is to determine how often the flush to disk will happen.  So
> you should be able to set them independently.
> -- 
> Jeff Newburn
> Software Engineer, Zappos.com
> jnewb...@zappos.com - 702-943-7562
> 
> 
>> From: ATTACHOT TUANGPHON 
>> Reply-To: 
>> Date: Thu, 05 Nov 2009 22:50:47 +0900
>> To: 
>> Subject: Regarding to ramBufferSizeMB and mergeFactor
>> 
>> Hello, everybody
>> 
>> I am a new Solr user.
>> I have a question about ramBufferSizeMB and mergeFactor.
>> I would like to know if I increase number of mergeFactor, do I have to
>> change number of ramBufferSizeMB too?
>> 
>> For example: 
>> I set mergeFactor as 30 and ramBufferSizeMB as 50
>> I would like to change mergeFactor to 50 , do I have to increase
>> ramBufferSizeMB?
>> 
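As Jeff explains, the two settings are independent knobs. In a Solr 1.3/1.4-era solrconfig.xml they sit side by side, e.g. (the values are just the ones from this thread):

```xml
<indexDefaults>
  <!-- how many segments may accumulate on disk before a merge -->
  <mergeFactor>50</mergeFactor>
  <!-- how much RAM buffered documents may use before a flush to disk -->
  <ramBufferSizeMB>50</ramBufferSizeMB>
</indexDefaults>
```

Raising mergeFactor changes merge frequency, not memory use, so ramBufferSizeMB can stay as-is.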




Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson
> Not sure what version it was supported from, but we're on 1.3.


Really!? Great answer!

Thanks!
-- 
A. Steven Anderson


RE: leading and trailing wildcard query

2009-11-05 Thread Bernadette Houghton
Not sure what version it was supported from, but we're on 1.3.
bern

-Original Message-
From: A. Steven Anderson [mailto:a.steven.ander...@gmail.com] 
Sent: Friday, 6 November 2009 10:25 AM
To: solr-user@lucene.apache.org
Subject: Re: leading and trailing wildcard query

> Hi Steve, a query such as *abc* would need the NGramFilterFactory, hence the
> doubleedgytext, and would be retrievable by a query such as contains:abc.
> Note that you can set the max and minimum size of strings that get indexed.
>

Excellent!  Just to clarify though, NGramFilterFactory is a Solr 1.4 feature
only, correct?

-- 
A. Steven Anderson


Re: Set MMap in Solr

2009-11-05 Thread ba ba
Thanks for the help.

-Brad Anderson

2009/11/5 Otis Gospodnetic 

> To use MMapDirectory, invoke Java with the System property
> org.apache.lucene.FSDirectory.class set to
> org.apache.lucene.store.MMapDirectory. This will cause
> FSDirectory.getDirectory(File,boolean) to return instances of this class.
>
> So, start your servlet container with
> -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> - Original Message 
> > From: ba ba 
> > To: solr-user@lucene.apache.org
> > Sent: Thu, November 5, 2009 2:55:42 PM
> > Subject: Set MMap in Solr
> >
> > Hi,
> >
> > I'm trying to set my default directory to MMap. I saw that this is done
> by
> > specifying here
> >
> > A DirectoryProvider plugin can be configured in solrconfig.xml with the
> > following XML:
> >
> >
> >
> >
> > in solrconfig.xml.
> >
> > This did not work for me when I put in the MMapDirectory class name.
> >
> > I got this information from here
> >
> http://issues.apache.org/jira/browse/SOLR-465?focusedCommentId=12715282#action_12715282
> >
> > I'm using the latest nightly build.
> >
> > If anyone knows how to configure solr to use MMap, please let me know. I
> > would greatly appreciate it.
> >
> > Thanks.
>
>


Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson
> Note that N-grams are limited to specific string lengths. I presume that
> you need to search for arbitrary strings, not just three-letter ones.
>

Understood, but that is a limitation that we can live with.

Thanks!
-- 
A. Steven Anderson


Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson
> Ah. With that restriction, it is impossible.
> If it is OK to pay Lucid to make a one-line change, you might be able to do
> it. Otherwise, get ready to spend a lot of money for a search engine.
>

Well, now that Lucid is getting In-Q-Tel $$$, they will soon learn that
official releases are all that matter, and 12-18 month release cycles are
not acceptable. ;-)

-- 
A. Steven Anderson


Re: leading and trailing wildcard query

2009-11-05 Thread Walter Underwood
Note that N-grams are limited to specific string lengths. I presume  
that you need to search for arbitrary strings, not just three-letter  
ones.


wunder

On Nov 5, 2009, at 3:23 PM, Bernadette Houghton wrote:

Hi Steve, a query such as *abc* would need the NGramFilterFactory,
hence the doubleedgytext, and would be retrievable by a query such  
as contains:abc. Note that you can set the max and minimum size of  
strings that get indexed.


bern

-Original Message-
From: A. Steven Anderson [mailto:a.steven.ander...@gmail.com]
Sent: Friday, 6 November 2009 10:08 AM
To: solr-user@lucene.apache.org
Subject: Re: leading and trailing wildcard query

Thanks for the solution, but could you elaborate on how it would find
something like *abc* in a field that contains abc.

Steve

On Thu, Nov 5, 2009 at 5:25 PM, Bernadette Houghton <
bernadette.hough...@deakin.edu.au> wrote:


I've just set up something similar (much thanks to Avesh!)-

[schema.xml snippet: the XML tags were stripped by the mail archive. It
defined two TextField analyzer chains using NGramFilterFactory with
maxGramSize="25", the fields they back (indexed, stored="false",
multiValued="true"), and copyField directives into them.]
bern






Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson
> Hi Steve, a query such as *abc* would need the NGramFilterFactory, hence the
> doubleedgytext, and would be retrievable by a query such as contains:abc.
> Note that you can set the max and minimum size of strings that get indexed.
>

Excellent!  Just to clarify though, NGramFilterFactory is a Solr 1.4 feature
only, correct?

-- 
A. Steven Anderson


Re: leading and trailing wildcard query

2009-11-05 Thread Walter Underwood

Ah. With that restriction, it is impossible.

If it is OK to pay Lucid to make a one-line change, you might be able  
to do it. Otherwise, get ready to spend a lot of money for a search  
engine.


wunder

On Nov 5, 2009, at 3:18 PM, A. Steven Anderson wrote:

Unfortunately, we can only use official releases (not even  
snapshots) since

it's a government-related project.

--
A. Steven Anderson




RE: leading and trailing wildcard query

2009-11-05 Thread Bernadette Houghton
Hi Steve, a query such as *abc* would need the NGramFilterFactory, hence the 
doubleedgytext, and would be retrievable by a query such as contains:abc. Note 
that you can set the max and minimum size of strings that get indexed.

bern

-Original Message-
From: A. Steven Anderson [mailto:a.steven.ander...@gmail.com] 
Sent: Friday, 6 November 2009 10:08 AM
To: solr-user@lucene.apache.org
Subject: Re: leading and trailing wildcard query

Thanks for the solution, but could you elaborate on how it would find
something like *abc* in a field that contains abc.

Steve

On Thu, Nov 5, 2009 at 5:25 PM, Bernadette Houghton <
bernadette.hough...@deakin.edu.au> wrote:

> I've just set up something similar (much thanks to Avesh!)-
>
> [schema.xml snippet: XML tags stripped by the mail archive; it defined two
> TextField analyzer chains using NGramFilterFactory with maxGramSize="25",
> the fields they back, and copyField directives into them.]
> bern


Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson
> Doesn't it work to call SolrQueryParser.setAllowLeadingWildcard?


Good question.  Anyone?


> It can be really slow, what an RDBMS person would call a full table scan.


Understood.


> There is an open bug to make that settable in a config file, but this is a
> pretty tiny change to the source.
>   http://issues.apache.org/jira/browse/SOLR-218
>

Unfortunately, we can only use official releases (not even snapshots) since
it's a government-related project.

-- 
A. Steven Anderson


Re: leading and trailing wildcard query

2009-11-05 Thread Erick Erickson
Because that is the semantics of Solr/Lucene wildcard syntax: * stands for
"any number of any character". Basically, it enumerates all the terms in the
field across all documents, assembles a list of those that contain the
substring "abc", and uses that as one of the clauses of your search...
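A toy way to picture that enumeration, assuming standard Unix tools (the term
list here is invented):

```shell
# Each line stands for an indexed term; *abc* keeps the terms containing the
# substring "abc" -- the whole term list must be scanned to find them.
printf 'xxxabcxxx\nzzz\nabc\nqrs\n' | grep 'abc'
```

which prints `xxxabcxxx` and `abc`, the two "terms" containing the substring.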

Best
Erick

On Thu, Nov 5, 2009 at 6:07 PM, A. Steven Anderson <
a.steven.ander...@gmail.com> wrote:

> Thanks for the solution, but could you elaborate on how it would find
> something like *abc* in a field that contains abc.
>
> Steve
>
> On Thu, Nov 5, 2009 at 5:25 PM, Bernadette Houghton <
> bernadette.hough...@deakin.edu.au> wrote:
>
> > I've just set up something similar (much thanks to Avesh!)-
> >
> > [schema.xml snippet: XML tags stripped by the mail archive; it defined
> > two TextField analyzer chains using NGramFilterFactory with
> > maxGramSize="25", the fields they back, and copyField directives.]
> > bern
>


Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson
Thanks for the solution, but could you elaborate on how it would find
something like *abc* in a field that contains abc.

Steve

On Thu, Nov 5, 2009 at 5:25 PM, Bernadette Houghton <
bernadette.hough...@deakin.edu.au> wrote:

> I've just set up something similar (much thanks to Avesh!)-
>
> [schema.xml snippet: XML tags stripped by the mail archive; it defined two
> TextField analyzer chains using NGramFilterFactory with maxGramSize="25",
> the fields they back, and copyField directives into them.]
>
> bern


Re: solr query help alpha numeric and not

2009-11-05 Thread Joel Nylund
Hi, yes it's a string; in the case of a title it can be anything: a
letter, a number, a symbol, a multibyte char, etc.


Any ideas for a query matching chars that are not a letter a-z or a number
0-9, given that it's a string?


thanks
Joel
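One hedged sketch of such a query (not from the thread; it assumes
firstLetterTitle is a plain string field and relies on lexicographic range
ordering):

```
q=*:* -firstLetterTitle:[0 TO 9] -firstLetterTitle:[A TO Z]
```

The `*:*` match-all clause avoids a purely negative top-level query, which
older Solr versions do not handle.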

On Nov 4, 2009, at 9:10 AM, Jonathan Hendler wrote:


Hi Joel,

The ID is sent back as a string (instead of as an integer) in your  
example. Could this be the cause?


- Jonathan

On Nov 4, 2009, at 9:08 AM, Joel Nylund wrote:

Hi, I have a field called firstLetterTitle, this field has 1 char,  
it can be anything, I need help with a few queries on this char:


1.) I want all NON ALPHA and NON numbers, so any char that is not A-Z or 0-9


I tried:

http://localhost:8983/solr/select?q=NOT%20firstLetterTitle:[0%20TO%209]%20AND%20NOT%20firstLetterTitle:[A%20TO%20Z]

But I get back numeric results:


9
23946447



2.) I want only numerics:

http://localhost:8983/solr/select?q=firstLetterTitle:[0%20TO%209]

This seems to work but just checking if its the right way.



3.) I want only English letters:

http://localhost:8983/solr/select?q=firstLetterTitle:[A%20TO%20Z]

This seems to work but just checking if its the right way.


thanks
Joel







MoreLikeThis and filtering/restricting on "target" fields

2009-11-05 Thread Cody Caughlan
I am trying to use MoreLikeThis (both the component and handler,
trying combinations) and I would like to give it an input document
reference which has a "source" field to analyze and then get back
other documents which have a given field that is used by MLT.

My dataset is composed of documents like:

# Doc 1
id:Article:99
type_s:Article
body_t: the body of the article...

# Doc 2
id:Article:646
types_s:Article
body_t: another article...

# Doc 3
id:Community:44
type_s:Community
description_t: description of this community...

# Doc 4
id:Community:34874
type_s:Community
description_t: another description

# Doc 5
id:BlogPost:2384
type_s:BlogPost
body_t: contents of some blog post

So I would like to say, "given an article (e.g. id:"Article:99") which
has a field "body_t" that should be analyzed, give me related
Communities, and you will want to search on "description_t" for your
analysis".

When I run a basic query like:

(using raw URL values for clarity, but they are encoded in reality)

http://localhost:9007/solr/mlt?q=id:WikiArticle:948&mlt.fl=body_t

then I get back a ton of other articles. Which is fine if my target
type was Article.

So how I can I say "search on field A for your analysis of the input
document, but for related terms use field B, filtered by type_s"

It seems that I can really only specify one field via mlt.fl

I have tried using MLT as a search component so that it has access to
filter queries (via fq) but I cannot seem to get it to give me any
data other than more of the same, that is, I can get a ton of Articles
back but not other "content types".

Am I just trying to do too much?

Thanks
/Cody
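For what it's worth, the kind of request being attempted might look like the
sketch below (hypothetical URL; whether the standalone MLT handler applies fq
to its results depends on the Solr version, and mlt.mintf/mlt.mindf are
illustrative values):

```
http://localhost:9007/solr/mlt?q=id:Article:99&mlt.fl=body_t&fq=type_s:Community&mlt.mintf=1&mlt.mindf=1
```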


Re: leading and trailing wildcard query

2009-11-05 Thread Walter Underwood

Doesn't it work to call SolrQueryParser.setAllowLeadingWildcard?

It can be really slow, what an RDBMS person would call a full table  
scan.


There is an open bug to make that settable in a config file, but this  
is a pretty tiny change to the source.


   http://issues.apache.org/jira/browse/SOLR-218

wunder

On Nov 5, 2009, at 2:13 PM, Otis Gospodnetic wrote:

The guilt trick is not the best thing to try on public mailing  
lists. :)


The first thing that popped to my mind is to use 2 fields, where the  
second one contains the desrever string of the first one.
The second idea is to use n-grams (if it's OK to tokenize), more  
specifically edge n-grams.


Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 

From: A. Steven Anderson 
To: solr-user@lucene.apache.org
Sent: Thu, November 5, 2009 3:04:32 PM
Subject: Re: leading and trailing wildcard query

No thoughts on this? Really!?

I would hate to admit to my Oracle DBE that Solr can't be  
customized to do a

common query that a relational database can do. :-(


On Wed, Nov 4, 2009 at 6:01 PM, A. Steven Anderson <
a.steven.ander...@gmail.com> wrote:

I've scoured the archives and JIRA , but the answer to my question  
is just

not clear to me.

With all the new Solr 1.4 features, is there any way  to do a  
leading and

trailing wildcard query on an *untokenized* field?

e.g. q=myfield:*abc* would return a doc with myfield=xxxabcxxx

Yes, I know how expensive such a query would be, but we have the  
user

requirement, nonetheless.

If not, any suggestions on how to implement a custom solution  
using Solr?

Using an external data structure?



--
A. Steven Anderson






RE: leading and trailing wildcard query

2009-11-05 Thread Bernadette Houghton
I've just set up something similar (much thanks to Avesh!)-

[schema.xml snippet: the XML tags were stripped by the mail archive. It
defined two TextField analyzer chains whose index analyzers use
NGramFilterFactory with maxGramSize="25", the fields they back (indexed,
stored="false", multiValued="true"), and copyField directives into them.]

bern
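Since the archive stripped the XML above, here is a hypothetical
reconstruction of the kind of field type being described. The type and field
names, the tokenizer choice, and minGramSize are guesses; only
NGramFilterFactory and maxGramSize="25" survive in the mangled text:

```xml
<!-- Sketch only: names, tokenizer, and minGramSize are assumptions. -->
<fieldType name="doubleedgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="contains" type="doubleedgytext" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="title" dest="contains"/>
```

With something like this in place, a query such as contains:abc matches
documents whose copied value contains the substring "abc", for substrings
between the min and max gram sizes.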

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Friday, 6 November 2009 9:13 AM
To: solr-user@lucene.apache.org
Subject: Re: leading and trailing wildcard query

The guilt trick is not the best thing to try on public mailing lists. :)

The first thing that popped to my mind is to use 2 fields, where the second one 
contains the desrever string of the first one.
The second idea is to use n-grams (if it's OK to tokenize), more specifically 
edge n-grams.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: A. Steven Anderson 
> To: solr-user@lucene.apache.org
> Sent: Thu, November 5, 2009 3:04:32 PM
> Subject: Re: leading and trailing wildcard query
> 
> No thoughts on this? Really!?
> 
> I would hate to admit to my Oracle DBE that Solr can't be customized to do a
> common query that a relational database can do. :-(
> 
> 
> On Wed, Nov 4, 2009 at 6:01 PM, A. Steven Anderson <
> a.steven.ander...@gmail.com> wrote:
> 
> > I've scoured the archives and JIRA , but the answer to my question is just
> > not clear to me.
> >
> > With all the new Solr 1.4 features, is there any way  to do a leading and
> > trailing wildcard query on an *untokenized* field?
> >
> > e.g. q=myfield:*abc* would return a doc with myfield=xxxabcxxx
> >
> > Yes, I know how expensive such a query would be, but we have the user
> > requirement, nonetheless.
> >
> > If not, any suggestions on how to implement a custom solution using Solr?
> > Using an external data structure?
> >
> >
> -- 
> A. Steven Anderson



Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson
>
> The guilt trick is not the best thing to try on public mailing lists. :)
>

Point taken, although not my intention.  I guess I have been spoiled by
quick replies and was starting to think it was a stupid question.

Plus, I'm literally gonna get trash talk from my Oracle DBE if I can't make
this work. ;-)

We've basically relegated Oracle to handling ingest, from which we index into
Solr and provide all search features.  I'd hate to have to succumb to using
Oracle to service this one special query.


> The first thing that popped to my mind is to use 2 fields, where the second
> one contains the desrever string of the first one.
>

Please elaborate. What do you mean by *desrever* string?


> The second idea is to use n-grams (if it's OK to tokenize), more
> specifically edge n-grams.
>

Well, that's the problem.  The field may contain non-Latin text that has
neither whitespace nor punctuation.


-- 
A. Steven Anderson


Re: Set MMap in Solr

2009-11-05 Thread Otis Gospodnetic
To use MMapDirectory, invoke Java with the System property 
org.apache.lucene.FSDirectory.class set to 
org.apache.lucene.store.MMapDirectory. This will cause 
FSDirectory.getDirectory(File,boolean) to return instances of this class. 

So, start your servlet container with 
-Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: ba ba 
> To: solr-user@lucene.apache.org
> Sent: Thu, November 5, 2009 2:55:42 PM
> Subject: Set MMap in Solr
> 
> Hi,
> 
> I'm trying to set my default directory to MMap. I saw that this is done by
> specifying here
> 
> A DirectoryProvider plugin can be configured in solrconfig.xml with the
> following XML:
> 
> 
> 
> 
> in solrconfig.xml.
> 
> This did not work for me when I put in the MMapDirectory class name.
> 
> I got this information from here
> http://issues.apache.org/jira/browse/SOLR-465?focusedCommentId=12715282#action_12715282
> 
> I'm using the latest nightly build.
> 
> If anyone knows how to configure solr to use MMap, please let me know. I
> would greatly appreciate it.
> 
> Thanks.



Re: leading and trailing wildcard query

2009-11-05 Thread Otis Gospodnetic
The guilt trick is not the best thing to try on public mailing lists. :)

The first thing that popped to my mind is to use 2 fields, where the second one 
contains the desrever string of the first one.
The second idea is to use n-grams (if it's OK to tokenize), more specifically 
edge n-grams.
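The first idea -- a second field holding the reversed string -- can be
pictured with standard Unix tools (`rev` reverses each line; the field
handling itself is hypothetical):

```shell
# Index the reversed value alongside the original; a leading-wildcard query
# *abc then becomes a cheap prefix query cba* against the reversed field.
printf 'xxxabc\n' | rev   # what gets indexed for the value "xxxabc"
printf 'abc\n' | rev      # the prefix to search for: cba*
```

This prints `cbaxxx` and `cba`; the second line is the prefix that replaces
the expensive leading wildcard.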

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: A. Steven Anderson 
> To: solr-user@lucene.apache.org
> Sent: Thu, November 5, 2009 3:04:32 PM
> Subject: Re: leading and trailing wildcard query
> 
> No thoughts on this? Really!?
> 
> I would hate to admit to my Oracle DBE that Solr can't be customized to do a
> common query that a relational database can do. :-(
> 
> 
> On Wed, Nov 4, 2009 at 6:01 PM, A. Steven Anderson <
> a.steven.ander...@gmail.com> wrote:
> 
> > I've scoured the archives and JIRA , but the answer to my question is just
> > not clear to me.
> >
> > With all the new Solr 1.4 features, is there any way  to do a leading and
> > trailing wildcard query on an *untokenized* field?
> >
> > e.g. q=myfield:*abc* would return a doc with myfield=xxxabcxxx
> >
> > Yes, I know how expensive such a query would be, but we have the user
> > requirement, nonetheless.
> >
> > If not, any suggestions on how to implement a custom solution using Solr?
> > Using an external data structure?
> >
> >
> -- 
> A. Steven Anderson



Re: how to use ajax-solr - example?

2009-11-05 Thread Lance Norskog
google "applying a diff patch"

http://www.linuxjournal.com/article/1237 looks like a good start.
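The mechanics can be seen end to end with a throwaway example (temp files
only; for SOLR-1163 you would run `patch` from the checkout root with the
`-p` level matching the paths in the patch header):

```shell
# Create two files that differ, record the difference, then apply it.
set -e
dir=$(mktemp -d) && cd "$dir"
printf 'hello\n' > a.txt
cp a.txt b.txt && printf 'world\n' >> b.txt
diff -u a.txt b.txt > change.patch || true  # diff exits 1 when files differ
patch a.txt < change.patch                  # a.txt now matches b.txt
cat a.txt
```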



On Thu, Nov 5, 2009 at 6:39 AM, Joel Nylund  wrote:
> this is exactly what I was looking for, any directions on how to install? I
> don't really understand how to use a .patch file.
>
> thanks
> Joel
>
> On Nov 4, 2009, at 9:16 PM, Lance Norskog wrote:
>
>> http://issues.apache.org/jira/browse/SOLR-1163
>>
>> This is a really nice index browser.
>>
>> On Wed, Nov 4, 2009 at 12:51 PM, Joel Nylund  wrote:
>>>
>>> Hi Israel,
>>>
>>> I agree the idea of adding a scripting language in between is good, but I
>>> want something simple I can easily test my queries with data and scroll
>>> through the results. I have been using the browser and getting xml for
>>> now,
>>> but would like to save my queries in a simple html page and format the
>>> data.
>>>
>>> I figured this is something I can throw together in a few hours, but I
>>> also
>>> figured someone would have already done the work.
>>>
>>> thanks
>>> Joel
>>>
>>> On Nov 4, 2009, at 2:02 PM, Israel Ekpo wrote:
>>>
 On Wed, Nov 4, 2009 at 10:48 AM, Joel Nylund  wrote:

> Hi, I looked at the documentation and I have no idea how to get
> started?
> Can someone point me to or show me an example of how to send a query to
> a
> solr server and paginate through the results using ajax-solr.
>
> I would glady write a blog tutorial on how to do this if someone can
> get
> me
> started.
>
> I don't know jQuery but have used Prototype & Scriptaculous.
>
> thanks
> Joel
>
>

 Joel,

 It will be best if you use a scripting language between Solr and
 JavaScript

 This is because sending data directly between JavaScript and Solr will limit
 you to only one domain name (the browser's same-origin policy).

 However, if you are using a scripting language between JavaScript and
 Solr, you can use the scripting language to retrieve the request parameters
 from JavaScript and then send them to Solr with the response writer set to
 json.

 This will cause Solr to send the response in JSON format which the
 scripting
 language can pass on to JavaScript.

 This example here will cause Solr to return the response in JSON.

 http://example.com:8443/solr/select?q=searchkeyword&wt=json


 --
 "Good Enough" is not good enough.
 To give anything less than your best is to sacrifice the gift.
 Quality First. Measure Twice. Cut Once.
>>>
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>
>



-- 
Lance Norskog
goks...@gmail.com
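The JSON handoff Israel describes above boils down to adding wt=json to the
select URL; a small sketch (host and query are hypothetical):

```shell
# Build a Solr select URL that asks for a JSON response, which the middle
# tier can forward to the browser.
BASE='http://localhost:8983/solr'
Q='title:foo'
echo "${BASE}/select?q=${Q}&wt=json&rows=10"
```

which prints `http://localhost:8983/solr/select?q=title:foo&wt=json&rows=10`.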


Re: Sending file to Solr via HTTP POST

2009-11-05 Thread Jay Hill
Here is a brief example of how to use SolrJ with the
ExtractingRequestHandler:

  // Assumes the SolrJ imports:
  //   import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
  //   import org.apache.solr.client.solrj.SolrServerException;
  ContentStreamUpdateRequest req = new
ContentStreamUpdateRequest("/update/extract");
  req.addFile(fileToIndex);
  req.setParam("literal.id", getId(fileToIndex));
  req.setParam("literal.hostname", getHostname());
  req.setParam("literal.filename", fileToIndex.getName());

  try {
getSolrServer().request(req);
  } catch (SolrServerException e) {
e.printStackTrace();
  }

You'll need a request handler configured in solrconfig.xml:

  <requestHandler name="/update/extract"
      class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="fmap.content">text</str>
      <str name="lowernames">true</str>
      <str name="uprefix">ignored_</str>
    </lst>
  </requestHandler>

Note that the example also shows how to use the "literal.*" parameter to add
metadata fields of your choice to the document.

Hope that helps get you started.

-Jay
http://www.lucidimagination.com


On Tue, Nov 3, 2009 at 10:38 PM, Caroline Tan wrote:

> Hi,
> From the Solr wiki on ExtractingRequestHandler tutorial, when it comes to
> the part to post file to Solr, it always uses the curl command, e.g.
> curl '
> http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true'
> -F "myfile=@tutorial.html"
>
> I have never used curl and i was thinking is  there any replacement to such
> method?
>
> Is there any API that i can use to achieve the same thing in a java
> project without relying on CURL? Does SolrJ have such method? Thanks
>
> ~caroLine
>


Re: leading and trailing wildcard query

2009-11-05 Thread A. Steven Anderson
No thoughts on this? Really!?

I would hate to admit to my Oracle DBE that Solr can't be customized to do a
common query that a relational database can do. :-(


On Wed, Nov 4, 2009 at 6:01 PM, A. Steven Anderson <
a.steven.ander...@gmail.com> wrote:

> I've scoured the archives and JIRA , but the answer to my question is just
> not clear to me.
>
> With all the new Solr 1.4 features, is there any way  to do a leading and
> trailing wildcard query on an *untokenized* field?
>
> e.g. q=myfield:*abc* would return a doc with myfield=xxxabcxxx
>
> Yes, I know how expensive such a query would be, but we have the user
> requirement, nonetheless.
>
> If not, any suggestions on how to implement a custom solution using Solr?
> Using an external data structure?
>
>
-- 
A. Steven Anderson


Set MMap in Solr

2009-11-05 Thread ba ba
Hi,

I'm trying to set my default directory to MMap. I saw that this is done by
specifying here

A DirectoryProvider plugin can be configured in solrconfig.xml with the
following XML:




in solrconfig.xml.

This did not work for me when I put in the MMapDirectory class name.

I got this information from here
http://issues.apache.org/jira/browse/SOLR-465?focusedCommentId=12715282#action_12715282

I'm using the latest nightly build.

If anyone knows how to configure solr to use MMap, please let me know. I
would greatly appreciate it.

Thanks.