Re: Connection Pool

2010-05-18 Thread Lance Norskog
Do multiple calls with your client program. So,
curl _file1_ &
curl _file2_ &
curl _file3_ &
curl _file4_ &
wait    # a single argument-less wait blocks until all background jobs finish
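
(For SolrJ users: the StreamingUpdateSolrServer discussed below gets you the
same kind of concurrency inside a single JVM. A minimal sketch, assuming a
stock Solr at http://localhost:8983/solr:)

    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ConcurrentIndexer {
        public static void main(String[] args) throws Exception {
            // buffer up to 100 docs, drain them with 4 background threads
            StreamingUpdateSolrServer server =
                new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);
            for (int i = 0; i < 1000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("name", "document " + i);
                server.add(doc);   // returns quickly; pooled threads do the sending
            }
            server.commit();       // blocks until queued docs are sent, then commits
        }
    }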


On Sun, May 16, 2010 at 8:20 AM, Monmohan Singh  wrote:
> Sorry for hijacking the thread, but I have an additional question:
> Is there a way to achieve similar performance (SUSS-like) when targeting
> the extract request handler (/update/extract)?
> I guess one way would be to extract content on the client side and then use
> SUSS to send the update request, but then extraction needs to be taken care
> of locally in an asynchronous/batch manner.
> Regards
> Monmohan
>
> On Sun, May 16, 2010 at 5:19 AM, Lance Norskog  wrote:
>
>> Connection pooling is specified by the underlying Apache Commons
>> connection manager when you create the Server.
>>
>> The SUSS does socket pooling by default and is the preferred way to do
>> concurrent indexing. There are some quirks in the Server
>> implementation set, and SUSS avoids them. Unless you are willing to
>> root around in the SolrJ Server code and understand exactly how it
>> works, stay with the SUSS.
>>
>> On Fri, May 14, 2010 at 6:44 AM, gabriele renzi  wrote:
>> > On Fri, May 14, 2010 at 3:35 PM, Anderson vasconcelos
>> >  wrote:
>> >> Hi
>> >> I want to know if there is any connection pool client to manage the
>> >> connections with Solr. In my system we have a lot of concurrent index
>> >> requests. I can't share my connection; I need to create one per
>> >> transaction. But if I create one per transaction, I think performance
>> >> will go down.
>> >>
>> >> How do you resolve this problem?
>> >
>> > The CommonsHttpSolrServer class does connection pooling, and IIRC also
>> > the StreamingUpdateSolrServer.
>> >
>> >
>> >
>> > --
>> > blog en: http://www.riffraff.info
>> > blog it: http://riffraff.blogsome.com
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>



-- 
Lance Norskog
goks...@gmail.com


Deduplication

2010-05-18 Thread Blargy

Basically, for some use cases I would like to show duplicates; for others I
want them ignored.

If I have overwriteDupes=false and I just create the dedup hash, how can I
query for only unique hash values... i.e. something like a SQL GROUP BY?

Thanks

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Deduplication-tp828016p828016.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Long startup phase

2010-05-18 Thread Lance Norskog
There are no .pyc files in Solr. It's an all-Java app, no Python.

Run 'jps' to get a list of Java processes running. Then use 'jhat' or
'jstat' to examine the program.

'netstat -an | fgrep :8983' will give you a list of all sockets in use
by Solr, both client and server.

On Tue, May 18, 2010 at 7:29 AM, Andreas Jung  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Hi there,
>
> trying to deploy Solr 1.4/JDK 1.6/CentOS Linux 64bit
> on a new production server.
>
> Starting Solr takes very long on this machine. In particular
> it seems to hang for a minute or two showing only this on the
> console:
>
> [...@db01 backend_buildout]$ bin/solr-instance fg
> 2010-05-18 16:22:51.507::INFO:  Logging to STDERR via
> org.mortbay.log.StdErrLog
> 2010-05-18 16:22:51.585::INFO:  jetty-6.1.3
>
> Using strace shows that the process seems to be waiting (aka hanging)
> in the wait4() call below. Any idea?
>
> Andreas
>
> open("/usr/local/lib/python2.6/plat-linux2/cStringIO.py", O_RDONLY) = -1
> ENOENT (No such file or directory)
> open("/usr/local/lib/python2.6/plat-linux2/cStringIO.pyc", O_RDONLY) =
> - -1 ENOENT (No such file or directory)
> stat("/usr/local/lib/python2.6/lib-tk/cStringIO", 0x7fff292873f0) = -1
> ENOENT (No such file or directory)
> open("/usr/local/lib/python2.6/lib-tk/cStringIO.so", O_RDONLY) = -1
> ENOENT (No such file or directory)
> open("/usr/local/lib/python2.6/lib-tk/cStringIOmodule.so", O_RDONLY) =
> - -1 ENOENT (No such file or directory)
> open("/usr/local/lib/python2.6/lib-tk/cStringIO.py", O_RDONLY) = -1
> ENOENT (No such file or directory)
> open("/usr/local/lib/python2.6/lib-tk/cStringIO.pyc", O_RDONLY) = -1
> ENOENT (No such file or directory)
> stat("/usr/local/lib/python2.6/lib-old/cStringIO", 0x7fff292873f0) = -1
> ENOENT (No such file or directory)
> open("/usr/local/lib/python2.6/lib-old/cStringIO.so", O_RDONLY) = -1
> ENOENT (No such file or directory)
> open("/usr/local/lib/python2.6/lib-old/cStringIOmodule.so", O_RDONLY) =
> - -1 ENOENT (No such file or directory)
> open("/usr/local/lib/python2.6/lib-old/cStringIO.py", O_RDONLY) = -1
> ENOENT (No such file or directory)
> open("/usr/local/lib/python2.6/lib-old/cStringIO.pyc", O_RDONLY) = -1
> ENOENT (No such file or directory)
> stat("/usr/local/lib/python2.6/lib-dynload/cStringIO", 0x7fff292873f0) =
> - -1 ENOENT (No such file or directory)
> open("/usr/local/lib/python2.6/lib-dynload/cStringIO.so", O_RDONLY) = 5
> fstat(5, {st_mode=S_IFREG|0755, st_size=50484, ...}) = 0
> open("/usr/local/lib/python2.6/lib-dynload/cStringIO.so", O_RDONLY) = 6
> read(6,
> "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\31\0\0\0\0\0\0"...,
> 832) = 832
> fstat(6, {st_mode=S_IFREG|0755, st_size=50484, ...}) = 0
> mmap(NULL, 2114584, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6,
> 0) = 0x2acde2995000
> mprotect(0x2acde2999000, 2093056, PROT_NONE) = 0
> mmap(0x2acde2b98000, 8192, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x3000) = 0x2acde2b98000
> close(6)                                = 0
> close(5)                                = 0
> close(4)                                = 0
> getrlimit(RLIMIT_NOFILE, {rlim_cur=1, rlim_max=1}) = 0
> close(3)                                = 0
> pipe([3, 4])                            = 0
> fcntl(4, F_GETFD)                       = 0
> fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
> clone(child_stack=0,
> flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> child_tidptr=0x2acdde8a9a50) = 21603
> close(4)                                = 0
> mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
> 0) = 0x2acde2b9a000
> read(3, "", 1048576)                    = 0
> mremap(0x2acde2b9a000, 1052672, 4096, MREMAP_MAYMOVE) = 0x2acde2b9a000
> close(3)                                = 0
> munmap(0x2acde2b9a000, 4096)            = 0
> wait4(21603, 0x7fff2928f6f4, WNOHANG, NULL) = 0
> wait4(21603, 2010-05-18 16:21:04.731::INFO:
> 
> Logging to STDERR via org.mortbay.log.StdErrLog
> 2010-05-18 16:21:04.811::INFO:  jetty-6.1.3
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.10 (Darwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAkvypD8ACgkQCJIWIbr9KYz5BwCfSUgefA5fWco2grsIC4nE+4O9
> neYAoJpx5J/s6wu89CG5TdKYCZqts4u1
> =IXgz
> -END PGP SIGNATURE-
>



-- 
Lance Norskog
goks...@gmail.com


Re: Embedded Server, Caching, Stats page updates

2010-05-18 Thread Chris Hostetter

: I just switched from using CommonHttpSolrServer to EmbeddedSolrServer and
: the performance surprisingly deteriorated. I was expecting an improvement so
: in my confusion i went to the stats page and noticed that the caches were no
: longer getting hit. The embedded server however should still use
: IndexSearcher from Lucene (which is what the caches are supposed to be
: related to).

The way you phrased that paragraph makes me think that one of us doesn't 
understand what exactly you did when you "switched" ...

When using CommonsHttpSolrServer in some application you write, you are 
talking to a remote server that is running Solr.  When you use 
EmbeddedSolrServer, you are running Solr directly within the application 
that you are writing.
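
to make the distinction concrete, here is a sketch -- the solr home path, 
host, and core name are placeholders:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.core.CoreContainer;

    public class ServerSetup {
        public static void main(String[] args) throws Exception {
            // remote: every request is an HTTP round-trip to a separately running Solr
            SolrServer remote = new CommonsHttpSolrServer("http://solrhost:8983/solr");

            // embedded: a full Solr core living inside this JVM, no servlet container
            System.setProperty("solr.solr.home", "/path/to/solr/home");
            CoreContainer container = new CoreContainer.Initializer().initialize();
            SolrServer embedded = new EmbeddedSolrServer(container, "core0");
        }
    }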

Now for starters: if the remote server you were running Solr on is more 
powerful than the local machine you are running your Java application on, 
that alone could explain some performance differences (likewise for JVM 
settings).

Most importantly: when running Solr embedded in your application, there is 
no "stats.jsp" page for you to look at -- because Solr is no longer 
running in a servlet container.  So if you are seeing stats on your 
Solr server that say your caches aren't being hit, the reason is that 
the server isn't being hit at all.

: Is there some kind of property that needs to be added or adjusted for
: embedded server to use cache? Should I create my own cache and wipe the rest

When running an embedded Solr server, the filterCache and queryResultCache 
will still be used.  The settings in the solrconfig.xml you specify when 
initializing the SolrCore will be honored.  You can use JMX to monitor 
those cache hit rates (assuming you have JMX enabled for your application, 
and the appropriate setting is in your solrconfig.xml).
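
wiring that up usually takes two small pieces -- a sketch, with the JMX flag 
shown in its simplest insecure local-only form:

    <!-- solrconfig.xml: expose Solr's statistics beans over JMX -->
    <jmx />

    # start the app hosting the embedded server with JMX enabled
    java -Dcom.sun.management.jmxremote -cp ... YourApp

then attach jconsole (or jvisualvm) to the process and look under the "solr" 
domain for the cache beans (hit ratio, evictions, and so on).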


-Hoss



Embedded Server, Caching, Stats page updates

2010-05-18 Thread Antoniya Statelova
I just switched from using CommonHttpSolrServer to EmbeddedSolrServer and
the performance surprisingly deteriorated. I was expecting an improvement so
in my confusion i went to the stats page and noticed that the caches were no
longer getting hit. The embedded server however should still use
IndexSearcher from Lucene (which is what the caches are supposed to be
related to).

Is there some kind of property that needs to be added or adjusted for
embedded server to use cache? Should I create my own cache and wipe the rest
out entirely? Should I remove the httpcache from the configuration since
i'll no longer be accessing the service remotely? How accurate is the stats
page and is the error actually coming from it rather than the actual
backend?

Thank you beforehand,
Tony


Re: Merge Search for Suggestion. Keywords and Products ?!

2010-05-18 Thread Chris Hostetter
: i'm searching for a way to merge my two different autocompletions into one
: request. 
: that's what i want:

you could copyField your two different fields into one destination, and 
then use a single strategy on that new field, but ...

: - suggestion for Product Names (EdgeNGram)
: - suggestion for keywords. (TermsComponent with Shingle)

...to get two different approaches like that, you'll need a more complex 
solution.  One example is to configure a special core just for 
autosuggest, where each "document" corresponds to a specific "term" you 
want to suggest, and the other fields contain the EdgeNGrams for 
product names and Shingles for other terms -- then you just search against 
this core using your input.
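
a rough schema.xml sketch of such a core (the field and type names are made 
up; the analysis factories all ship with Solr 1.4):

    <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- one "document" per suggestable term -->
    <field name="suggest_product" type="edgytext" indexed="true" stored="true"/>

(a shingled field for the keyword suggestions would sit alongside it, built 
with solr.ShingleFilterFactory.)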



-Hoss



Re: Autosuggest

2010-05-18 Thread Blargy

Thanks for the info Hoss.

I will probably need to go with one of the more complicated solutions. Is
there any online documentation for this task? Thanks.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Autosuggest-tp818430p827329.html
Sent from the Solr - User mailing list archive at Nabble.com.


TikaEntityProcessor on Solr 1.4?

2010-05-18 Thread Sixten Otto
Sorry to repeat this question, but I realized that it probably
belonged in its own thread:

The TikaEntityProcessor class that enables DataImportHandler to
process business documents was added after the release of Solr 1.4,
along with some other changes (like the binary DataSources) to support
it. Obviously, there hasn't been an official release of Solr since
then. Has anyone tried back-porting those changes to Solr 1.4?

(I do see that the question was asked last month, without any
response: http://www.lucidimagination.com/search/document/5d2d25bc57c370e9)

The patches for these issues don't seem all that complex or pervasive,
but it's hard for me (as a Solr n00b) to tell whether this is really
all that's involved:
https://issues.apache.org/jira/browse/SOLR-1583
https://issues.apache.org/jira/browse/SOLR-1358

Sixten


Re: Which Solr to use?

2010-05-18 Thread Sixten Otto
On Tue, May 18, 2010 at 10:40 AM, Robert Muir  wrote:
> Some discussions/voting happened and the trunk is intended to be ...
> more like a normal trunk.
>
> If you need features not in an official release, and are looking for a
> codebase with updated features, I would recommend instead considering:
>
> http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/

So features are being actively added to / code rearranged in
trunk/4.0, with some of the work being back-ported to this branch to
form a stable 3.1 release? Is that accurate?

Is there any thinking about when that might drop (beyond the quite
understandable "when it's done")? Or, perhaps more reasonably, when it
might freeze?

(I've done some casual searching of the site + list archives without
finding this information, but by all means if there's a thread I
should go read to bone up on this stuff, a link is all I need.)

Sixten


Re: disable caches in real time

2010-05-18 Thread Chris Hostetter
: I want to know if there is any approach to disable caches in a specific core
: from a multicore server.

only via the config.
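
concretely, that means shrinking the caches to nothing in that core's 
solrconfig.xml -- a sketch, using the stock 1.4 cache classes (though, as 
below, you probably don't need to do this at all):

    <filterCache      class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache"     size="0" initialSize="0" autowarmCount="0"/>
    <documentCache    class="solr.LRUCache"     size="0" initialSize="0" autowarmCount="0"/>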

: I have a multicore server where the core0 will be listen to the queries and
: other core (core1) that will be replicated from a master server. Once the
: replication has been done, i will swap the cores. My point is that i want to
: disable the caches in the core that is in charge of the replication to save
: memory in the machine.

that seems bizarrely complicated -- replication can work against a "live" 
core; no need to do the swap yourself, the ReplicationHandler takes care 
of this for you transparently (ie: you have one core, replicating from a 
master -- the old index will be searched by users, and have caches, and 
when the new version of the index is ready, the replication handler will 
swap the *index* in that core, but the core itself never changes) ... it 
can even autowarm the caches on the new index for you before the swap, if 
you configure it that way.

-Hoss



Re: Autosuggest

2010-05-18 Thread Chris Hostetter

: So there is no generally accepted preferred way to do auto-suggest? 

there are many generally accepted and preferred ways to do auto-suggest -- 
it all comes down to specific goals and needs.

for example: using the TermsComponent is really simple to set up if you 
want your suggestions to come from a single field of your index and be in 
a simple ordering -- but if you want the suggested terms to be limited 
based on other criteria, or if you want to influence the ordering by 
other things, you need to use a more complicated solution (like facets).
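
for instance, a prefix lookup against a hypothetical "name" field, using the 
/terms handler from the example solrconfig.xml, is just:

    http://localhost:8983/solr/terms?terms.fl=name&terms.prefix=ro&terms.limit=10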

For people with really tricky requirements (like ordering the results by a 
custom rule) it can even make sense to set up a special core where each 
document corresponds to a "term" to suggest, with a text field 
containing ngrams, and other fields containing numeric values that you use 
in boost functions.

there are lots of options -- all of them equally accepted -- preference is 
based on needs.

-Hoss



Storing RandomSortField

2010-05-18 Thread Alexandre Rocco
Hi guys,

Is there any way to make a RandomSortField be stored?
I'm trying to do this for debugging purposes.
My intention is to take a look at the values that are stored there to
determine the sort order that is being applied to the results.

I tried to make it a stored field as:


And also tried to create another text field, copying the result from the
random field like this:



Neither of the approaches worked.
Is there any restriction on this kind of field that prevents it from being
displayed in the results?

Thanks,
Alexandre


'Minimum Should Match' on subquery level

2010-05-18 Thread Myron Chelyada
Hi All

I need to use Lucene's `minimum number should match` option of BooleanQuery
in Solr.
Actually I need to do the same as the DisMaxRequestHandler's `mm` parameter
does, but on the subquery level;
i.e. I have a complex query which consists of several Boolean subqueries, and
I need to specify a different 'minimum number should match' threshold for
each of the sub-queries.

Can somebody advise me how I can do this with Solr?

Thanks  in advance,
Myron


RE: how to achieve filters

2010-05-18 Thread Ahmet Arslan

> q=rock&fq:bitrate:[* TO 128]
> 
> bitrate is int
> This also returns docs with more than 128 bitrate. Is there
> something I am doing wrong?

If you are using solr 1.4.0 you need to use a trie-based field type (tint)
for the bitrate field; the plain int type compares values as strings, so
range queries on it don't behave numerically.



RE: how to achieve filters

2010-05-18 Thread Doddamani, Prakash
Thanks Ahmet,

Let me try these options

Regards
Prakash  

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Tuesday, May 18, 2010 9:06 PM
To: solr-user@lucene.apache.org
Subject: RE: how to achieve filters

> Yep content is string, and bitrate is int.

bitrate should be a trie-based tint, not int, for range queries to work
correctly.
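
The type definition, roughly as it appears in the 1.4 example schema.xml
(precisionStep is tunable):

    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
               omitNorms="true" positionIncrementGap="0"/>

You will need to reindex after changing the field type, or existing documents
won't match range queries.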

> I am digging more now. Can we combine both scenarios?
> 
> q=rock&fq={!field f=content}mp3
> q=rock&fq:bitrate:[* TO 128]
> 
> Say if I want only mp3 from 0 to 128

You can append as many filter queries (fq) as you want. 

&q=rock&fq={!field f=content}mp3&fq=bitrate:[* TO 128]




  


RE: how to achieve filters

2010-05-18 Thread Doddamani, Prakash
Hey

q=rock&fq:bitrate:[* TO 128]

bitrate is int.
This also returns docs with more than 128 bitrate. Is there something I am
doing wrong?

Regards
prakash

-Original Message-
From: Doddamani, Prakash [mailto:prakash.doddam...@corp.aol.com] 
Sent: Tuesday, May 18, 2010 8:44 PM
To: solr-user@lucene.apache.org
Subject: RE: how to achieve filters

Thanks much Ahmet,

Yep content is string, and bitrate is int.

I am digging more now. Can we combine both scenarios?

q=rock&fq={!field f=content}mp3
q=rock&fq:bitrate:[* TO 128]

Say if I want only mp3 from 0 to 128

Regards
Prakash

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Tuesday, May 18, 2010 8:24 PM
To: solr-user@lucene.apache.org
Subject: Re: how to achieve filters

> I am using a "dismax" query to fetch docs from Solr where I have set 
> some boost on each field,
> 
>  
> 
> If I search for the query "Rock" I get the following docs with the 
> boost values I specified,
> 
>  
> 
> 
>   19.494072
>   120
>   mp3
>   Rock
>   1
>   st name 1
> 
> 
>   19.494052
>   248
>   aac+
>   Rock
>   2
>   st name 2
> 
> 
>   19.494042
>   127
>   aac+
>   Rock
>   3
>   st name 3
> 
> 
>   19.494032
>   256
>   mp3
>   Rock
>   4
>   st name 5
> 
>  
> 
> I am looking for something like the below. What is the best way to 
> achieve it?

With filter queries. fq=

> 1. Query=rock where content=mp3; it should return only the first and 
> last docs (where content=mp3)

Assuming that content is string typed. q=rock&fq={!field f=content}mp3 

> 2. Query=rock where bitrate<128; it should return only the first and 
> third docs (where bitrate<128)

&q=rock&fq=bitrate:[* TO 128] -- for this the bitrate field must be tint type.



  


RE: how to achieve filters

2010-05-18 Thread Ahmet Arslan
> Yep content is string, and bitrate is int.

bitrate should be a trie-based tint, not int, for range queries to work correctly.

> I am digging more now. Can we combine both scenarios?
> 
> q=rock&fq={!field f=content}mp3 
> q=rock&fq:bitrate:[* TO 128]
> 
> Say if I want only mp3 from 0 to 128

You can append as many filter queries (fq) as you want. 

&q=rock&fq={!field f=content}mp3&fq=bitrate:[* TO 128]




  


Re: Recommended MySQL JDBC driver

2010-05-18 Thread Shawn Heisey

On 5/14/2010 12:40 PM, Shawn Heisey wrote:
I downgraded to 5.0.8 for testing. Initially, I thought it was going 
to be faster, but it slows down as it gets further into the index.  It 
now looks like it's probably going to take the same amount of time.


On the server timeout thing - that's a setting you'd have to put in 
my.ini or my.cfg, there may also be a way to change it on the fly 
without restarting the server.  I suspect that when you are running a 
multiple query setup like yours, it opens multiple connections, and 
when one of them is busy doing some work, the others are idle.  That 
may be related to the timeout with the older connector version.  On my 
setup, I only have one query that retrieves records, so I'm probably 
not going to run into that.  I could be wrong about how it works - you 
can confirm or refute this idea by looking at SHOW PROCESSLIST on your 
MySQL server while it's working.
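
(For anyone searching the archives later: the server-side knobs in question
are probably these -- a sketch; verify the exact variable names against your
MySQL version:

    # my.cnf / my.ini
    [mysqld]
    wait_timeout      = 28800   # seconds an idle connection may live
    net_write_timeout = 600     # seconds the server waits while writing to a client

or, on the fly without a restart:

    SET GLOBAL wait_timeout = 28800;
)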


I was having no trouble with the 5.0.8 connector on 1.5-dev build 
922440M, but then I upgraded the test machine to the latest 4.0 from 
trunk, and ran into the timeout issue you described, so I am going back 
to the 5.1.12 connector.  I just saw the message on the list about 
branch_3x in SVN, which looks like a better option than trunk.


Shawn



RE: how to achieve filters

2010-05-18 Thread Doddamani, Prakash
Thanks much Ahmet,

Yep content is string, and bitrate is int.

I am digging more now. Can we combine both scenarios?

q=rock&fq={!field f=content}mp3 
q=rock&fq:bitrate:[* TO 128]

Say if I want only mp3 from 0 to 128

Regards
Prakash

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Tuesday, May 18, 2010 8:24 PM
To: solr-user@lucene.apache.org
Subject: Re: how to achieve filters

> I am using a "dismax" query to fetch docs from Solr where I have set 
> some boost on each field,
> 
>  
> 
> If I search for the query "Rock" I get the following docs with the 
> boost values I specified,
> 
>  
> 
> 
>   19.494072
>   120
>   mp3
>   Rock
>   1
>   st name 1
> 
> 
>   19.494052
>   248
>   aac+
>   Rock
>   2
>   st name 2
> 
> 
>   19.494042
>   127
>   aac+
>   Rock
>   3
>   st name 3
> 
> 
>   19.494032
>   256
>   mp3
>   Rock
>   4
>   st name 5
> 
>  
> 
> I am looking for something like the below. What is the best way to 
> achieve it?

With filter queries. fq=

> 1. Query=rock where content=mp3; it should return only the first and 
> last docs (where content=mp3)

Assuming that content is string typed. q=rock&fq={!field f=content}mp3 

> 2. Query=rock where bitrate<128; it should return only the first and 
> third docs (where bitrate<128)

&q=rock&fq=bitrate:[* TO 128] -- for this the bitrate field must be tint type.



  


Re: how to achieve filters

2010-05-18 Thread Michael Kuhlmann
On 18.05.2010 16:54, Ahmet Arslan wrote:
>> 2. Query=rock where bitrate<128; it should return
>> only the first and third docs (where bitrate<128)
> 
> &q=rock&fq=bitrate:[* TO 128] -- for this the bitrate field must be tint type.
> 

&q=rock&fq=bitrate:[* TO 127] would be better, because bitrate should be
lower than 128.

BTW, a bitrate of 127 is interesting...

@Prakash:
See http://wiki.apache.org/solr/SolrFacetingOverview


Re: how to achieve filters

2010-05-18 Thread Ahmet Arslan
> I am using a "dismax" query to fetch docs from Solr where I
> have set some boost on each field,
> 
>  
> 
> If I search for the query "Rock" I get the following docs with
> the boost values I specified,
> 
>  
> 
> 
>   19.494072
>   120
>   mp3
>   Rock
>   1
>   st name 1
> 
> 
>   19.494052
>   248
>   aac+
>   Rock
>   2
>   st name 2
> 
> 
>   19.494042
>   127
>   aac+
>   Rock
>   3
>   st name 3
> 
> 
>   19.494032
>   256
>   mp3
>   Rock
>   4
>   st name 5
> 
>  
> 
> I am looking for something like the below. What is the best way
> to achieve it?

With filter queries. fq=

> 1. Query=rock where content=mp3; it should return
> only the first and last docs (where content=mp3)

Assuming that content is string typed. q=rock&fq={!field f=content}mp3 

> 2. Query=rock where bitrate<128; it should return
> only the first and third docs (where bitrate<128)

&q=rock&fq=bitrate:[* TO 128] -- for this the bitrate field must be tint type.






Re: Which Solr to use?

2010-05-18 Thread Robert Muir
On Mon, May 17, 2010 at 8:22 PM, Sixten Otto  wrote:
> - Plunge ahead with the trunk, and hope that things stabilize by a few
> months from now, when we'd be hoping to go live on one of our biggest
> client sites.
> - Go with the last 1.5 code, knowing that the features we want are in
> there, and hope we don't run into anything majorly broken.
> - Stick with 1.4, and just accept the necessity of needing to push
> content to the HTTP interface.
>

Of course this is really up to you, but personally I would not
recommend using the trunk (slated to become 4.0) and hope that it
stabilizes.

Some discussions/voting happened and the trunk is intended to be ...
more like a normal trunk.

If you need features not in an official release, and are looking for a
codebase with updated features, I would recommend instead considering:

http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/
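
Grabbing that branch is a plain svn checkout; if memory serves, "ant dist"
in the solr/ directory then builds the war (check the README there to be sure):

    svn co http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/
    cd branch_3x/solr
    ant dist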

I am sure someone will disagree, but it's my opinion that this 3.x
release branch is actually more stable than you might think; it gets
all the bugfixes and "safe features" from the trunk, but nothing
really risky or scary.

So for example, it gets a lot of bugfixes and cleanups, and gets
things like improvements to spatial and new analyzers, but doesn't get
the really risky stuff like flexible indexing changes from Lucene.

https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/CHANGES.txt
https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene/CHANGES.txt
https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene/contrib/CHANGES.txt

-- 
Robert Muir
rcm...@gmail.com


Re: Multifaceting on multivalued field

2010-05-18 Thread Peter Karich
Hi Marco,

oh, awkward. Thanks a lot!!

Regards,
Peter.

> Hi,
>
> This exception is fired when the field doesn't exist in your index, but
> here it comes from an error in your query syntax: !{ex=cars}cars
> should be {!ex=cars}cars, with the exclamation mark inside the brackets.
>
>
>
> Marco Martínez Bautista
> http://www.paradigmatecnologico.com
> Avenida de Europa, 26. Ática 5. 3ª Planta
> 28224 Pozuelo de Alarcón
> Tel.: 91 352 59 42
>
>
> 2010/5/18 Peter Karich 
>
>   
>> Hi all,
>>
>> I read about multifaceting [1] and tried it for myself. With
>> multifaceting I would like to conserve the number of documents for the
>> 'un-facetted case'. This works nicely with normal fields, but I get an
>> exception [2] if I apply it to a multivalued field.
>> Is this a bug, or is it expected :-)? If the latter, would
>> anybody help me understand this?
>>
>> Regards,
>> Peter.
>>
>> [1]
>>
>> http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html
>>
>> [2]
>> org.apache.solr.common.SolrException: undefined field !{ex=cars}cars
>>at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1077)
>>at
>> org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:226)
>>at
>>
>> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:283)
>>at
>> org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166)
>>at
>>
>> org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
>>at
>>
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>>at
>>
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>>at
>>
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:336)
>>at
>>
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:239)
>>
>>
>> 
>   


-- 
Free your timetabling!
http://timefinder.sourceforge.net/



how to achieve filters

2010-05-18 Thread Doddamani, Prakash
Hi All

I am using a "dismax" query to fetch docs from Solr where I have set some
boost on each field,

 

If I search for the query "Rock" I get the following docs with the boost
values I specified:

 


  19.494072
  120
  mp3
  Rock
  1
  st name 1


  19.494052
  248
  aac+
  Rock
  2
  st name 2


  19.494042
  127
  aac+
  Rock
  3
  st name 3


  19.494032
  256
  mp3
  Rock
  4
  st name 5

 

I am looking for something like the below. What is the best way to achieve it?

1. Query=rock where content=mp3; it should return only the first and
last docs (where content=mp3)

2. Query=rock where bitrate<128; it should return only the first and
third docs (where bitrate<128)

 

 

Thanks in advance

Prakash



Long startup phase

2010-05-18 Thread Andreas Jung
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi there,

trying to deploy Solr 1.4/JDK 1.6/CentOS Linux 64bit
on a new production server.

Starting Solr takes very long on this machine. In particular
it seems to hang for a minute or two showing only this on the
console:

[...@db01 backend_buildout]$ bin/solr-instance fg
2010-05-18 16:22:51.507::INFO:  Logging to STDERR via
org.mortbay.log.StdErrLog
2010-05-18 16:22:51.585::INFO:  jetty-6.1.3

Using strace shows that the process seems to be waiting (aka hanging)
in the wait4() call below. Any idea?

Andreas

open("/usr/local/lib/python2.6/plat-linux2/cStringIO.py", O_RDONLY) = -1
ENOENT (No such file or directory)
open("/usr/local/lib/python2.6/plat-linux2/cStringIO.pyc", O_RDONLY) =
- -1 ENOENT (No such file or directory)
stat("/usr/local/lib/python2.6/lib-tk/cStringIO", 0x7fff292873f0) = -1
ENOENT (No such file or directory)
open("/usr/local/lib/python2.6/lib-tk/cStringIO.so", O_RDONLY) = -1
ENOENT (No such file or directory)
open("/usr/local/lib/python2.6/lib-tk/cStringIOmodule.so", O_RDONLY) =
- -1 ENOENT (No such file or directory)
open("/usr/local/lib/python2.6/lib-tk/cStringIO.py", O_RDONLY) = -1
ENOENT (No such file or directory)
open("/usr/local/lib/python2.6/lib-tk/cStringIO.pyc", O_RDONLY) = -1
ENOENT (No such file or directory)
stat("/usr/local/lib/python2.6/lib-old/cStringIO", 0x7fff292873f0) = -1
ENOENT (No such file or directory)
open("/usr/local/lib/python2.6/lib-old/cStringIO.so", O_RDONLY) = -1
ENOENT (No such file or directory)
open("/usr/local/lib/python2.6/lib-old/cStringIOmodule.so", O_RDONLY) =
- -1 ENOENT (No such file or directory)
open("/usr/local/lib/python2.6/lib-old/cStringIO.py", O_RDONLY) = -1
ENOENT (No such file or directory)
open("/usr/local/lib/python2.6/lib-old/cStringIO.pyc", O_RDONLY) = -1
ENOENT (No such file or directory)
stat("/usr/local/lib/python2.6/lib-dynload/cStringIO", 0x7fff292873f0) =
- -1 ENOENT (No such file or directory)
open("/usr/local/lib/python2.6/lib-dynload/cStringIO.so", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0755, st_size=50484, ...}) = 0
open("/usr/local/lib/python2.6/lib-dynload/cStringIO.so", O_RDONLY) = 6
read(6,
"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\31\0\0\0\0\0\0"...,
832) = 832
fstat(6, {st_mode=S_IFREG|0755, st_size=50484, ...}) = 0
mmap(NULL, 2114584, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6,
0) = 0x2acde2995000
mprotect(0x2acde2999000, 2093056, PROT_NONE) = 0
mmap(0x2acde2b98000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x3000) = 0x2acde2b98000
close(6)                                = 0
close(5)                                = 0
close(4)                                = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=1, rlim_max=1}) = 0
close(3)                                = 0
pipe([3, 4])                            = 0
fcntl(4, F_GETFD)                       = 0
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x2acdde8a9a50) = 21603
close(4)                                = 0
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x2acde2b9a000
read(3, "", 1048576)                    = 0
mremap(0x2acde2b9a000, 1052672, 4096, MREMAP_MAYMOVE) = 0x2acde2b9a000
close(3)                                = 0
munmap(0x2acde2b9a000, 4096)            = 0
wait4(21603, 0x7fff2928f6f4, WNOHANG, NULL) = 0
wait4(21603, 2010-05-18 16:21:04.731::INFO:

Logging to STDERR via org.mortbay.log.StdErrLog
2010-05-18 16:21:04.811::INFO:  jetty-6.1.3
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvypD8ACgkQCJIWIbr9KYz5BwCfSUgefA5fWco2grsIC4nE+4O9
neYAoJpx5J/s6wu89CG5TdKYCZqts4u1
=IXgz
-END PGP SIGNATURE-


Re: Multifaceting on multivalued field

2010-05-18 Thread Marco Martinez
Hi,

This exception is fired when the field doesn't exist in your index, but
here it comes from an error in your query syntax: !{ex=cars}cars
should be {!ex=cars}cars, with the exclamation mark inside the brackets.
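
In a full request, the tag/exclusion pair looks roughly like this (a sketch,
assuming a filter on the same multivalued "cars" field):

    q=*:*&facet=true
      &fq={!tag=carsTag}cars:bmw
      &facet.field={!ex=carsTag}cars

The facet counts for "cars" then ignore the tagged filter, which is what
preserves the counts for the un-faceted case.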



Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/5/18 Peter Karich 

> Hi all,
>
> I read about multifaceting [1] and tried it for myself. With
> multifaceting I would like to conserve the number of documents for the
> 'un-facetted case'. This works nicely with normal fields, but I get an
> exception [2] if I apply it to a multivalued field.
> Is this a bug, or is it expected :-)? If the latter, would
> anybody help me understand this?
>
> Regards,
> Peter.
>
> [1]
>
> http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html
>
> [2]
> org.apache.solr.common.SolrException: undefined field !{ex=cars}cars
>at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1077)
>at
> org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:226)
>at
>
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:283)
>at
> org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166)
>at
>
> org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
>at
>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>at
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>at
>
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:336)
>at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:239)
>
>


Solr Cell and encrypted pdf files

2010-05-18 Thread Yiannis Pericleous

Hi,

I can't seem to get Solr Cell to index password-protected pdf files.
I can't figure out how to pass the password to Tika, and looking at
ExtractingDocumentLoader, it doesn't seem to pass any pdf-password-related
metadata to the Tika parser.


Whatever I do, pdfbox complains that: "The supplied password does not 
match either the owner or user password in the document."


If I strip the password manually before trying to index the document, it 
works.


What am I missing?

thanks!

yiannis


Multifaceting on multivalued field

2010-05-18 Thread Peter Karich
Hi all,

I read about multifaceting [1] and tried it for myself. With
multifaceting I would like to conserve the number of documents for the
'un-facetted case'. This works nicely with normal fields, but I get an
exception [2] if I apply it to a multivalued field.
Is this a bug, or is it expected :-)? If the latter, would
anybody help me understand this?

Regards,
Peter.

[1]
http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html

[2]
org.apache.solr.common.SolrException: undefined field !{ex=cars}cars
at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1077)
at
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:226)
at
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:283)
at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166)
at
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:336)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:239)



Re: DIH. behavior after a import. Log, delete table !?

2010-05-18 Thread Ahmet Arslan

> How can I tell Solr to start the jar after every Delta-Import
> but NOT after every Full-Import?

You cannot distinguish between delta and full from inside Solr, so you need to
do it in your jar program. In your Java program, send a GET request to the URL
http://localhost:8080/solr/dataimport

If the result string/xml contains 'idle' and
'Delta Dump started', then you can truncate your table.

If the result string contains 'idle' and 'Full
Dump Started', then do nothing.
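
A minimal sketch of that check in Java (the URL and the status strings are
the ones above; verify them against your own /dataimport response):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class DihStatusCheck {
        public static void main(String[] args) throws Exception {
            URL status = new URL("http://localhost:8080/solr/dataimport");
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(status.openStream(), "UTF-8"));
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
            in.close();

            // only clean up after a *delta* import has finished
            if (body.indexOf("idle") >= 0 && body.indexOf("Delta Dump started") >= 0) {
                System.out.println("delta import finished - safe to truncate the table");
            }
        }
    }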




  


Re: DIH. behavior after a import. Log, delete table !?

2010-05-18 Thread stockii

How can I tell Solr to start the jar after every Delta-Import but NOT
after every Full-Import?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-behavior-after-a-import-Log-delete-table-tp823232p825717.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Architecture discussion

2010-05-18 Thread rabahb

Hi,

I'd like to get some architectural advice concerning the setup of a Solr
(v1.4) platform in a production environment. 
I'll first describe my target architecture and then ask some questions
related to that environment.


Here's briefly what I've achieved so far:

I've already set up an environment which serves as a proof of concept. This
environment is composed of a master instance on one host
and a slave instance on a second host. The slave handles 2 Solr cores.
In the final version of the architecture I would add one or more SLAVE
nodes depending on the request load.

        request
           |
           V
[ MASTER [core] ] --- [ SLAVE [core1] <--swap--> [core2] ]
           |
           v
    [index backup]

The goals of this architecture are:
* Isolate indexing from querying
* Enable index replication from master to slave
* Control the swap to the newly replicated index (hence the dual cores per
slave)

Here's how the whole platform works when we need to renew the index (on the
slaves):
1- back up the index files on the master using Solr's backup capability (a
backup is always welcome)
2- launch index creation (I'm using the delta indexing capabilities in order
to limit the index generation time)
3- trigger replication from the master core to slave core2, again based on
Solr's built-in capabilities
4- trigger the swap between core1 and core2
5- At this point the slave index has been renewed ... we can revert back to
the previous index if there are any issues with the new one.

As this is aimed to be a production environment, redundancy is one of the
key elements, meaning that we will double (or more) the front Solr 
instances. If the slave instances are not in the same network as the master
instance, our strategy will probably be to set up one of the slaves 
as a relay.
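
For reference, steps 1 and 3 above map onto the 1.4 ReplicationHandler
configuration roughly like this (a sketch; the URLs and poll interval are
placeholders):

    <!-- master solrconfig.xml -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="backupAfter">optimize</str>
      </lst>
    </requestHandler>

    <!-- slave solrconfig.xml -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/replication</str>
        <str name="pollInterval">00:05:00</str>
      </lst>
    </requestHandler>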

That said, here are my questions:

1 / I'd like to get insight into issues that may arise with this kind of
architecture.

2 / My first concern is the size of the index that would need to be
replicated. We need to perform indexing all day long (every 5 min) and
replicate as soon as the index is built.
As far as I know, replication copies over all the index files; I don't think
there can be delta replication (only replicating what changed). That's
my assumption. 
But is there any way to do a delta replication, if that makes any sense?

3 / How can I improve this architecture based on your own experience?
Ex: Shall I use different network interfaces for Solr commands and requests?

Thank you for sharing.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Architecture-discussion-tp825708p825708.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR-788 and merged trunk

2010-05-18 Thread Shawn Heisey

On 5/17/2010 3:34 PM, Shawn Heisey wrote:
I am looking at SOLR-788, trying to apply it to latest trunk.  It 
looks like that's going to require some rework, because the included 
constant PURPOSE_GET_MLT_RESULTS conflicts with something added later, 
PURPOSE_GET_TERMS.


How hard would it be to rework this to apply correctly to trunk?  Is 
it simply a matter of advancing the constant to the next bit in the 
mask?  There's been no discussion on the issue as to whether the 
original patch or the alternate one is better.  Does anyone know?


I could not make the original patch work.  I did get it to apply, but it 
would not compile.   With some massaging, the alternate patch applied, 
compiled, and seems to have passed all junit tests as well.  Considering 
that it's nearly 2 AM here, I will play further tomorrow.  I did have 
one question that I hope someone can answer.  It looks like the DIH has 
been moved outside the war file into separate jars that I will have to 
ensure are in the lib directory.  Is that an accurate statement?


Shawn