Re: Filter to cut out all zeros?

2010-03-10 Thread Norberto Meijome
won't this replace *all* 0s ? ie, 1024 will become 124 ?
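To make the side effect concrete, here is a quick sketch in plain Python (illustrative only — the real mapping happens inside Solr's analyzer, and these function names are made up):

```python
import re

def strip_all_zeros(s):
    # What a bare "0" => "" mapping does: every zero is removed,
    # so digits inside numbers get mangled too.
    return s.replace("0", "")

def strip_leading_zeros(s):
    # A narrower alternative: drop zeros only at the start of a
    # digit group, always keeping at least one digit.
    return re.sub(r"\b0+(?=\d)", "", s)

print(strip_all_zeros("1024"))        # "124"  - data loss
print(strip_leading_zeros("1024"))    # "1024" - untouched
print(strip_leading_zeros("01.10."))  # "1.10."
```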
_
{Beto|Norberto|Numard} Meijome

"The only people that never change are the stupid and the dead"
 Jorge Luis Borges.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


On 11 March 2010 03:24, Sebastian F  wrote:

> yes, thank you. That was exactly what I was looking for! Great help!
>
>
>
>
> 
> From: Ahmet Arslan 
> To: solr-user@lucene.apache.org
> Sent: Tue, March 9, 2010 7:26:46 PM
> Subject: Re: Filter to cut out all zeros?
>
> > I'm trying to figure out the best way to cut out all zeros
> > of an input string like "01.10." or "022.300"...
> > Is there such a filter in Solr or anything similar that I
> > can adapt to do the task?
>
> With solr.MappingCharFilterFactory[1] you can replace all zeros with ""
> before tokenizer.
>
> <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
>
> SolrHome/conf/mapping.txt file will contain this line:
>
> "0" => ""
>
> So that "01.10." will become "1.1." and  "022.300" will become "22.3". Is
> that what you want?
>
> [1]
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.MappingCharFilterFactory
>
>
>
>
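For reference, wiring the char filter into a field type in schema.xml looks roughly like this (the field-type and tokenizer names below are illustrative, not from the original mail):

```xml
<fieldType name="text_nozeros" class="solr.TextField">
  <analyzer>
    <!-- runs before the tokenizer; mapping.txt contains: "0" => "" -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```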


Re: weird problem with letters S and T

2009-10-28 Thread Norberto Meijome
On Wed, 28 Oct 2009 19:20:37 -0400
Joel Nylund  wrote:

> Well I tried removing those 2 letters from stopwords, didn't seem to  
> help, I also tried changing the field type to "text_ws", didn't seem to  
> work. Any other ideas?


Hi Joel,
if your stop word filter was applied at index time, you will have to reindex 
(at least those documents with S and T).

If your stop filter was applied *only* at query time, then it should work after 
you reload your app.
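As a sketch of that distinction (type and file names are illustrative): a stop filter listed only in the query analyzer takes effect on reload, while one in the index analyzer is baked into the existing index until you reindex:

```xml
<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- no StopFilterFactory here: stopwords stay in the index -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- applied only at query time; a change here needs no reindex -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```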

b

_
{Beto|Norberto|Numard} Meijome

"Those who do not remember the past are condemned to repeat it."
   George Santayana



Re: 99.9% uptime requirement

2009-08-04 Thread Norberto Meijome
On Mon, 3 Aug 2009 13:15:44 -0700
"Robert Petersen"  wrote:

> Thanks all, I figured there would be more talk about daemontools if there
> were really a need.  I appreciate the input and for starters we'll put two
> slaves behind a load balancer and grow it from there.
> 

Robert,
not taking away from daemontools, but daemontools won't help you if your
whole server goes down.

Don't put all your eggs in one basket - use several servers behind a load
balancer (hardware load balancers x 2, haproxy, etc.).

And sure, use daemontools to keep your services running within each server...
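A minimal sketch of that layout with haproxy (the host names, IPs, and ports here are assumptions, not from the thread):

```
# haproxy.cfg fragment: two Solr slaves behind one frontend
frontend solr_front
    bind *:8983
    default_backend solr_slaves

backend solr_slaves
    balance roundrobin
    option httpchk GET /solr/admin/ping
    server slave1 10.0.0.11:8983 check
    server slave2 10.0.0.12:8983 check
```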

B
_
{Beto|Norberto|Numard} Meijome

"Why do you sit there looking like an envelope without any address on it?"
  Mark Twain



Re: Updating Solr index from XML files

2009-07-08 Thread Norberto Meijome
On Tue, 7 Jul 2009 22:16:04 -0700
Francis Yakin  wrote:

> 
> I have the following "curl" cmd to update and doing commit to Solr ( I have
> 10 xml files just for testing)

[...]

hello,
DIH supports XML, right? 

Not sure if it works with n files... but it's worth looking into. 
Alternatively, you can write a relatively simple Java app that will pick each 
file up and post it for you using SolrJ.
b

_
{Beto|Norberto|Numard} Meijome

"Mix a little foolishness with your serious plans;
it's lovely to be silly at the right moment."
   Horace



Re: Is there any other way to load the index beside using "http" connection?

2009-07-08 Thread Norberto Meijome
On Tue, 7 Jul 2009 13:54:07 -0700
Francis Yakin  wrote:
[...]
> much on our setup.
> 
> Like I said, we have a file named "test.xml" which comes from SQL output; we
> put it locally on the solr server under "/opt/test.xml"
> 
> So, I need to execute the commands from solr system to add and update this to
> the solr data/indexes.
> 
> What commands do I have to use, for example the xml file
> named" /opt/test.xml" ?
> 

Francis,
much as we could just tell you the answer, have you tried reading the documentation 
in the wiki, and the example setup bundled with SOLR?

Most, if not ALL your questions, are answered there.

Good luck,
B
_
{Beto|Norberto|Numard} Meijome

Computers are like air conditioners; they can't do their job properly if you 
open windows.



Re: Is there any other way to load the index beside using "http" connection?

2009-07-06 Thread Norberto Meijome
On Mon, 6 Jul 2009 09:56:03 -0700
Francis Yakin  wrote:

>  Norberto,
> 
> Thanks, I think my questions is:
> 
> >>why not generate your SQL output directly into your oracle server as a file
> 
> What type of file is this?
> 
> 

a file in a format that you can then import into SOLR. 

_
{Beto|Norberto|Numard} Meijome

"Gravity cannot be blamed for people falling in love."
  Albert Einstein



Re: Is there any other way to load the index beside using "http" connection?

2009-07-06 Thread Norberto Meijome
On Sun, 5 Jul 2009 10:28:16 -0700
Francis Yakin  wrote:

[...]> 
> >upload the file to your SOLR server? Then the data file is local to your SOLR
> >server , you will bypass any WAN and firewall you may be having. (or some
> >variation of it, sql -> SOLR server as file, etc..)
> 
> How do we upload the file? Do we need to convert the data file to a Lucene
> index first? And is there documentation on how we do this?

pick your poison... rsync? ftp? scp ? 

B
_
{Beto|Norberto|Numard} Meijome

"The freethinking of one age is the common sense of the next."
   Matthew Arnold



Re: Is there any other way to load the index beside using "http" connection?

2009-07-06 Thread Norberto Meijome
On Sun, 5 Jul 2009 21:36:35 +0200
Marcus Herou  wrote:

> Sharing some of our exports from DB to solr. Note: many of the statements
> below might not work due to clip-clip.

thx Marcus - but that's a DIH config right? :)
b
_
{Beto|Norberto|Numard} Meijome

"I respect faith, but doubt is what gives you an education."
   Wilson Mizner



Re: Is there any other way to load the index beside using "http" connection?

2009-07-05 Thread Norberto Meijome
On Thu, 2 Jul 2009 11:02:28 -0700
Francis Yakin  wrote:

> Norberto, Thanks for your input.
> 
> What do you mean with "Have you tried connecting to  SOLR over HTTP from
> localhost, therefore avoiding any firewall issues and network latency ? it
> should work a LOT faster than from a remote site." ?
>
> 
> Here are how our servers lay out:
> 
> 1) Database ( Oracle ) is running on separate machine
> 2) Solr master is running on separate machine by itself
> 3) 6 solr slaves ( these 6 pulll the index from master using rsync)
> 
> We have a SQL (Oracle) script to post the data/index from the Oracle Database
> machine to the Solr Master over http. We wrote that script (someone on the
> Oracle Database administration team wrote it).

You said in your other email you are having issues with slow transfers between
1) and 2). Your subject relates to the data transfer between 1) and 2); 2) and
3) is irrelevant to this part.

My question (what you quoted above) relates to the point you made about it
being slow (WHY is it slow?), and the issues with opening so many connections
through the firewall. So I'll rephrase my question (see below...)

[...]
> 
> We can not do localhost, since Solr is not running on the Oracle machine.

why not generate your SQL output directly on your Oracle server as a file, and
upload the file to your SOLR server? Then the data file is local to your SOLR
server, and you bypass any WAN and firewall issues you may be having (or some
variation of it: sql -> SOLR server as file, etc.).

Any speed issues that are rooted in the fact that you are posting via
HTTP (vs embedded solr or DIH) aren't going to go away. But it's the simpler
approach without changing too much of your current setup.

 
> Another alternative that we think of is to transform XML into CSV and
> import/export it.
> 
> How about if LUSQL, some mentioned about this? Is this apps free(open source)
> application? Do you have any experience with this apps?

Not i, sorry.

Have you looked into DIH? It's designed for this kind of work.

B
_
{Beto|Norberto|Numard} Meijome

"Great spirits have often encountered violent opposition from mediocre minds."
  Albert Einstein



Re: Is there any other way to load the index beside using "http" connection?

2009-07-05 Thread Norberto Meijome
On Thu, 2 Jul 2009 11:28:51 -0700
Francis Yakin  wrote:

>  Norberto,
> 


Hi Francis,
Please reply to the list, or keep it in CC.

> You saying:
> 
> "Other alternatives are to transform the XML into csv and import it that way"
> 
> How do you transfer that CSV file to Solr?
> 

http://wiki.apache.org/solr/UpdateCSV 

There actually is a LOT of information in the wiki, as well as the mailing list 
archives.

good luck,
B
_
{Beto|Norberto|Numard} Meijome

"The freethinking of one age is the common sense of the next."
   Matthew Arnold



Re: Is it problem? I use solr to search and index is made by lucene. (not EmbeddedSolrServer(wiki is old))

2009-07-02 Thread Norberto Meijome
On Thu, 2 Jul 2009 16:12:58 +0800
James liu  wrote:

> I use solr to search and the index is made by lucene (not
> EmbeddedSolrServer (the wiki is old)).
> 
> Is it a problem when I use solr to search?
> 
> What is the difference between an index made by lucene and one made by solr?

Hi James,
make sure the version of Lucene used to create your index is the same as the
Lucene libraries included in your version of SOLR; then it should work.

It may be that an older Lucene index works with newer Lucene libs provided in
SOLR, but after using it you may not be able to go back - I am not sure of
the details.

probably an FAQ by now - check the archives  :)

good luck,
B
_
{Beto|Norberto|Numard} Meijome

"He has no enemies, but is intensely disliked by his friends."
  Oscar Wilde



Re: Is there any other way to load the index beside using "http" connection?

2009-07-02 Thread Norberto Meijome
On Wed, 1 Jul 2009 15:07:12 -0700
Francis Yakin  wrote:

> 
> We have several thousands of XML files in the database that we load to the
> solr master. The database uses an "http" connection to transfer those files
> to the solr master. Solr then translates the XML files into its index.
> 
> We are experiencing issues with connections being opened/closed in the
> firewall, and it is very, very slow.
> 
> Is there any other way to load the data/index from Database to solr master
> beside using http connection, so it means we just scp/ftp the xml file  from
> Database system to solr master  and let solr convert those to lucene indexes?
> 

Francis,
after reading the whole thread, it seems you have :
  - Data source : Oracle DB, on separate location to your SOLR.
  - Data format : XML output.
  
definitely DIH is a great option, but since you are on 1.2 it is not available 
to you (you should look into upgrading if you can!). 

Have you tried connecting to SOLR over HTTP from localhost, therefore avoiding 
any firewall issues and network latency? It should work a LOT faster than from 
a remote site. Also make sure not to commit until you really need to.

Other alternatives are to transform the XML into csv and import it that way. Or 
write a simple app that will parse the xml and post it directly using the 
embedded solr method.
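As a sketch of the XML-to-CSV route, in Python (this assumes the simple `<add><doc><field name="...">` update-XML shape; the field list is illustrative):

```python
import csv
import io
import xml.etree.ElementTree as ET

def solr_xml_to_csv(xml_text, fields):
    """Flatten a Solr <add><doc>... update payload into CSV for UpdateCSV."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(fields)  # header row: the CSV handler maps these to fields
    for doc in root.iter("doc"):
        row = {f.get("name"): (f.text or "") for f in doc.iter("field")}
        writer.writerow([row.get(name, "") for name in fields])
    return out.getvalue()

xml = '<add><doc><field name="id">1</field><field name="title">hello</field></doc></add>'
print(solr_xml_to_csv(xml, ["id", "title"]))
```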

plenty of options, all of them documented @ solr's site.

good luck,
b 
_
{Beto|Norberto|Numard} Meijome

"People demand freedom of speech to make up for the freedom of thought which 
they avoid. " 
  Soren Aabye Kierkegaard



Re: Excluding Characters and SubStrings in a Faceted Wildcard Query

2009-07-01 Thread Norberto Meijome
On Mon, 29 Jun 2009 15:10:59 +0100
Ben  wrote:

> Hi Erik,
> 
> I'm not sure exactly how much context you need here, so I'll try to keep 
> it short and expand as needed.
> 
> The column I am faceting on contains a comma-delimited set of vectors. 
> Each vector is made up of {Make,Year,Model} e.g. 
> _ford_1996_focus,mercedes_1996_clk,ford_2000_focus
> 
> I have a custom request handler, where if I want to find all the cars 
> from 1996 I pass in a facet query for the Year (1996) which is 
> transformed to a wildcard facet query :
> 
> _*_1996_*
> 
> In other words, it'll match any records whose vector column contains a 
> string which somewhere has a car from 1996.
> 
> Why not put the Make, Year and Model in separate columns and do a facet 
> query of multiple columns?... because once we've selected 1996, we 
> should (in the above example) then be offering "ford and mercedes" as 
> further facet choices, and nothing more. If the parts were in their own 
> columns, there would be no way to tie the Makes and Models to specific 
> years, for example.
> 
[...]

Hi,
It must be late and I probably need more $coffee... but isn't what you just
described (search for 1996, show 'ford', 'mercedes') how facets DO work?
Once you have the facet on the make field, and solr has told you that both
'ford' and 'mercedes' are available in that field, it is up to you to search
for 'make=ford and date=1996' if you ONLY want fords, generation 1996...

cheers,
B
_
{Beto|Norberto|Numard} Meijome

"He has the attention span of a lightning bolt."
  Robert Redford



Re: Solr document security

2009-06-25 Thread Norberto Meijome
On Wed, 24 Jun 2009 23:20:26 -0700 (PDT)
pof  wrote:

> 
> Hi, I am wanting to add document-level security that works as follows: an
> external process makes a query to the index and, depending on its security
> allowances based on a login id, a list of hits is returned minus any the
> user isn't meant to know even exist. I was thinking maybe a custom filter with
> a JDBC connection to check the security of the user vs. the document. I'm not
> sure how I would add the filter, how to write the filter, or how to get the
> login id from a GET parameter. Any suggestions, comments etc.?

Hi Brett,
(keeping in mind that I've been away from SOLR for 8 months, but I
don't think this was added of late)

The standard approach is to manage security @ your
application layer, not @ SOLR. ie, search, return documents (which should
contain some kind of data to identify their ACL) and then you can decide
whether to show them or not.

HIH
_
{Beto|Norberto|Numard} Meijome

"They never open their mouths without subtracting from the sum of human
knowledge." Thomas Brackett Reed



Re: How can i indexing MS-Outlook files?

2008-12-23 Thread Norberto Meijome
On Sun, 14 Dec 2008 19:22:00 -0800 (PST)
Otis Gospodnetic  wrote:

> Perhaps an easier alternative is to index not the MS-Outlook files
> themselves, but email messages pulled from the IMAP or POP servers, if that's
> where the original emails live.

PST files ('outlook files') are local to the end user, and quite possibly their
contents aren't available on the server anymore.

Another alternative could be to access, from Exchange's
"file system" itself, the files that represent each object... I don't know
whether this is still possible in Exchange 2007, or whether it is 'sanctioned'
by MS... Possibly some kind of object interface with Exchange itself would be
most desirable.


_
{Beto|Norberto|Numard} Meijome

FAST, CHEAP, SECURE: Pick Any TWO



Re: port of Nutch CommonGrams to Solr for help with slow phrase queries

2008-11-25 Thread Norberto Meijome
On Wed, 26 Nov 2008 10:08:03 +1100
Norberto Meijome <[EMAIL PROTECTED]> wrote:

> We didn't notice any severe performance hit but :
> - data set isn't huge ( ca 1 MM docs).
> - reindexed nightly via DIH from MS-SQL, so we can use a separate cache layer
> to lower the number of hits to SOLR.

To make this clear - there was a noticeable hit when we removed stop words, but
the nature of the beast forced our hand.

b

_
{Beto|Norberto|Numard} Meijome

"Peace can only be achieved by understanding."
   Albert Einstein



Re: port of Nutch CommonGrams to Solr for help with slow phrase queries

2008-11-25 Thread Norberto Meijome
On Mon, 24 Nov 2008 13:31:39 -0500
"Burton-West, Tom" <[EMAIL PROTECTED]> wrote:

> The approach to this problem used by Nutch looks promising.  Has anyone
> ported the Nutch CommonGrams filter to Solr?
> 
> "Construct n-grams for frequently occuring terms and phrases while
> indexing. Optimize phrase queries to use the n-grams. Single terms are
> still indexed too, with n-grams overlaid."
> http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html

Tom,
i haven't used Nutch's implementation, but used the current implementation
(1.3) of ngrams and shingles to address exactly the same issue ( database of
music albums and tracks). 
We didn't notice any severe performance hit but :
- data set isn't huge ( ca 1 MM docs).
- reindexed nightly via DIH from MS-SQL, so we can use a separate cache layer to
lower the number of hits to SOLR.

B
_
{Beto|Norberto|Numard} Meijome

"Truth has no special time of its own.  Its hour is now -- always."
   Albert Schweitzer



Re: Using Solr for indexing emails

2008-11-25 Thread Norberto Meijome
On Tue, 25 Nov 2008 03:59:31 +0200
Timo Sirainen <[EMAIL PROTECTED]> wrote:

> > would it be faster to say q=user: AND highestuid:[ * TO *]  ?  
> 
> Now that I read again what fq really did, yes, sounds like you're right.

you may want to compare them both to see which one is better... I just went
from memory :P

> > ( and i
> > guess you'd sort DESC and return 1 record only).  
> 
> No, I'd use the above for getting highestuid value for all mailboxes
> (there should be only one record per mailbox (each mailbox has separate
> uid values -> separate highestuid value)) so I can look at the returned
> highestuid values to see what mailboxes aren't fully indexed yet.

gotcha. It is an interesting use of SOLR, I must say... I for one am not used
to having to deal with up-to-the-second update needs.

good luck,
B

_
{Beto|Norberto|Numard} Meijome

"Never offend people with style when you can offend them with substance."
  Sam Brown



Re: Using Solr for indexing emails

2008-11-24 Thread Norberto Meijome
On Mon, 24 Nov 2008 20:21:17 +0200
Timo Sirainen <[EMAIL PROTECTED]> wrote:

> I think I gave enough reasons above for why I don't like this
> solution. :) I also don't like adding new shared global state databases
> just for Solr. Solr should be the one shared global state database..

fair enough - it makes more sense to me now :)

[...]
> Store the per-mailbox highest indexed UID in a new unique field created
> like "//". Always update it by deleting the
> old one first and then adding the new one.

you mean delete, commit, add, commit? If you replace the record, simply
submitting the new document and committing would do (of course, you must ensure
the value of the uniqueKey field matches, so SOLR replaces the old doc).
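As an illustration of that replace-by-uniqueKey behaviour (the field names here are invented, and this assumes a `<uniqueKey>` is declared in schema.xml):

```xml
<!-- schema.xml declares: <uniqueKey>id</uniqueKey> -->
<add>
  <doc>
    <field name="id">mailbox-42</field>
    <field name="highestuid">1007</field>
  </doc>
</add>
<!-- posting a later <add> with the same id value replaces the whole
     document at the next commit; no explicit delete is needed -->
```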

> So to find out the highest
> indexed UID for a mailbox just look it up using its unique field. For
> finding the highest indexed UID for a user's all mailboxes do a single
> query:
> 
>  - fl=highestuid
>  - q=highestuid:[* TO *]
>  - fq=user:

would it be faster to say q=user: AND highestuid:[ * TO *]  ?  ( and i
guess you'd sort DESC and return 1 record only).

> If messages are being simultaneously indexed by multiple processes the
> highest-uid value may sometimes (rarely) be set too low, but that
> doesn't matter. The next search will try to re-add some of the messages
> that were already in the index, but because they'll have the same unique IDs
> as what already exists they won't get added again. The highest-uid
> gets updated and all is well.

B
_
{Beto|Norberto|Numard} Meijome

Mind over matter: if you don't mind, it doesn't matter



Re: Using Solr for indexing emails

2008-11-23 Thread Norberto Meijome
On Sun, 23 Nov 2008 16:02:16 +0200
Timo Sirainen <[EMAIL PROTECTED]> wrote:

> Hi,

Hi Timo,

> 
[...]

> The main problem is that before doing the search, I first have to check
> if there are any unindexed messages and then add them to Solr. This is
> done using a query like:
>  - fl=uid
>  - rows=1
>  - sort=uid desc
>  - q=uidv: box: user:

So, if I understand correctly, the process is :

1. user sends search query Q to search interface
2. interface checks highest indexed uidv in SOLR
3. checks in IMAP store for mailbox if there are any objects ('emails') newer
than uidv from 2.
4. anything found in 3. is processed, submitted to SOLR, committed.
5. interface submits search query Q to index, gets results
6. results are presented / returned to user

It strikes me that this may work OK in some situations but may not scale. I
would decouple the {find new documents / submit / commit} process from the
{search / presentation} layer - ESPECIALLY if you plan to have several
mailboxes in play now.

> So it returns the highest IMAP UID field (which is an always-ascending
> integer) for the given mailbox (you can ignore the uidvalidity). I can
> then add all messages with higher UIDs to Solr before doing the actual
> search.
> 
> When searching multiple mailboxes the above query would have to be sent
> to every mailbox separately. 

hmm...not sure what you mean by "query would have to be sent to every
MAILBOX" ... 

> That really doesn't seem like the best
> solution, especially when there are a lot of mailboxes. But I don't
> think Solr has a way to return "highest uid field for each
> box:"?

hmmm... maybe you can use facets on 'box' ... ? though you'd still have to
query for each box, i think...

> Is that above query even efficient for a single mailbox? 

i don't think so.

>I did consider
> using separate documents for storing the highest UID for each mailbox,
> but that causes annoying desynchronization possibilities. Especially
> because currently I can just keep sending documents to Solr without
> locking and let it drop duplicates automatically (should be rare). With
> per-mailbox highest-uid documents I can't really see a way to do this
> without locking or allowing duplicate fields to be added and later some
> garbage collection deleting all but the one highest value (annoyingly
> complex).

I have a feeling the issues arise from serialising the whole process (as I
described above...). It makes more sense (to me) to implement something
similar to DIH, where you load data as needed (even a 'delta query', which
would only return new data). I am not sure whether you could use DIH (RSS
feed from IMAP store?).

> I could of course also keep track of what's indexed on Dovecot's side,
> but that could also lead to desynchronization issues and I'd like to
> avoid them.
> 
> I guess the ideal solution would be if it was somehow possible to create
> a SQL-like trigger that updates the per-mailbox highest-uid document
> whenever adding a new document with a higher UID value.

I am not sure how much effort you want to put into this... but I would think
that writing a lean app that periodically (for a period that makes sense for
your hardware and users' expectations... 5 minutes? 10? 1?) crawls the IMAP
stores for UIDs, processes them and submits to SOLR, and keeps its own state
(dbm or sqlite) may be a more flexible approach. Or, if dovecot supports this,
a 'plugin / hook' that sends a msg to your indexing app every time a new
document is created.
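A sketch of the "keeps its own state" part with sqlite (table and function names are invented for illustration; the IMAP crawling and Solr submission are left out):

```python
import sqlite3

def open_state(path=":memory:"):
    # one row per mailbox, tracking the highest UID we have submitted to Solr
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS mailbox_state (
                      mailbox     TEXT PRIMARY KEY,
                      highest_uid INTEGER NOT NULL DEFAULT 0)""")
    return db

def highest_uid(db, mailbox):
    row = db.execute("SELECT highest_uid FROM mailbox_state WHERE mailbox = ?",
                     (mailbox,)).fetchone()
    return row[0] if row else 0

def record_indexed(db, mailbox, uid):
    # keep the max seen, so a late/duplicate submission can't move us backwards
    db.execute("""INSERT INTO mailbox_state (mailbox, highest_uid) VALUES (?, ?)
                  ON CONFLICT(mailbox) DO UPDATE SET
                      highest_uid = MAX(highest_uid, excluded.highest_uid)""",
               (mailbox, uid))
    db.commit()
```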

I am interested to hear what you decide to go with, and why.

cheers,
B

_
{Beto|Norberto|Numard} Meijome

"All parts should go together without forcing. You must remember that the parts
you are reassembling were disassembled by you. Therefore, if you can't get them
together again, there must be a reason. By all means, do not use hammer." IBM
maintenance manual, 1975



Re: [VOTE] Community Logo Preferences

2008-11-23 Thread Norberto Meijome
On Sun, 23 Nov 2008 11:59:50 -0500
Ryan McKinley <[EMAIL PROTECTED]> wrote:

> Please submit your preferences for the solr logo.

https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394263/apache_solr_a_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg

thanks!!
B

_
{Beto|Norberto|Numard} Meijome

"Tell a person you're the Metatron and they stare at you blankly. Mention 
something out of a Charleton Heston movie and suddenly everyone's a Theology 
scholar!"
   Dogma



Re: How can i protect the SOLR Cores?

2008-11-20 Thread Norberto Meijome
On Wed, 19 Nov 2008 22:58:52 -0800 (PST)
RaghavPrabhu <[EMAIL PROTECTED]> wrote:

>  I'm using multiple cores, and all I need to do is make each core
> secure. If I am accessing a particular core via url, it should ask for
> and validate credentials, say Username & Password, for each core.

You should be able to handle this @ the servlet container level. What I did, 
using Jetty + starting from the example app, was: 

1) modify web.xml (part of the sources of solr.war, which you'll have to 
rebuild) to define the authentication constraints you want.

[...]


  
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Default</web-resource-name>
    <url-pattern>/</url-pattern>
  </web-resource-collection>
</security-constraint>

<security-constraint>
  <web-resource-collection>
    <web-resource-name>AllowedQueries</web-resource-name>
    <url-pattern>/core1/select/*</url-pattern>
    <url-pattern>/core2/select/*</url-pattern>
    <url-pattern>/core3/select/*</url-pattern>
  </web-resource-collection>
</security-constraint>

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Admin</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
    <url-pattern>/core1/admin/*</url-pattern>
    <url-pattern>/core2/admin/*</url-pattern>
    <url-pattern>/core3/admin/*</url-pattern>
    <url-pattern>/_test_/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>Admin-role</role-name>
    <role-name>FullAccess-role</role-name>
  </auth-constraint>
</security-constraint>

<security-constraint>
  <web-resource-collection>
    <web-resource-name>RW</web-resource-name>
    <url-pattern>/core1/dataimport</url-pattern>
    <url-pattern>/core2/dataimport</url-pattern>
    <url-pattern>/core3/dataimport</url-pattern>
    <url-pattern>/core1/update/*</url-pattern>
    <url-pattern>/core2/update/*</url-pattern>
    <url-pattern>/core3/update/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>RW-role</role-name>
    <role-name>FullAccess-role</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>SearchSvc</realm-name>
</login-config>

<security-role>
  <role-name>Admin-role</role-name>
</security-role>
<security-role>
  <role-name>FullAccess-role</role-name>
</security-role>
<security-role>
  <role-name>RW-role</role-name>
</security-role>


[...]

2) in Jetty's jetty.xml (or in a context...i just used jetty.xml), define where 
to get the AUTH details from :
[...]

  


<Set name="UserRealms">
  <Array type="org.mortbay.jetty.security.UserRealm">
    <Item>
      <New class="org.mortbay.jetty.security.HashUserRealm">
        <Set name="name">SearchSvc</Set>
        <Set name="config">/etc/searchsvc_access.properties</Set>
      </New>
    </Item>
  </Array>
</Set>
[...]


3) Read in jetty's documentation how to create the .properties file with the 
auth info...
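For what it's worth, Jetty's HashUserRealm properties format is `username: password[,rolename ...]`, so the file might look like this (the usernames and passwords are placeholders):

```
# /etc/searchsvc_access.properties
admin:   adminpass,Admin-role
writer:  writerpass,RW-role
full:    fullpass,FullAccess-role
```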

I am not sure if this is the BEST way to do it (I didn't have access to any 
stronger auth method than basic at the time), but it works exactly as intended.

b
_
{Beto|Norberto|Numard} Meijome

"I was born not knowing and have had only a little time to change that here and 
there." 
  Richard Feynman



Re: Use SOLR like the "MySQL LIKE"

2008-11-19 Thread Norberto Meijome
On Tue, 18 Nov 2008 14:26:02 +0100
"Aleksander M. Stensby" <[EMAIL PROTECTED]> wrote:

> Well, then I suggest you index the field in two different ways if you want  
> both possible ways of searching. One where you treat the entire name as  
> one token (in lowercase) (then you can search for avera* and match on, for  
> instance, "average joe" etc.), and another field where you tokenize on  
> whitespace, if you want/need that possibility as well. Look at  
> the solr copy fields and try it out, it works like a charm :)
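A sketch of that two-field setup in schema.xml (all names are illustrative): one lowercased whole-string field via KeywordTokenizer, plus a whitespace-tokenized field, both populated from the same source with copyField:

```xml
<fieldType name="string_lc" class="solr.TextField">
  <analyzer>
    <!-- whole value as a single lowercase token: avera* matches "average joe" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name"       type="text_ws"   indexed="true" stored="true"/>
<field name="name_exact" type="string_lc" indexed="true" stored="false"/>
<copyField source="name" dest="name_exact"/>
```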

You should also make extensive use of  analysis.jsp  to see how data in your
field (1) is tokenized, filtered and indexed, and how your search terms are
tokenized, filtered and matched against (1). 
Hint 1 : check all the checkboxes ;)
Hint 2: you don't need to reindex all your data, just enter test data in the
form and give it a go. You will of course have to tweak schema.xml and restart
your service when you do this.

good luck,
B
_
{Beto|Norberto|Numard} Meijome

"Intellectual: 'Someone who has been educated beyond his/her intelligence'"
   Arthur C. Clarke, from "3001, The Final Odyssey", Sources.



Re: Solr Core Size limit

2008-11-12 Thread Norberto Meijome
On Tue, 11 Nov 2008 20:39:32 -0800 (PST)
Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> With Distributed Search you are limited to # of shards * Integer.MAX_VALUE.

yeah, makes sense. And I would suspect that since this is PER INDEX, it applies 
to each core only (so you could have n cores across m shards for n * m * 
Integer.MAX_VALUE docs).


_
{Beto|Norberto|Numard} Meijome

"The more I see the less I know for sure." 
  John Lennon



Re: Solr Core Size limit

2008-11-12 Thread Norberto Meijome
On Tue, 11 Nov 2008 10:25:07 -0800 (PST)
Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> Doc ID gaps are zapped during segment merges and index optimization.
> 

thanks Otis :)
b
_
{Beto|Norberto|Numard} Meijome

"I didn't attend the funeral, but I sent a nice letter saying  I approved of 
it."
  Mark Twain



Re: Solr Core Size limit

2008-11-10 Thread Norberto Meijome
On Mon, 10 Nov 2008 10:24:47 -0800 (PST)
Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> I don't think there is a limit other than your hardware and the internal Doc
> ID which limits you to 2B docs on 32-bit machines.

Hi Otis,
just curious: is this internal doc ID reused when an optimise happens? Or are
gaps left and re-filled when 2B is reached?

cheers,
b

_
{Beto|Norberto|Numard} Meijome

"Whenever you find that you are on the side of the majority, it is time to 
reform."
   Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: How to use multicore feature in JBOSS

2008-11-05 Thread Norberto Meijome
On Tue, 4 Nov 2008 23:45:40 -0800 (PST)
con <[EMAIL PROTECTED]> wrote:

> But for the first question, I am still not clear.
> I think to use the multicore feature we should inform the server. In the
> Jetty server, we are starting the server using:   java
> -Dsolr.solr.home=multicore -jar start.jar
> Once the server is started I think it will take the parameters from
> multicore/solr.xml.
> 
> But I am confused on how and where to pass this argument to JBOSS. 

Con,
Sorry, I don't have a JBoss available to test... what happens if you use the
standard configuration (with solr.xml at the top level of your Solr directory,
NOT in multicore/)?

Launch it, look at the debug messages, and see which cores are picked up (from
the admin page).

FWIW, by having {solr_installation_directory}/solr.xml, I never had to tell
Jetty where solr.xml was. IIRC, multicore/solr.xml is the layout in the
example app, because the default config is single-core only.

b

_
{Beto|Norberto|Numard} Meijome

"We must openly accept all ideologies and systems as  means of solving 
humanity's problems. One country, one nation, one ideology, one system is not 
sufficient."
   Dalai Lama.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: How to use multicore feature in JBOSS

2008-11-04 Thread Norberto Meijome
On Tue, 4 Nov 2008 09:55:38 -0800 (PST)
con <[EMAIL PROTECTED]> wrote:

> 1) Which all files do I need to edit to use the multicore feature?
> 2) Also, where can I specify the index directly so that we can point the
> indexed documents to a custom folder instead of jboss/bin?

Con, please check the wiki - the answers should be there 

(
1) solr.xml (previously multicore.xml)
2) look in solrconfig.xml for each core
)
_
{Beto|Norberto|Numard} Meijome

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: DIH and rss feeds

2008-10-30 Thread Norberto Meijome
On Thu, 30 Oct 2008 20:46:16 -0700
"Lance Norskog" <[EMAIL PROTECTED]> wrote:

> Now: a few hours later there are a different 100 "lastest" documents. How do
> I add those to the index so I will have 200 documents?  'full-import' throws
> away the first 100. 'delta-import' is not implemented. What is the special
> trick here?  I'm using the Solr-1.3.0 release.
>  

Lance,

1) DIH has a "clean" parameter that, when set to true (the default, I think), will
delete all existing docs in the index.

2) ensure your new documents have different values in your field defined as key 
( schema.xml) .

let us know how it goes,
B

_
{Beto|Norberto|Numard} Meijome

Lack of planning on your part does not constitute an emergency on ours.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Solr Searching on other fields which are not in query

2008-10-30 Thread Norberto Meijome
On Thu, 30 Oct 2008 15:50:58 -0300
"Jorge Solari" <[EMAIL PROTECTED]> wrote:

> 
> 
> in the schema file.

or use Dismax query handler.
b

_
{Beto|Norberto|Numard} Meijome

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: solr1.3 - testing language ?

2008-10-20 Thread Norberto Meijome
On Mon, 20 Oct 2008 08:16:50 -0700 (PDT)
sunnyfr <[EMAIL PROTECTED]> wrote:

> ok so straight by the admin part !

Hi Johanna - not sure what you mean by 'the admin part'. 

> it should work .. so it doesn't 

If you tell us what you did (what URL you called), what you expect to receive
back (a sample of your indexed data) and what you get instead, we may be able to
offer better answers...

b

_
{Beto|Norberto|Numard} Meijome

Two things have come out of Berkeley, Unix and LSD.
It is uncertain which caused the other.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Advice on analysis/filtering?

2008-10-20 Thread Norberto Meijome
On Thu, 16 Oct 2008 16:09:17 +0200
Jarek Zgoda <[EMAIL PROTECTED]> wrote:

> They came to such expectations seeing Solr's own Spellcheck at work -  
> if it can suggest correct versions, it should be able to sanitize  
> broken words in documents and search them using sanitized input. For  
> me, this seemed reasonable request (of course, if this can be achieved  
> reasonably abusing solr's spellcheck component).

Don't forget that the Solr spellchecker finds its suggestions based on your
corpus, so if you don't have a correctly spelt version of wordA, you won't
receive wordA back as a 'spellchecked' version of that word. I think that's how
it works by default (which is all I've needed so far).
I *think* there is a way to use an external spellchecker (component or list),
so you could have your full list of Polish words in a file, I guess.

I agree playing with analysis.jsp is the best approach to solving these
problems ( tick all the boxes and see how the changes to your terms take place).

good luck - let us know what you come up with :)

B
_
{Beto|Norberto|Numard} Meijome

"You can discover what your enemy fears most by observing the means he uses to
frighten you." Eric Hoffer (1902 - 1983)

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: solr1.3 - testing language ?

2008-10-20 Thread Norberto Meijome
On Mon, 20 Oct 2008 06:25:09 -0700 (PDT)
sunnyfr <[EMAIL PROTECTED]> wrote:

> I implemented multi language search, but I didn't finished the website in
> PHP, how can I check it works properly?

Maybe by sending to Solr the queries you plan to have your PHP frontend generate?

_
{Beto|Norberto|Numard} Meijome

"Always do right.  This will gratify some and astonish the rest."
  Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: query parsing issue + behavior as OR (solr 1.4-dev)

2008-10-20 Thread Norberto Meijome
On Mon, 20 Oct 2008 06:21:06 -0700 (PDT)
Sunil Sarje <[EMAIL PROTECTED]> wrote:

> I am working with nightly build of Oct 17, 2008  and found the issue that
> something wrong with LuceneQParserPlugin; It takes + as OR

Sunil, please do not hijack the thread :

http://en.wikipedia.org/wiki/Thread_hijacking

thanks,
B

_
{Beto|Norberto|Numard} Meijome

He could be a poster child for retroactive birth control.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Sorting performance

2008-10-20 Thread Norberto Meijome
On Mon, 20 Oct 2008 16:28:23 +0300
christophe <[EMAIL PROTECTED]> wrote:

> Hum. this mean I have to wait before I index new documents and avoid 
> indexing when they are created (I have about 50 000 new documents 
> created each day and I was planning to make those searchable ASAP).

you can always index + optimize out of band in a 'master' / RW server , and
then send the updated index to your slave (the one actually serving the
requests). 

This *will NOT* remove the need to refresh your cache, but it will remove any
delay introduced by commit/indexing + optimise.

> Too bad there is no way to have a centralized cache that can be shared 
> AND updated when new documents are created.

hmm not sure it makes sense like that... but maybe along the lines of having an
active cache that is used to serve queries, and new ones being prepared, and
then swapped when ready. 

Speaking of which (or not :P) , has anyone thought about / done any work on
using memcached for these internal solr caches? I guess it would make sense for
setups with several slaves ( or even a master updating memcached
too...)...though for a setup with shards it would be slightly more involved
(although it *could* be used to support several slaves per 'data shard' ).

All the best,
B
_
{Beto|Norberto|Numard} Meijome

RTFM and STFW before anything bad happens.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Synonym format not working

2008-10-20 Thread Norberto Meijome
On Mon, 20 Oct 2008 00:08:07 -0700 (PDT)
prerna07 <[EMAIL PROTECTED]> wrote:

> 
> 
> The issue with synonym arise when i have number in synonym defination:
> 
> ccc =>1,2 gives following result in debugQuery= true :
>  MultiPhraseQuery(all:" (1 ) (2 ccc )
> 3") 
>   all:" (1 ) (2 ccc ) 3" 
> 
> However fooaaa=> fooaaa, baraaa,bazaaa gives correct synonym results:
> 
>   all:fooaaa all:baraaa all:bazaaa 
>   all:fooaaa all:baraaa all:bazaaa 
> 
> Any pointers to solve the issue with numbers in synonyms?

Prerna,
in your first email you show your field type has :

[...]

[..]

generateNumberParts=1 will, AFAIK, generate a separate token on a number, so
ccc1 will be indexed as "ccc", "1". If you use admin/analysis.jsp you can see
the step-by-step process applied by the tokenizer + filters for your data type;
you can then tweak it as necessary until you are happy with the results.
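For illustration, the letter/digit splitting described above can be sketched like this (a rough Python sketch of the behaviour, not the actual WordDelimiterFilter implementation; the function name is mine):

```python
import re

def split_number_parts(token: str) -> list[str]:
    """Rough sketch of generateNumberParts=1: emit separate sub-tokens at
    letter/digit boundaries, e.g. 'ccc1' -> ['ccc', '1']."""
    # Runs of letters and runs of digits become separate tokens;
    # punctuation acts as a delimiter and is dropped.
    return re.findall(r"[A-Za-z]+|[0-9]+", token)

print(split_number_parts("ccc1"))  # ['ccc', '1']
```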

b
_
{Beto|Norberto|Numard} Meijome

Immediate success shouldn't be necessary as a motivation to do the right thing.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: search not working correctly

2008-10-20 Thread Norberto Meijome
On Mon, 20 Oct 2008 03:24:36 -0700 (PDT)
prerna07 <[EMAIL PROTECTED]> wrote:

> Yes, We want search on these incomplete words.

Look into the NGram token factory; it works a treat. I don't think it's
explained much in the wiki, but it has been discussed on this list in the past,
and you also have the JavaDoc and the source itself.

FWIW, I had problems getting it to work properly with minGramSize != maxGramSize:
analysis.jsp shows a match, but it didn't work in the QH. It could
*definitely* have been myself or the code at the time I tested it (pre the 1.3
release)... I'll test again to see if it is still happening and log a bug if
needed.
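For reference, character n-grams of the kind the NGram tokenizer produces can be sketched as follows (a simplification for illustration, not the Lucene implementation; the function name is mine):

```python
def char_ngrams(text: str, min_size: int, max_size: int) -> list[str]:
    """All character n-grams of text with lengths min_size..max_size,
    e.g. char_ngrams('door', 3, 3) -> ['doo', 'oor']."""
    grams = []
    for n in range(min_size, max_size + 1):
        # Slide a window of width n across the text.
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams

print(char_ngrams("door", 2, 3))  # ['do', 'oo', 'or', 'doo', 'oor']
```

Indexing these sub-strings is what makes matching on incomplete words possible: a query gram like "oor" hits any document whose field contained "door".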

B

_
{Beto|Norberto|Numard} Meijome

"There are two kinds of stupid people. One kind says,'This is old and therefore
good'. The other kind says, 'This is new, and therefore better.'"
 John Brunner, 'The Shockwave Rider'.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: SolrJ + HTTP caching

2008-10-15 Thread Norberto Meijome
On Wed, 15 Oct 2008 11:11:07 -0700
Matthew Runo <[EMAIL PROTECTED]> wrote:

> We've been using Varnish (http://varnish.projects.linpro.no/) in front  
> of our Solr servers, and have been seeing about a 70% hit rate for the  
> queries. We're using SolrJ, and have seen no bad effects of the cache.

FWIW : 
We also use Varnish in front of SOLR - we refresh the index daily, so we have a
fairly long TTL, but clear it at the end of the script which calls DIH.
The web app also caches rendered results (webpages :P)  in memcached.

B

_
{Beto|Norberto|Numard} Meijome

"Build a system that even a fool can use, and only a fool will want to use it."
   George Bernard Shaw

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Problem in using Unique key

2008-10-09 Thread Norberto Meijome
On Wed, 8 Oct 2008 03:45:20 -0700 (PDT)
con <[EMAIL PROTECTED]> wrote:

> But in that case, while doing a full-import I am getting the following
> error:
> 
> org.apache.solr.common.SolrException: QueryElevationComponent requires the
> schema to have a uniqueKeyField 

Con, if you don't use the Query Elevation component, you can disable it in
solrconfig.xml. Not sure why a uniqueKeyField is needed for it, though.

b

_
{Beto|Norberto|Numard} Meijome

"First they ignore you, then they laugh at you, then they fight you, then you 
win."
  Mahatma Gandhi.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: dismax and long phrases

2008-10-09 Thread Norberto Meijome
On Tue, 07 Oct 2008 09:27:30 -0700
Jon Drukman <[EMAIL PROTECTED]> wrote:

> > Yep, you can "fake" it by only using fieldsets (qf) that have a 
> > consistent set of stopwords.  
> 
> does that mean changing the query or changing the schema?

Jon,
- you change schema.xml to define which type each field is. The fieldType says
whether you have stopwords or not.
- you change solrconfig.xml to define which fields dismax will query on.

I don't think you should have to change your query.

b

_
{Beto|Norberto|Numard} Meijome

"Mix a little foolishness with your serious plans;
it's lovely to be silly at the right moment."
   Horace

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Dismax , "query phrases"

2008-10-01 Thread Norberto Meijome
On Tue, 30 Sep 2008 11:43:57 -0700 (PDT)
Chris Hostetter <[EMAIL PROTECTED]> wrote:

> 
> : That's why I was wondering how Dismax breaks it all apart. It makes
> sense...I : suppose what I'd like to have is a way to tell dismax which
> fields NOT to : tokenize the input for. For these fields, it would pass the
> full q instead of : each part of it. Does this make sense? would it be useful
> at all? 
> 
> the *goal* makes sense, but the implementation would be ... problematic.
> 
> you have to remember the DisMax parser's whole way of working is to make 
> each "chunk" of input match against any qf field, and find the highest 
> scoring field for each chunk, with this input...
> 
>   q = some phase  & qf = a b c
> 
> ...you get...
> 
>   ( (a:some | b:some | c:some) (a:phrase | b:phrase | c:phrase) )
> 
> ...even if dismax could tell that "c" was a field that should only support 
> exact matches,

thanks Hoss,

it would by a configuration option. 

> how would it fit c:"some phrase" into that structure?

does this make sense?

 ( (a:some | b:some ) (a:phrase | b:phrase) ( c:"some phrase") )
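The per-chunk expansion being discussed can be sketched as follows (a toy model of how dismax fans each query chunk out across the qf fields, not the actual query parser; the function name and the hypothetical exact_fields option are mine):

```python
def dismax_expand(q: str, qf: list[str], exact_fields: list[str] = ()) -> str:
    """Toy model: each whitespace chunk of q becomes a disjunction over the
    qf fields; hypothetical exact_fields instead match the whole q as one
    phrase, as proposed above."""
    chunks = q.split()
    per_chunk = [
        "(" + " | ".join(f"{f}:{c}" for f in qf) + ")" for c in chunks
    ]
    exact = [f'({f}:"{q}")' for f in exact_fields]
    return "( " + " ".join(per_chunk + exact) + " )"

print(dismax_expand("some phrase", ["a", "b"], exact_fields=["c"]))
# ( (a:some | b:some) (a:phrase | b:phrase) (c:"some phrase") )
```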


> I've already kinda forgotten how this thread started ... 

trying to get *exact* matches to always score higher using dismax - keeping in
mind that I have multiple exact fields, with different boosts...

> but would it make 
> sense to just use your "exact" fields in the pf, and have inexact versions 
> of them in the qf?  then docs that match your input exactly should score 
> at the top, but less exact matches will also still match.

Aha! Right, I think that makes sense... I obviously haven't got my head properly
around all the different functionality of dismax.

I will try it when I'm back at work... right now, I seem to have solved the
problem by using shingles. The fields are artist, song and album titles, so high
matching on shingles is a good approximation of exact matching, except that I
had to remove stopwords, which impacts performance.

Thanks again :)
B
_
{Beto|Norberto|Numard} Meijome

Which is worse: ignorance or apathy?
Don't know. Don't care.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Create Indexes

2008-09-29 Thread Norberto Meijome
On Fri, 26 Sep 2008 18:58:14 +0530
Dinesh Gupta <[EMAIL PROTECTED]> wrote:

> Please tell me where to upload the files.

anywhere you have access to... your own website, somewhere anyone on the list 
can access the files >you< want to share to address your problems :)
b
_
{Beto|Norberto|Numard} Meijome

"Science Fiction...the only genuine consciousness expanding drug"
  Arthur C. Clarke

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Dismax , "query phrases"

2008-09-28 Thread Norberto Meijome
On Fri, 26 Sep 2008 10:42:42 -0700 (PDT)
Chris Hostetter <[EMAIL PROTECTED]> wrote:

> :  : class="solr.KeywordTokenizerFactory" /> 

Re: How to select one entity at a time?

2008-09-26 Thread Norberto Meijome
On Fri, 26 Sep 2008 02:35:18 -0700 (PDT)
con <[EMAIL PROTECTED]> wrote:

> What you meant is correct only. Please excuse for that I am new to solr. :-(

Con, have a read here :

http://www.ibm.com/developerworks/java/library/j-solr1/

it helped me pick up the basics a while back. it refers to 1.2, but the core 
concepts are relevant to 1.3 too.

b
_
{Beto|Norberto|Numard} Meijome

Hildebrant's Principle:
If you don't know where you are going,
any road will get you there.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: How to select one entity at a time?

2008-09-26 Thread Norberto Meijome
On Fri, 26 Sep 2008 02:35:18 -0700 (PDT)
con <[EMAIL PROTECTED]> wrote:

> What you meant is correct only. Please excuse for that I am new to solr. :-(

hi Con,
nothing to be excused for... but you may want to read the wiki, as it provides
quite a lot of information that should answer your questions. DIH is great, but
I wouldn't go near it until you understand how to create your own schema.xml
and solrconfig.xml.

http://wiki.apache.org/solr/FrontPage is the wiki

( everyone else ... is there a guide on getting started on SOLR ? step by step,
taking the example and changing it for your own use?  )

> I want to index all the query results. (I think this will be done by the
> data-config.xml) 

hmm...terminology :-) 
you index documents (similar to records in a database).

when you send a query to Solr, you will get results if your query matches any indexed documents.

> Now while accessing this indexed data, i need this filtering. ie. Either
> user or manager.
> I tried your suggestion:
> http://localhost:8983/solr/select/?q=user:bob&version=2.2&start=0&rows=10&indent=on&wt=json

The URL looks OK. Do you have any document in your index with field user
containing 'bob'?

Try this to get all results (XML format, first 3 results only):

http://localhost:8983/solr/select/?q=*:*&rows=3

Then find a field with a value, search for that value and see if you get
that document back; it should work... (with lots of caveats, yes).

If you send us the result we can help you understand better why it isn't
working as you intend.
b
_
{Beto|Norberto|Numard} Meijome

"First they ignore you, then they laugh at you, then they fight you, then you
win." Mahatma Gandhi.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Create Indexes

2008-09-26 Thread Norberto Meijome
On Fri, 26 Sep 2008 16:32:05 +0530
Dinesh Gupta <[EMAIL PROTECTED]> wrote:

> Is it OK to create whole index by Solr web-app?
> If not than ,How can I create index?
> 
> I have attached some file that create index now.
> 

Dinesh,
You sent the same email 2 1/2 hours ago; sending it again will not give you
more answers.

If you have a file you want to share, you should upload it to a webserver and 
share the URL - most mailing lists drop any file attachments.


_
{Beto|Norberto|Numard} Meijome

Never take Life too seriously, no one gets out alive anyway.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: How to select one entity at a time?

2008-09-26 Thread Norberto Meijome
On Fri, 26 Sep 2008 00:46:07 -0700 (PDT)
con <[EMAIL PROTECTED]> wrote:

> To be more specific:
> I have the data-config.xml just like: 
> 
>   
>   
>
>   
>   
>   
>   
> 
>   
>   
>   
> 

Con, I may be confused here... are you asking how to load only data from your
USERS SQL table into Solr, or how to search in your Solr index for data about
'USERS'?

data-config.xml is only relevant for the Data Import Handler...but your 
following question:

> 
> I have 3 search conditions. when the client wants to search all the users,
> only the entity, 'user' must be executed. And if he wants to search all
> managers, the entity, 'manager' must be executed.
> 
> How can i accomplish this through url?

*seems* to indicate you want to search on this .

If you want to search on a particular field from your SOLR schema, DIH is not 
involved. If you use the standard QH, you say ?q=user:Bob 

If I misunderstood your question, please explain...

cheers,
b
_
{Beto|Norberto|Numard} Meijome

"Everything is interesting if you go into it deeply enough"
  Richard Feynman

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Shingles , min size?

2008-09-24 Thread Norberto Meijome
hi guys,
I may have missed it ,but is it possible to tell the solr.ShingleFilterFactory 
the minimum number of grams to generate per shingle?  Similar to 
NGramTokenizerFactory's minGramSize="3" maxGramSize="3" 
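(For illustration, word shingles of the kind asked about can be sketched as follows; the min_size knob is the hypothetical feature being requested here, and this is a simplification, not the Lucene ShingleFilter implementation:)

```python
def shingles(tokens: list[str], min_size: int, max_size: int) -> list[str]:
    """Word shingles (token n-grams) of lengths min_size..max_size,
    e.g. shingles(['please', 'divide'], 2, 2) -> ['please divide']."""
    out = []
    for n in range(min_size, max_size + 1):
        # Slide a window of n consecutive tokens and join them with a space.
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out

print(shingles(["the", "smashing", "pumpkins"], 2, 3))
# ['the smashing', 'smashing pumpkins', 'the smashing pumpkins']
```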

thanks!
B
_
{Beto|Norberto|Numard} Meijome

"Ask not what's inside your head, but what your head's inside of."
   J. J. Gibson

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Dismax , "query phrases"

2008-09-24 Thread Norberto Meijome
On Wed, 24 Sep 2008 08:34:57 -0700 (PDT)
Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> What happens if you change ps from 100 to 1 and comment out that ord function?
> 
> 

Otis, I think what I am after is what Hoss described in his last paragraph in 
his reply to your email last year :

http://www.nabble.com/DisMax-and-REQUIRED-OR-REQUIRED-query-rewrite-td13395349.html#a13395349

i.e., I want everything that dismax does BUT, on certain fields, I want it to
search for all the terms in my q=, as a phrase.

I am thinking of modifying dismax to allow this to be passed as a configuration 
( eg, fieldsSearchExact=artist_exact, title_exact), but if I can avoid it 
that'd be great :).

any other ideas, anyone??

thanks!
B
_
{Beto|Norberto|Numard} Meijome

"Nature doesn't care how smart you are. You can still be wrong."
  Richard Feynman

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Defining custom schema

2008-09-24 Thread Norberto Meijome
On Wed, 24 Sep 2008 04:42:42 -0700 (PDT)
con <[EMAIL PROTECTED]> wrote:

> In the table we will be having various column names like CUSTOMER_NAME,
> CUSTOMER_PHONE etc. If we use the default schema.xml, we have to map these
> values to some the default values like cat, features etc. this will cause
> difficulty when we need to process the output.
> Instead can we set the column name and column type dynamically to the
> schema.xml so that the output will show something like, 
> markrmiller

Con,
the "default" schema you refer to is from the example application. You should 
definitely edit it and define your own fields.

b

_
{Beto|Norberto|Numard} Meijome

"In my opinion, we don't devote nearly enough scientific research to finding a 
cure for jerks." 
  Calvin

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: help required: how to design a large scale solr system

2008-09-24 Thread Norberto Meijome
On Wed, 24 Sep 2008 11:45:34 -0400
Mark Miller <[EMAIL PROTECTED]> wrote:

> Nothing to stop you from breaking up the tsv/csv files into multiple 
> tsv/csv files.

Absolutely agreeing with you... in one system where I implemented Solr, I
have a process run through the file system and lazily pick up new files as they
come in. If something breaks (and it will, as the files are user-generated in
many cases...), report it / leave it for later... move on.

b

_
{Beto|Norberto|Numard} Meijome

I used to hate weddings; all the Grandmas would poke me and
say, "You're next sonny!" They stopped doing that when i
started to do it to them at funerals.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Dismax , "query phrases"

2008-09-24 Thread Norberto Meijome
On Wed, 24 Sep 2008 08:34:57 -0700 (PDT)
Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> What happens if you change ps from 100 to 1 and comment out that ord function?
> 
> 
> Otis

Hi Otis,

no luck - without " " :
smashing pumpkins
smashing pumpkins

+((DisjunctionMaxQuery((genre:smash^0.2 | title_ngram2:"sm ma as sh hi in 
ng"^0.1 | artist_ngram2:"sm ma as sh hi in ng"^0.1 | title_ngram3:"sma mas ash 
shi hin ing"^4.5 | title:smash^6.0 | artist_ngram3:"sma mas ash shi hin 
ing"^3.5 | artist:smash^4.0 | artist_exact:smashing^100.0 | 
title_exact:smashing^200.0)~0.01) DisjunctionMaxQuery((genre:pumpkin^0.2 | 
title_ngram2:"pu um mp pk ki in ns"^0.1 | artist_ngram2:"pu um mp pk ki in 
ns"^0.1 | title_ngram3:"pum ump mpk pki kin ins"^4.5 | title:pumpkin^6.0 | 
artist_ngram3:"pum ump mpk pki kin ins"^3.5 | artist:pumpkin^4.0 | 
artist_exact:pumpkins^100.0 | title_exact:pumpkins^200.0)~0.01))~2) 
DisjunctionMaxQuery((title:"smash pumpkin"~1^2.0 | artist:"smash 
pumpkin"~1^0.8)~0.01)

___

+(((genre:smash^0.2 | title_ngram2:"sm ma as sh hi in ng"^0.1 | 
artist_ngram2:"sm ma as sh hi in ng"^0.1 | title_ngram3:"sma mas ash shi hin 
ing"^4.5 | title:smash^6.0 | artist_ngram3:"sma mas ash shi hin ing"^3.5 | 
artist:smash^4.0 | artist_exact:smashing^100.0 | 
title_exact:smashing^200.0)~0.01 (genre:pumpkin^0.2 | title_ngram2:"pu um mp pk 
ki in ns"^0.1 | artist_ngram2:"pu um mp pk ki in ns"^0.1 | title_ngram3:"pum 
ump mpk pki kin ins"^4.5 | title:pumpkin^6.0 | artist_ngram3:"pum ump mpk pki 
kin ins"^3.5 | artist:pumpkin^4.0 | artist_exact:pumpkins^100.0 | 
title_exact:pumpkins^200.0)~0.01)~2) (title:"smash pumpkin"~1^2.0 | 
artist:"smash pumpkin"~1^0.8)~0.01

Still OK if I include " "...

I am trying, on another setup with the same data, to work with shingles rather
than on 'exact'... dismax seems to handle it much better... but it may be that I
haven't added to that config all the ngram2 & ngram3 fields for substring
matching...

the resulting params were :

2<-1 5<-2 6<90%
true
true
0.01
store_albums.xsl
___

title_exact^200.0 artist_exact^100.0 title^6.0 title_ngram3^4.5 artist^4.0 
artist_ngram3^3.5 title_ngram2^0.1 artist_ngram2^0.1 genre^0.2

*:*
true
xml
dismax
10
true
title^2.0 artist^0.8
all
*,score
1
1
true
all
xml
smashing pumpkins

thanks,
B
_
{Beto|Norberto|Numard} Meijome

"Don't remember what you can infer."
   Harry Tennant

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: help required: how to design a large scale solr system

2008-09-24 Thread Norberto Meijome
On Wed, 24 Sep 2008 07:46:57 -0400
Mark Miller <[EMAIL PROTECTED]> wrote:

> Yes. You will def see a speed increasing by avoiding http (especially 
> doc at a time http) and using the direct csv loader.
> 
> http://wiki.apache.org/solr/UpdateCSV

and for the obvious reason that if, for whatever reason, something breaks while
you are indexing directly from memory, can you restart the import? It may be
just easier to keep the data on disk and keep track of how far you are through
adding to the index...
B
_
{Beto|Norberto|Numard} Meijome

Sysadmins can't be sued for malpractice, but surgeons don't have to deal with
patients who install new versions of their own innards.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Dismax , "query phrases"

2008-09-24 Thread Norberto Meijome
Hello,
I've seen references to this in the list, but not completely explained...my
apologies if this is FAQ (and for the length of the email).

I am using dismax across a number of fields on an index with data about music
albums & songs - the fields are quite full of stop words. I am trying to boost
'exact' matches - ie, if you search for 'The Doors', those documents with 'The
Doors' should be first. I've created the following fieldType and I use it for 
fields artist_exact and title_exact:
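(The archive stripped the XML snippet here; a hypothetical reconstruction, assuming the KeywordTokenizerFactory mentioned elsewhere in this thread plus a lowercase filter, might have looked roughly like:)

```xml
<!-- Hypothetical reconstruction of the stripped fieldType: treat the whole
     field value as a single token so that only exact (whole-string),
     case-insensitive matches score against the _exact fields. -->
<fieldType name="string_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```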





 

 




I then give artist_exact and title_exact pretty high boosts ( title_exact^200.0
artist_exact^100.0 )

Now, when I search with ?q=the doors, all the terms in my q= aren't used
together to build the dismax query, so I never get a match on the _exact fields:

(there are a few other fields involved...pretty self explanatory)

the doors
the doors
___

+((DisjunctionMaxQuery((title_ngram2:"th he"^0.1 | artist_ngram2:"th he"^0.1 |
title_ngram3:the^4.5 | artist_ngram3:the^3.5 | artist_exact:the^100.0 |
title_exact:the^200.0)~0.01) DisjunctionMaxQuery((genre:door^0.2 |
title_ngram2:"do oo or rs"^0.1 | artist_ngram2:"do oo or rs"^0.1 |
title_ngram3:"doo oor ors"^4.5 | title:door^6.0 | artist_ngram3:"doo oor
ors"^3.5 | artist:door^4.0 | artist_exact:doors^100.0 |
title_exact:doors^200.0)~0.01))~2) DisjunctionMaxQuery((title:door^2.0 |
artist:door^0.8)~0.01) FunctionQuery((ord(release_year))^0.5) 

 +(((title_ngram2:"th he"^0.1 |
artist_ngram2:"th he"^0.1 | title_ngram3:the^4.5 | artist_ngram3:the^3.5 |
artist_exact:the^100.0 | title_exact:the^200.0)~0.01 (genre:door^0.2 |
title_ngram2:"do oo or rs"^0.1 | artist_ngram2:"do oo or rs"^0.1 |
title_ngram3:"doo oor ors"^4.5 | title:door^6.0 | artist_ngram3:"doo oor
ors"^3.5 | artist:door^4.0 | artist_exact:doors^100.0 |
title_exact:doors^200.0)~0.01)~2) (title:door^2.0 | artist:door^0.8)~0.01
(ord(release_year))^0.5


but, if I build my search as ?q="the doors" 


+DisjunctionMaxQuery((genre:door^0.2 | title_ngram2:"th he e   d do oo or
rs"^0.1 | artist_ngram2:"th he e   d do oo or rs"^0.1 | title_ngram3:"the he  e
d  do doo oor ors"^4.5 | title:door^6.0 | artist_ngram3:"the he  e d  do doo
oor ors"^3.5 | artist:door^4.0 | artist_exact:the doors^100.0 | title_exact:the
doors^200.0)~0.01) DisjunctionMaxQuery((title:door^2.0 | artist:door^0.8)~0.01)
FunctionQuery((ord(release_year))^0.5) 

 +(genre:door^0.2 | title_ngram2:"th he e   d
do oo or rs"^0.1 | artist_ngram2:"th he e   d do oo or rs"^0.1 |
title_ngram3:"the he  e d  do doo oor ors"^4.5 | title:door^6.0 |
artist_ngram3:"the he  e d  do doo oor ors"^3.5 | artist:door^4.0 |
artist_exact:the doors^100.0 | title_exact:the doors^200.0)~0.01
(title:door^2.0 | artist:door^0.8)~0.01 (ord(release_year))^0.5

I've tried with other queries that don't include stopwords (smashing pumpkins,
for example), and in all cases, if I don't use " ", only the LAST word is used
with my _exact fields (tried with 1, 2 and 3 words, always the same against my
_exact fields...)

What is the reason for this behaviour? 

my full dismax config is :

2<-1 5<-2 6<90%
true
true
0.01

title_exact^200.0 artist_exact^100.0 title^6.0 title_ngram3^4.5 artist^4.0
artist_ngram3^3.5 title_ngram2^0.1 artist_ngram2^0.1 genre^0.2 
*:*
true
dismax
true
10
title^2.0 artist^0.8
all
*,score
ord(release_year)^0.5
1
100
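For what it's worth, here is a minimal toy model of the behaviour (my own sketch, NOT Solr's actual code): an unquoted query is split on whitespace and each token becomes its own disjunction clause, so an untokenized _exact field is only ever probed one word at a time; quoting keeps the full phrase together. The field names are the ones from my config.

```python
def dismax_clauses(q, qf):
    """Toy model of dismax clause building - illustrative only."""
    if q.startswith('"') and q.endswith('"'):
        tokens = [q.strip('"')]      # quoted: the whole phrase is one token
    else:
        tokens = q.split()           # unquoted: one clause per word
    # each token becomes one DisjunctionMax-style clause over all qf fields
    return [{field: tok for field in qf} for tok in tokens]

print(dismax_clauses('the doors', ['title_exact', 'artist_exact']))
print(dismax_clauses('"the doors"', ['title_exact', 'artist_exact']))
```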


TIA!
B
_
{Beto|Norberto|Numard} Meijome

"Never offend people with style when you can offend them with substance."
  Sam Brown

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Using Shingles to Increase Phrase Search Performance

2008-09-24 Thread Norberto Meijome
On Sat, 16 Aug 2008 15:39:44 -0700
"Chris Harris" <[EMAIL PROTECTED]> wrote:

[...]
> So finally I modified the Lucene ShingleFilter class to add an
> "outputUnigramIfNoNgram option". Basically, if you set that option,
> and also set outputUnigrams=false, then the filter will tokenize just
> as in Exhibit B, except that if the query is only one word long, it
> will return a corresponding single token, rather than zero tokens. In
> other words,
> 
> [Exhibit C]
> "please" ->
>   "please"
> 
> Things were still zippy. And, so far, I think I have seriously
> improved my phrase search performance without ruining anything.

hi Chris,
 is this change part of 1.3 ? 

I've tried 









but analysis.jsp shows no tokens generated when there is only 1 word. 

thanks!
B

_
{Beto|Norberto|Numard} Meijome

 I sense much NT in you.
 NT leads to Bluescreen.
 Bluescreen leads to downtime.
 Downtime leads to suffering.
 NT is the path to the darkside.
 Powerful Unix is.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Any way to extract most used keywords from an index (or a random set)

2008-09-22 Thread Norberto Meijome
On Mon, 22 Sep 2008 15:46:54 +0530
"Jacob Singh" <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I'm trying to write a testing suite to gauge the performance of solr
> searches.  To do so, I'd like to be able to find out what keywords
> will get me search results.  Is there anyway to programaticaly do this
> with luke?  I'm trying to figure out what all it exposes, but I'm not
> seeing this.
> 

Hi Jacob,
are you after something that the following URLs don't provide ? 

http://host/solr/core/admin/luke?wt=xslt&tr=luke.xsl 

but I actually prefer the schema browser ( 1.3 ) to see the top n terms per 
field...

b
_
{Beto|Norberto|Numard} Meijome

If it's there, and you can see it, it's real.
If it's not there, and you can see it, it's virtual.
If it's there, and you can't see it, it's transparent.
If it's not there, and you can't see it, you erased it.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Special character matching 'x' ?

2008-09-18 Thread Norberto Meijome
On Thu, 18 Sep 2008 10:53:39 +0530
"Sanjay Suri" <[EMAIL PROTECTED]> wrote:

> One of my field values has the name "Räikkönen", which contains special
> characters.
> 
> Strangely, as I see it anyway, it matches on the search query 'x' ?
> 
> Can someone explain or point me to the solution/documentation?

hi Sanjay,
Akshay should have given you an answer for this. In a more general way, if you
want to know WHY something is matching the way it is, run the query with
debugQuery=true . There are a few pages in the wiki which explain other
debugging techniques.

b
_
{Beto|Norberto|Numard} Meijome

"Ask not what's inside your head, but what your head's inside of."
   J. J. Gibson

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: about boost weight

2008-09-14 Thread Norberto Meijome
On Sat, 13 Sep 2008 16:17:12 +
zzh <[EMAIL PROTECTED]> wrote:

>I think this is a stupid method, because the search conditions is too
> long, and the search efficiency will be low, we hope you can help me to solve
> this problem.

Hi,
IMHO, a long set of conditions doesn't make it stupid. You may not be going the
best way about it though. You may find
http://wiki.apache.org/solr/DisMaxRequestHandler an interesting and useful
read :)

B
_
{Beto|Norberto|Numard} Meijome

"Quality is never an accident, it is always the result of intelligent effort."
  John Ruskin  (1819-1900)

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Regarding Indexing

2008-08-29 Thread Norberto Meijome
On Fri, 29 Aug 2008 02:37:10 -0700 (PDT)
sanraj25 <[EMAIL PROTECTED]> wrote:

> I want to store two independent datas in solr index. so I decided to create
> two index.But that's not possible.so  i go for multicore concept in solr
> .can u give me step by step procedure to create multicore in solr

Hi,
without specific questions, i doubt myself or others can give you any other
information than the documentation, which can be found at :

http://wiki.apache.org/solr/CoreAdmin

Please make sure you are using (a recent version of ) 1.3.

B
_
{Beto|Norberto|Numard} Meijome

Your reasoning is excellent -- it's only your basic assumptions that are wrong.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Regarding Indexing

2008-08-29 Thread Norberto Meijome
On Fri, 29 Aug 2008 00:31:13 -0700 (PDT)
sanraj25 <[EMAIL PROTECTED]> wrote:

> But still i cant maintain two index.
> please help me how to create two cores in solr

What specific problem do you have ?
B

_
{Beto|Norberto|Numard} Meijome

"Always listen to experts.  They'll tell you what can't be done, and why.  
Then do it."
  Robert A. Heinlein

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Storing two different files

2008-08-28 Thread Norberto Meijome
On Thu, 28 Aug 2008 02:01:05 -0700 (PDT)
sanraj25 <[EMAIL PROTECTED]> wrote:

> I want to  index two different files in solr.(for ex)  I want to store
> two tables like, job_post and job_profile in solr. But now both are stored
> in same place in solr.when i get data from job_post, data come from
> job_profile also.So i want to maintain the data of job_post and job_profile
> separately.

hi :)
you need to have 2 separate schemas, and therefore 2 separate indexes. You
should read about MultiCore in the wiki.

B

_
{Beto|Norberto|Numard} Meijome

"Unix is very simple, but it takes a genius to understand the simplicity."
   Dennis Ritchie

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Question about search suggestion

2008-08-26 Thread Norberto Meijome
On Tue, 26 Aug 2008 15:15:21 +0300
Aleksey Gogolev <[EMAIL PROTECTED]> wrote:

> 
> Hello.
> 
> I'm new to solr and I need to make a search suggest (like google
> suggestions).
> 

Hi Aleksey,
please search the archives of this list for subjects containing 'autocomplete'
or 'auto-suggest'. that should give you a few ideas and starting points.

best,
B

_
{Beto|Norberto|Numard} Meijome

"The more I see the less I know for sure." 
  John Lennon

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: dataimporthandler and mysql connector jar

2008-08-25 Thread Norberto Meijome
On Mon, 25 Aug 2008 17:11:47 +0200
Walter Ferrara <[EMAIL PROTECTED]> wrote:

> Launching a multicore solr with dataimporthandler using a mysql driver,
> (driver="com.mysql.jdbc.Driver") works fine if the mysql connector jar
> (mysql-connector-java-5.0.7-bin.jar) is in the classpath, either jdk
> classpath or inside the solr.war lib dir.
> While putting the mysql-connector-java-5.0.7-bin.jar in core0/lib
> directory, or in the multicore shared lib dir (specified in sharedLib
> attribute in solr.xml) result in exception, even if the jar is correctly
> loaded by the classloader:


Hi Walter,
As at nightly build of August 19th, the DIH failing to connect to the data
source on SOLR's startup does *not* kill SOLR anymore. I haven't tested
yesterday's... it could be a regression bug, but I doubt it - the error used to
be different from yours (about connectivity, not failure in document).

For what it's worth, I only have 1 copy of the jdbc jar (MS SQL in my case), in
SOLR's lib directory, used by several cores' own DIH. You can check if
it's picked up by SOLR's classpath in the Java Info page under admin/

You may also want to try with a valid but empty document definition in
data-config.xml to rule out syntax issues.

B
_
{Beto|Norberto|Numard} Meijome

"Any society that would give up a little liberty to gain a little security will
deserve neither and lose both." Benjamin Franklin

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: "Multicore" and snapshooter / snappuller

2008-08-25 Thread Norberto Meijome
On Fri, 22 Aug 2008 12:21:53 -0700
"Lance Norskog" <[EMAIL PROTECTED]> wrote:

>  Apparently the ZFS (Silicon Graphics
> originally) is great for really huge files. 

hi Lance,
You may be confusing Sun's ZFS with SGI's XFS. The OP referred, I think, to
ZFS.

B

_
{Beto|Norberto|Numard} Meijome

"The greatest dangers to liberty lurk in insidious encroachment by men of zeal, 
well-meaning but without understanding."
   Justice Louis D. Brandeis

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Querying Question

2008-08-21 Thread Norberto Meijome
On Thu, 21 Aug 2008 18:09:11 -0700
"Jake Conk" <[EMAIL PROTECTED]> wrote:

> I thought if I used <copyField> to copy my string field to a text
> field then I can search for words within it and not limited to the
> entire content. Did I misunderstand that?

but you need to search on the fields that are defined as fieldType=text...it 
seems you are searching on the string fields.

B

_
{Beto|Norberto|Numard} Meijome

"He has the attention span of a lightning bolt."
  Robert Redford

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: hello, a question about solr.

2008-08-20 Thread Norberto Meijome
On Wed, 20 Aug 2008 10:58:50 -0300
"Alexander Ramos Jardim" <[EMAIL PROTECTED]> wrote:

> A tiny but really explanation can be found here
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters


thanks Alexander - indeed, quite short, and focused on shingles... which, if
I understand correctly, are groups of n terms... the NGramTokenizer
creates tokens of n characters from your input.

Searching for ngram or n-gram in the archives should bring up more relevant
information, which isn't in the wiki yet.

B

_
{Beto|Norberto|Numard} Meijome

"All that is necessary for the triumph of evil is that good men do nothing."
  Edmund Burke

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Clarification on facets

2008-08-18 Thread Norberto Meijome
On Tue, 19 Aug 2008 10:18:12 +1200
"Gene Campbell" <[EMAIL PROTECTED]> wrote:

> Is this interpreted as meaning, there are 10 documents that will match
> with 'car' in the title, and likewise 6 'boat' and 2 'bike'?

Correct.

> If so, is there any way to get counts for the *number times* a value
> is found in a document.  I'm looking for a way to determine the number
> of times 'car' is repeated in the title, for example

Not sure - I would suggest that a field with a term repeated several times
would receive a higher score when searching for that term, but not sure how you
could get the information you seek... maybe with the Luke handler? (but on a
per-document basis... slow...?)
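A toy illustration of the distinction, in plain Python, just to make the doc-count vs occurrence-count difference concrete:

```python
# Toy "index": each document is its list of title tokens.
docs = [["car", "car", "boat"],
        ["car"],
        ["bike", "car"]]

# Facet-style count: number of DOCUMENTS containing the term.
doc_count = sum(1 for d in docs if "car" in d)

# What Gene is after: total number of OCCURRENCES of the term.
occurrences = sum(d.count("car") for d in docs)

print(doc_count, occurrences)  # 3 documents, 4 occurrences
```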

B
_
{Beto|Norberto|Numard} Meijome

Computers are like air conditioners; they can't do their job properly if you 
open windows.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: .wsdl for example....

2008-08-18 Thread Norberto Meijome
On Tue, 19 Aug 2008 11:23:48 +1000
Norberto Meijome <[EMAIL PROTECTED]> wrote:

> On Mon, 18 Aug 2008 19:08:24 -0300
> "Alexander Ramos Jardim" <[EMAIL PROTECTED]> wrote:
> 
> > Do you wanna a full web service for SOLR example? How a .wsdl will help you?
> > Why don't you use the HTTP interface SOLR provides?
> > 
> > Anyways, if you need to develop a web service (SOAP compliant) to access
> > SOLR, just remember to use an embedded core on your webservice.
> 
> On Mon, 18 Aug 2008 15:37:24 -0400
> Erik Hatcher <[EMAIL PROTECTED]> wrote:
> 
> > WSDL?   surely you jest.
> > 
> > Erik
> 
> :D I obviously said something terribly stupid, oh well, not the first time 
> and most likely won't be the last one either.
> 
> Anyway, the reason for my asking is : 
>  - I've put together a SOLR search service with a few cores. Nothing fancy, 
> it works great as is.
>  -  the .NET developer I am working with on this  asked for a .wsdl (or 
> .asmx) file to import into Visual Studio ... yes, he can access the service 
> directly, but he seems to prefer a more 'well defined' interface (haven't 
> really decided whether it is worth the effort, but that is another question 
> altogether)
> 
> The way I see it, SOLR is a  RESTful service. I am not looking into wrapping 
> the whole thing behind SOAP ( I actually much prefer REST than SOAP, but that 
> is entering into quasi-religious grounds...) - which should be able to be 
> defined with a .wsdl ( v 1.1 should suffice as only GET + POST are supported 
> in SOLR anyway).
> 
> Am I missing anything here ?
> 
> thanks in advance for your time + thoughts ,
> B

To be clear, i don't suggest we should have a .wsdl for example, simply asking 
if there would be any use in having one.

but given the responses I got, I'm curious now to understand what I have gotten 
wrong :)

Best,
B
_
{Beto|Norberto|Numard} Meijome

 I sense much NT in you.
 NT leads to Bluescreen.
 Bluescreen leads to downtime.
 Downtime leads to suffering.
 NT is the path to the darkside.
 Powerful Unix is.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: .wsdl for example....

2008-08-18 Thread Norberto Meijome
On Mon, 18 Aug 2008 19:08:24 -0300
"Alexander Ramos Jardim" <[EMAIL PROTECTED]> wrote:

> Do you wanna a full web service for SOLR example? How a .wsdl will help you?
> Why don't you use the HTTP interface SOLR provides?
> 
> Anyways, if you need to develop a web service (SOAP compliant) to access
> SOLR, just remember to use an embedded core on your webservice.

On Mon, 18 Aug 2008 15:37:24 -0400
Erik Hatcher <[EMAIL PROTECTED]> wrote:

> WSDL?   surely you jest.
> 
>   Erik

:D I obviously said something terribly stupid, oh well, not the first time and 
most likely won't be the last one either.

Anyway, the reason for my asking is : 
 - I've put together a SOLR search service with a few cores. Nothing fancy, it 
works great as is.
 -  the .NET developer I am working with on this  asked for a .wsdl (or .asmx) 
file to import into Visual Studio ... yes, he can access the service directly, 
but he seems to prefer a more 'well defined' interface (haven't really decided 
whether it is worth the effort, but that is another question altogether)

The way I see it, SOLR is a  RESTful service. I am not looking into wrapping 
the whole thing behind SOAP ( I actually much prefer REST than SOAP, but that 
is entering into quasi-religious grounds...) - which should be able to be 
defined with a .wsdl ( v 1.1 should suffice as only GET + POST are supported in 
SOLR anyway).

Am I missing anything here ?

thanks in advance for your time + thoughts ,
B
_
{Beto|Norberto|Numard} Meijome

"He has no enemies, but is intensely disliked by his friends."
  Oscar Wilde

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: hello, a question about solr.

2008-08-18 Thread Norberto Meijome
On Mon, 18 Aug 2008 23:07:19 +0800
"finy finy" <[EMAIL PROTECTED]> wrote:

> because i use chinese character, for example "ibm___"
> solr will parse it into a term "ibm" and a phraze "_ __"
> can i use solr to query with a term "ibm" and a term "_"  and a term 
> "__"?

Hi finy,
you should look into n-gram tokenizers. Not sure if it is documented in the 
wiki, but it has been discussed in the mailing list quite a few times.

in short, an n-gram tokenizer breaks your input into blocks of characters of 
size n , which are then used to compare in the index. I think for Chinese , 
bi-gram is the favoured approach.
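As a rough sketch, in plain Python, of what a character n-gram tokenizer produces (the real tokenizers live in Lucene/Solr; this is just to illustrate the idea):

```python
def char_ngrams(text, n=2):
    """Break the input into overlapping blocks of n characters,
    the way a character n-gram (here bi-gram) tokenizer would."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("solr", 2))   # ['so', 'ol', 'lr']
```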

good luck,
B
_
{Beto|Norberto|Numard} Meijome

I used to hate weddings; all the Grandmas would poke me and
say, "You're next sonny!" They stopped doing that when i
started to do it to them at funerals.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


.wsdl for example....

2008-08-18 Thread Norberto Meijome
hi :)

does anyone have a .wsdl definition for the example bundled with SOLR? 

if nobody has it, would it be useful to have one ?

cheers,
B
_
{Beto|Norberto|Numard} Meijome

Intelligence: Finding an error in a Knuth text.
Stupidity: Cashing that $2.56 check you got.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: hello, a question about solr.

2008-08-18 Thread Norberto Meijome
On Mon, 18 Aug 2008 15:33:02 +0800
"finy finy" <[EMAIL PROTECTED]> wrote:

> the name field is text,which is analysed, i use the query
> "name:ibmT63notebook"

why do you search with no spaces? is this free text entered by a user, or is it 
part of a link which you control ?

PS: please dont top-post

_
{Beto|Norberto|Numard} Meijome

Commitment is active, not passive. Commitment is doing whatever you can to 
bring about the desired result. Anything less is half-hearted.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: DIH - commit / optimize

2008-08-17 Thread Norberto Meijome
On Mon, 18 Aug 2008 09:34:56 +0530
"Shalin Shekhar Mangar" <[EMAIL PROTECTED]> wrote:

> Actually we have commit and optimize as separate request parameters
> defaulting to true for both full-import and delta-import. You can add a
> request parameter optimize=false for delta-import if you want to commit but
> not to optimize the index.

ah , now it makes perfect sense :) sorry, i should have checked the src myself.

thanks so much again :)
B

_
{Beto|Norberto|Numard} Meijome

What you are afraid to do is a clear indicator of the next thing you need to do.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: DIH - commit / optimize

2008-08-17 Thread Norberto Meijome
On Mon, 18 Aug 2008 10:14:32 +0800
"finy finy" <[EMAIL PROTECTED]> wrote:

> i use solr for 3 months, and i find some question follow:

Please do not hijack mail threads.

http://en.wikipedia.org/wiki/Thread_hijacking

_
{Beto|Norberto|Numard} Meijome

"Ask not what's inside your head, but what your head's inside of."
   J. J. Gibson

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


DIH - commit / optimize

2008-08-17 Thread Norberto Meijome
Hi again,

I see in the DIH wiki page :
[...]
full-import [..] 
commit: (default 'true'). Tells whether to commit+optimize after the operation 
[...]

but nothing for delta-import... I think it would be useful to have a 'commit'
(default=true) and an 'optimize' (default=false) for delta-import - these should
most probably be separate options.

- for full-import , wouldn't it make sense to split commit + optimize into 2 
different options? Granted, if I do a clean=true,  i'd probably want (need!) an 
optimize... even then, optimize may be too slow / use too much memory at that 
point in time... ? ( not too sure about this argument..)
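For concreteness, the kind of call I have in mind (a hypothetical helper; the host, core and handler path are placeholders, and the parameter names are the commit/optimize ones discussed above, not confirmed API):

```python
from urllib.parse import urlencode

def dih_url(base, command, **params):
    """Build a DataImportHandler request URL (placeholder host/core)."""
    return f"{base}/dataimport?" + urlencode({"command": command, **params})

# delta-import that commits but skips the expensive optimize:
print(dih_url("http://localhost:8983/solr/core0", "delta-import",
              commit="true", optimize="false"))
```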

cheers,
B
_
{Beto|Norberto|Numard} Meijome

Never take Life too seriously, no one gets out alive anyway.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: DIH - calling spellchecker rebuild...

2008-08-17 Thread Norberto Meijome
On Sun, 17 Aug 2008 20:22:26 +0530
"Shalin Shekhar Mangar" <[EMAIL PROTECTED]> wrote:

> If it is only SpellCheckComponent that you are interested in, then see
> SOLR-622.
> 
> You can add this to your SCC config to rebuild SCC after every commit:
> <str name="buildOnCommit">true</str>

ah great stuff , thanks Shalin.
B

_
{Beto|Norberto|Numard} Meijome

"Truth has no special time of its own.  Its hour is now -- always."
   Albert Schweitzer

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


DIH - calling spellchecker rebuild...

2008-08-17 Thread Norberto Meijome
Guys + gals,

just a question of form - would DIH itself be the right place to implement a
"URLs to call after successfully completing a DIH full or partial load" - for
example, to rebuild spellchecker when new items have been added?  Or should 
that be part of my external process (cron -> shell script, for example ) that 
calls DIH in the first place ?

cheers
B
_
{Beto|Norberto|Numard} Meijome

If you find a solution and become attached to it, the solution may become your 
next problem.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


DataImportHandler : more forgiving initialisation possible?

2008-08-17 Thread Norberto Meijome
hi guys,
First of all, thanks for DIH - it's great :)

One thing I noticed during my tests (nightly, 2008-08-16) is that, if the DB
is not available during SOLR startup time, the whole core won't initialise -
the error is shown below.

I was wondering,
1) would it be possible to have DIH bomb out in this situation, but not bring
down the whole core from running? I think it would be desirable, with a big
warning, possibly... thoughts?

2) How hard would it be to handle this more gracefully - for example, in case
of error, leave the handler in a non-init state, and when being accessed,
repeat the whole init process (and bomb out if it fails again, of course)...

Thanks for your time on this email + DIH + all other features :)
B

[...]
Aug 17, 2008 11:25:48 PM org.apache.solr.handler.dataimport.DataImportHandler 
processConfiguration
INFO: Processing configuration from solrconfig.xml: {config=data-config.xml}
Aug 17, 2008 11:25:48 PM org.apache.solr.handler.dataimport.DataImporter 
loadDataConfig
INFO: Data Configuration loaded successfully
Aug 17, 2008 11:25:48 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 
call
INFO: Creating a connection for entity an_artist with URL: 
jdbc:sqlserver://a.b.c.d:1433;databaseName=DBNAME;user=usrname;password=magicpassword;responseBuffering=adaptive;
Aug 17, 2008 11:25:48 PM org.apache.solr.handler.dataimport.DataImportHandler 
inform
SEVERE: Exception while loading DataImporter
org.apache.solr.handler.dataimport.DataImportHandlerException: Failed to 
initialize DataSource: null Processing Documemt # 
at 
org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:306)
at 
org.apache.solr.handler.dataimport.DataImporter.addDataSource(DataImporter.java:273)
at 
org.apache.solr.handler.dataimport.DataImporter.initEntity(DataImporter.java:228)
at 
org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:98)
at 
org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
at 
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:294)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:473)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:295)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:107)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39)
at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:593)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at 
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1220)
at 
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:513)
at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39)
at 
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
at org.mortbay.jetty.Server.doStart(Server.java:222)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:977)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.mortbay.start.Main.invokeMain(Main.java:183)
at org.mortbay.start.Main.start(Main.java:497)
at org.mortbay.start.Main.main(Main.java:115)
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
Unable to create database connection Processing Documemt # 
at 
org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:67)
at 
org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:303)
... 34 more
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP 
connection to the host  has failed. java.net.ConnectException: Connection 
refused
at 
com.micros

[SOLVED...]Re: Problems using saxon for XSLT transforms

2008-08-17 Thread Norberto Meijome
On Tue, 12 Aug 2008 23:36:32 +1000
Norberto Meijome <[EMAIL PROTECTED]> wrote:

> hi :)
> I'm trying to use SAXON instead of the default XSLT parser. I was pretty sure 
> i
> had it running fine on 1.2, but when I repeated the same steps (as per the
> wiki) on latest nightly build, i cannot see any sign of it being loaded or 
> use,
> although the classpath seems to be pointing to them (see below)
> 
[...]

well, although no explicit information is present about whether it IS using
saxon, it obviously dies when saxon isn't present - I moved lib/saxon* out of
the way, and any transformation dies with:


HTTP ERROR: 500

Provider net.sf.saxon.TransformerFactoryImpl not found

javax.xml.transform.TransformerFactoryConfigurationError: Provider 
net.sf.saxon.TransformerFactoryImpl not found
at 
javax.xml.transform.TransformerFactory.newInstance(TransformerFactory.java:108)
at 
org.apache.solr.util.xslt.TransformerProvider.<init>(TransformerProvider.java:45)
at 
org.apache.solr.util.xslt.TransformerProvider.<clinit>(TransformerProvider.java:43)
at 
org.apache.solr.request.XSLTResponseWriter.getTransformer(XSLTResponseWriter.java:117)
at 
org.apache.solr.request.XSLTResponseWriter.getContentType(XSLTResponseWriter.java:65)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:250)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1088)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:206)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:829)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:450)

RequestURI=/solr/tracks/select/


I guess not as clear as what I'd had hoped for, but should do for now :)

cheers,
B
_
{Beto|Norberto|Numard} Meijome

Computers are like air conditioners; they can't do their job properly if you 
open windows.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Best way to index without diacritics

2008-08-14 Thread Norberto Meijome
On Thu, 14 Aug 2008 11:34:47 -0400
"Steven A Rowe" <[EMAIL PROTECTED]> wrote:

[...]
> The kind of filter Walter is talking about - a generalized language-aware 
> character normalization Solr/Lucene filter - does not yet exist.  My guess is 
> that if/when it does materialize, both the Solr and the Lucene projects will 
> want to have it.  Historically, most functionality shared by Solr and Lucene 
> is eventually hosted by Lucene, since Solr has a Lucene dependency, but not 
> vice-versa.
> 
> So, yes, Solr would be responsible for hosting configuration for such a 
> filter, but the responsibility for doing something with the configuration 
> would be Lucene's responsibility, assuming that Lucene would (eventually) 
> host the filter and Solr would host a factory over the filter.
> 
> Steve

thanks for the thorough explanation ,Steve .
B

_
{Beto|Norberto|Numard} Meijome

"Throughout the centuries there were [people] who took first steps down new 
paths armed only with their own vision."
   Ayn Rand

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Best way to index without diacritics

2008-08-14 Thread Norberto Meijome
( 2 in 1 reply) 
On Wed, 13 Aug 2008 09:59:21 -0700
Walter Underwood <[EMAIL PROTECTED]> wrote:

> Stripping accents doesn't quite work. The correct translation
> is language-dependent. In German, o-dieresis should turn into
> "oe", but in English, it should be "o" (as in "coöperate" or
> "Mötley Crüe"). In Swedish, it should not be converted at all.

Hi Walter,
understood. This goes back to the question of language-specific field
definitions / parsers... more on this below.

> 
> There are other character-to-string conversions: ae-ligature
> to "ae", "ß" to "ss", and so on. Luckily, those are independent
> of language.
> 
> wunder
> 
> On 8/13/08 9:16 AM, "Steven A Rowe" <[EMAIL PROTECTED]> wrote:
> 
> > Hi Norberto,
> > 
> > https://issues.apache.org/jira/browse/LUCENE-1343

hi Steve,
thanks for the pointer. this is a Lucene entry... I thought the Latin-filter
was a SOLR feature? I, for one, definitely meant a SOLR filter. 

Given what Walter rightly pointed out about differences in language, I suspect
it would be a SOLR-level thing - fieldType name="textDE" language="DE" would
apply the filter of unicode chars to {ascii?} with the appropriate mapping for
German, etc. 

Or is this that Lucene would / should take care of ?

B
_
{Beto|Norberto|Numard} Meijome

"I've dirtied my hands writing poetry, for the sake of seduction; that is,  for
the sake of a useful cause." Dostoevsky

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
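
[Editor's note: the language-dependent folding discussed above can be sketched in plain Java. This is a simulation for illustration only - not the ISOLatin1AccentFilter nor any actual Solr/Lucene filter - and the German mapping table here is deliberately tiny.]

```java
import java.text.Normalizer;

public class Fold {
    // Hypothetical per-language map; German folds o-dieresis to "oe", etc.
    static final String[][] GERMAN = {
        {"ä", "ae"}, {"ö", "oe"}, {"ü", "ue"},
        {"Ä", "Ae"}, {"Ö", "Oe"}, {"Ü", "Ue"}, {"ß", "ss"}
    };

    // English-style fold: NFD-decompose, then strip the combining marks.
    static String foldGeneric(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // German-aware fold: apply the language map first, then the generic fold.
    static String foldGerman(String s) {
        for (String[] m : GERMAN) s = s.replace(m[0], m[1]);
        return foldGeneric(s);
    }

    public static void main(String[] args) {
        System.out.println(foldGeneric("Mötley Crüe")); // Motley Crue
        System.out.println(foldGerman("Straße Köln"));  // Strasse Koeln
    }
}
```

A real implementation would need full per-language tables - the generalized filter tracked in LUCENE-1343 is exactly that kind of thing.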


Re: Spellcheker and Dismax both

2008-08-14 Thread Norberto Meijome
On Thu, 14 Aug 2008 12:21:13 +0530
"Shalin Shekhar Mangar" <[EMAIL PROTECTED]> wrote:

> The SpellCheckerRequestHandler is now deprecated with Solr 1.3 and it has
> been replaced by SpellCheckComponent.
> 
> http://wiki.apache.org/solr/SpellCheckComponent


which works quite well with dismax.
B

_
{Beto|Norberto|Numard} Meijome

Never attribute to malice what can adequately be explained by incompetence.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Searching Questions

2008-08-13 Thread Norberto Meijome
On Tue, 12 Aug 2008 13:26:26 -0700
"Jake Conk" <[EMAIL PROTECTED]> wrote:

> 1) I want to search only within a specific field, for instance
> `category`. Is there a way to do this?

of course. Please see http://wiki.apache.org/solr/SolrQuerySyntax (in 
particular, follow the link to Lucene syntax..)

> 
> 2) When searching for multiple results are the following identical
> since "*_facet" and "*_facet_mv" have their type's both set to string?
> 
> /select?q=tag_facet:%22John+McCain%22+OR+tag_facet:%22Barack+Obama%22
> /select?q=tag_facet_mv:%22John+McCain%22+OR+tag_facet_mv:%22Barack+Obama%22

Erik H. already answered this question in another of your emails. Check your 
mailbox or the list's archives.

> 3) If I'm searching for something that is in a text field but I
> specify it as a facet string rather than a text type would it still
> search within text fields or would it just limit the search to string
> fields?

I am not sure what you mean by 'a facet string'. You facet on fields; SOLR 
automatically creates facets on those fields based on the results of your 
query.

> 4) Is there a page that will show me different querying combinations
> or can someone post some more examples?

Have you checked the wiki? Which page do you suggest needs more examples? 


> 5) Anyone else notice returning back the data in php (&wt=phps)
> doesn't unserialize? I am using PHP 5.3 w/ a nightly copy of Solr from
> last week.

sorry, haven't used PHP + SOLR

cheers,
B
_
{Beto|Norberto|Numard} Meijome

"All that is necessary for the triumph of evil is that good men do nothing."
  Edmund Burke

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Best way to index without diacritics

2008-08-12 Thread Norberto Meijome
On Tue, 12 Aug 2008 11:44:42 -0400
"Steven A Rowe" <[EMAIL PROTECTED]> wrote:

> Solr is Unicode aware.  The ISOLatin1AccentFilterFactory handles diacritics 
> for the ISO Latin-1 section of the Unicode character set.  UTF (do you mean 
> UTF-8?) is a (set of) Unicode serialization(s), and once Solr has 
> deserialized it, it is just Unicode characters (Java's in-memory UTF-16 
> representation).
> 
> So as long as you're only concerned about removing diacritics from the set of 
> Unicode characters that overlaps ISO Latin-1, and not about other Unicode 
> characters, then ISOLatin1AccentFilterFactory should work for you.

hi,
do you know if anyone has implemented a similar filter using ICU, mapping (a 
lot more of) UTF-8 to ASCII? 

B

_
{Beto|Norberto|Numard} Meijome

"He has the attention span of a lightning bolt."
  Robert Redford

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: adds / delete within same 'transaction'..

2008-08-12 Thread Norberto Meijome
On Tue, 12 Aug 2008 20:53:12 -0400
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:

> On Tue, Aug 12, 2008 at 1:48 AM, Norberto Meijome <[EMAIL PROTECTED]> wrote:
> > What happens if I issue:
> > <delete><id>1</id></delete>
> > <add><doc><field name="id">1</field><field name="data">new</field></doc></add>
> >
> > will delete happen first, and then the add, or could it be that the add 
> > happens before delete  
> 
> Doesn't matter... it's an implementation detail.  Solr used to buffer
> deletes, and if it crashed at the right time one could get duplicates.
>  Now, Lucene does the buffering of deletes (internally lucene does the
> adds first and buffers the deletes until a segment flush) and it
> should be impossible to see more than one "1" or no "1" at all.

Thanks Yonik. I wasn't asking about the specific details, but about the 
consequence. I seem to remember (incorrectly, or maybe only in v1.2) that if 
one wanted assurances that the case above happened in the right order, one had 
to commit after the deletes, and once more after the adds. 

This not being the case, I am happy :) 

Thanks again,
B
_
{Beto|Norberto|Numard} Meijome

"He has Van Gogh's ear for music."
  Billy Wilder

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: adds / delete within same 'transaction'..

2008-08-12 Thread Norberto Meijome
On Tue, 12 Aug 2008 11:21:50 -0700
Mike Klaas <[EMAIL PROTECTED]> wrote:

> > will delete happen first, and then the add, or could it be that the  
> > add happens before delete, in which case i end up with no more doc  
> > id=1 ?  
> 
> As long as you are sending these requests on the same thread, they  
> will occur in order.
> 
> -Mike

right, that is GREAT to know then :)

cheers,
b

_
{Beto|Norberto|Numard} Meijome

Life is not measured by the number of breaths we take, but by the moments that 
take our breath away.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Problems using saxon for XSLT transforms

2008-08-12 Thread Norberto Meijome
hi :)
I'm trying to use Saxon instead of the default XSLT parser. I was pretty sure I
had it running fine on 1.2, but when I repeated the same steps (as per the
wiki) on the latest nightly build, I cannot see any sign of it being loaded or
used, although the classpath seems to be pointing to the Saxon jars (see below)

In my logs,i see :
INFO: created xslt: org.apache.solr.request.XSLTResponseWriter
Aug 12, 2008 11:20:07 PM org.apache.solr.request.XSLTResponseWriter init
INFO: xsltCacheLifetimeSeconds=5

which is the RH itself, then, on a hit that triggers the transform : 
Aug 12, 2008 11:21:25 PM org.apache.solr.util.xslt.TransformerProvider 
WARNING: The TransformerProvider's simplistic XSLT caching mechanism is not
appropriate for high load scenarios, unless a single XSLT transform is used and
xsltCacheLifetimeSeconds is set to a sufficiently high value.

This is where I would expect to see saxon...right?

I'm running SOLR 1.3, nightly from 2008-08-11, under FreeBSD 7 (stable), JDK
1.6.. I have 4 cores defined in this test environment. 

I start my service  with :

java -Xms64m -Xmx1024m -server
-Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl
-jar start.jar


the  /admin/get-properties.jsp shows

[]

javax.xml.transform.TransformerFactory = net.sf.saxon.TransformerFactoryImpl
java.specification.version = 1.6
[...]
java.class.path
= 
/solrhome:/solrhome/lib/saxon9-s9api.jar:/solrhome/lib/jetty-6.1.11.jar:/solrhome/lib/saxon9-jdom.jar:/solrhome/lib/saxon9-sql.jar:/solrhome/lib/servlet-api-2.5-6.1.11.jar:/solrhome/lib/saxon9-xqj.jar:/solrhome/lib/saxon9.jar:/solrhome/lib/jetty-util-6.1.11.jar:/solrhome/lib/saxon9-xom.jar:/solrhome/lib/saxon9-dom4j.jar:/solrhome/lib/saxon9-xpath.jar:/solrhome/lib/saxon9-dom.jar:/solrhome/lib/jsp-2.1/core-3.1.1.jar:/solrhome/lib/jsp-2.1/ant-1.6.5.jar:/solrhome/lib/jsp-2.1/jsp-2.1.jar:/solrhome/lib/jsp-2.1/jsp-api-2.1.jar:/solrhome/lib/management/jetty-management-6.1.11.jar:/solrhome/lib/naming/jetty-naming-6.1.11.jar:/solrhome/lib/naming/activation-1.1.jar:/solrhome/lib/naming/mail-1.4.jar:/solrhome/lib/plus/jetty-plus-6.1.11.jar:/solrhome/lib/xbean/jetty-xbean-6.1.11.jar:/solrhome/lib/annotations/geronimo-annotation_1.0_spec-1.0.jar:/solrhome/lib/annotations/jetty-annotations-6.1.11.jar:/solrhome/lib/ext/jetty-java5-threadpool-6.1.11.jar:/solrhome/lib/ext/jetty-sslengine-6.1.11.jar:/solrhome/lib/ext/jetty-servlet-tester-6.1.11.jar:/solrhome/lib/ext/jetty-ajp-6.1.11.jar:/solrhome/lib/ext/jetty-setuid-6.1.11.jar:/solrhome/lib/ext/jetty-client-6.1.11.jar:/solrhome/lib/ext/jetty-html-6.1.11.jar

[...]

Any pointers to where I should check to confirm saxon is being used, or
to address the problem will be greatly appreciated.

TIA,
B
_
{Beto|Norberto|Numard} Meijome

"Nature doesn't care how smart you are. You can still be wrong."
  Richard Feynman

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
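
[Editor's note: one quick way to confirm which XSLT implementation the JVM actually resolves - independent of Solr - is to ask the TransformerFactory directly. A small standalone check, to be run with the same -D flag and classpath as the Solr start command above:]

```java
import javax.xml.transform.TransformerFactory;

public class WhichXslt {
    public static void main(String[] args) {
        // With -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl
        // and the Saxon jars on the classpath, this prints the Saxon class name;
        // otherwise it prints the JDK's built-in default factory.
        TransformerFactory tf = TransformerFactory.newInstance();
        System.out.println(tf.getClass().getName());
    }
}
```

If this prints the Saxon class but Solr's transforms still don't use it, the problem lies in how the servlet container passes system properties through, not in the JAXP lookup itself.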


adds / delete within same 'transaction'..

2008-08-11 Thread Norberto Meijome
Hello :)

I *think* i know the answer, but i'd like to confirm :

Say I have 

<add><doc><field name="id">1</field><field name="data">old</field></doc></add>

already indexed and committed (ie, 'live' ) 

What happens if I issue:

<delete><id>1</id></delete>
<add><doc><field name="id">1</field><field name="data">new</field></doc></add>

will delete happen first, and then the add, or could it be that the add happens 
before delete, in which case i end up with no more doc id=1 ? 

thanks!!
B
_
{Beto|Norberto|Numard} Meijome

Anyone who isn't confused here doesn't really understand what's going on.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Can't Delete Record

2008-08-11 Thread Norberto Meijome
On Mon, 11 Aug 2008 06:48:05 -0700 (PDT)
Vj Ali <[EMAIL PROTECTED]> wrote:

>  i also sends  tag as well.

maybe you need 



instead of 
?


_
{Beto|Norberto|Numard} Meijome

"With sufficient thrust, pigs fly just fine. However, this is not necessarily a 
good idea. It is hard to be sure where they are going to land, and it could be 
dangerous sitting under them as they fly overhead."
   [RFC1925 - section 2, subsection 3]

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: unique key

2008-08-11 Thread Norberto Meijome
On Wed, 6 Aug 2008 12:25:34 +1000
Norberto Meijome <[EMAIL PROTECTED]> wrote:

> On Tue, 5 Aug 2008 14:41:08 -0300
> "Scott Swan" <[EMAIL PROTECTED]> wrote:
> 
> > I currently have multiple documents that i would like to index but i would 
> > like to combine two fields to produce the unique key.
> > 
> > the documents either have 1 or the other fields so by combining the two 
> > fields i will get a unique result.
> > 
> > is this possible in the solr schema? 
> >   
> 
> Hi Scott,
> you can't do that by the schema - you need to do it when you generate your 
> document, before posting it to SOLR.

Hi again,
after reading the DataImportHandler documentation, you could do this too with 
specific configuration in DIH itself. Of course, you have to be using DIH to 
load data into your SOLR ;)

B

_
{Beto|Norberto|Numard} Meijome

"Intellectual: 'Someone who has been educated beyond his/her intelligence'"
   Arthur C. Clarke, from "3001, The Final Odyssey", Sources.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Still no results after removing from stopwords

2008-08-11 Thread Norberto Meijome
On Sun, 10 Aug 2008 19:58:24 -0700 (PDT)
SoupErman <[EMAIL PROTECTED]> wrote:

> I needed to run a search with a query containing the word "not", so I removed
> "not" from the stopwords.txt file. Which seemed to work, at least as far as
> parsing the query. It was now successfully searching for that keyword, as
> noted in the query debugger. However it isn't returning any results where
> "not" is in the query, which suggests "not" hasn't been indexed. However
> looking at the listing for a particular item, "not" is listed as one of the
> keywords, so it should be finding it?

Hi Michael,
did you reindex your documents after 1) changing your settings and 2) 
restarting SOLR (to allow your settings to come into effect)?

B

_
{Beto|Norberto|Numard} Meijome

Real Programmers don't comment their code. If it was hard to write, it should 
be hard to understand and even harder to modify.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: HTML Standard Strip filter word boundary bug

2008-08-07 Thread Norberto Meijome
On Thu, 7 Aug 2008 00:50:59 -0700 (PDT)
matt connolly <[EMAIL PROTECTED]> wrote:

> Where do I file a bug report?

https://issues.apache.org/jira

thanks!
B

_
{Beto|Norberto|Numard} Meijome

Contrary to popular belief, Unix is user friendly. It just happens to be very 
selective about who it decides to make friends with.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: case preserving for data but not for indexing

2008-08-07 Thread Norberto Meijome
On Wed, 6 Aug 2008 21:35:47 -0700 (PDT)
Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> 
> 
> 
> 
> 2 Tokenizers?

i wondered about that too, but didn't have the time to test...
B

_
{Beto|Norberto|Numard} Meijome

"Always listen to experts.  They'll tell you what can't be done, and why.  
Then do it."
  Robert A. Heinlein

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: case preserving for data but not for indexing

2008-08-06 Thread Norberto Meijome
On Wed, 6 Aug 2008 20:21:28 -0400
"Ian Connor" <[EMAIL PROTECTED]> wrote:

> In order to preserve case for the data, but not for indexing, I have
> created two fields. One is type Author that is defined as:
> 
> <fieldType name="author" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> and the other is just string:
> 
> <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

Hi Ian,
the analyzers + filters apply to the data indexed (and to queries on the
field,of course), NOT what is stored. IOW, you don't have to do anything to have
SOLR return the data in your fields untouched. 

> this is used then for the author lists:
> <field name="author" type="author" indexed="true" stored="true" omitNorms="true" multiValued="true"/>
> <field name="author_exact" type="string" indexed="true" stored="true" omitNorms="true" multiValued="true"/>
> 
> Is there any other way than to have two fields like this? One for
> searching and one for displaying. 

Of course, you can do this but, for the reason you explained, it isn't needed.
As a matter of fact, you will be indexing and storing both... If you wanted to
have one field for indexing/search on and the other for retrieving, you'd have
to set the values of the indexed and stored properties accordingly.

> People's names can be vary case
> sensitive for display purpose (eg McDonald. DeBros) but I don't want
> people to miss results because they search for "lee" instead of "Lee".

your definition of typeField author:

> <fieldType name="author" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>

 should do that - it is telling SOLR (Lucene?) to tokenize each piece of data
indexed in a field of this type, and then to change it to lower case -
both at indexing and query time.

> 
> Also, can anyone see danger is using StandardTokenizerFactory for
> people's names?

I don't know, give it a try :) you can use the analysis page in /admin/ to see
how your data would be treated both at index and query time...

good luck,
B

_
{Beto|Norberto|Numard} Meijome

"As far as the laws of mathematics refer to reality, they are not certain, and
as far as they are certain, they do not refer to reality." Albert Einstein

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
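
[Editor's note: to make the indexed-vs-stored distinction above concrete, here is a plain-Java simulation - not Solr's actual analysis code - of what the author fieldType does at index time, while the stored value stays untouched.]

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CasePreserve {
    // Simulates the analysis chain: split on whitespace, then lowercase.
    static List<String> analyze(String value) {
        String[] tokens = value.toLowerCase(Locale.ROOT).split("\\s+");
        return Arrays.asList(tokens);
    }

    public static void main(String[] args) {
        String stored = "Ronald McDonald";       // stored value: returned verbatim
        List<String> indexed = analyze(stored);  // indexed tokens: what queries match
        System.out.println(indexed);             // [ronald, mcdonald]
        System.out.println(stored);              // Ronald McDonald
    }
}
```

A query for "mcdonald" goes through the same analyzer, so it matches the lowercased token, yet the document is displayed as "Ronald McDonald" - one field, no duplication needed.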


Re: Solr Logo thought

2008-08-06 Thread Norberto Meijome
On Tue, 05 Aug 2008 16:02:51 -0400
Stephen Weiss <[EMAIL PROTECTED]> wrote:

> My issue with the logos presented was they made solr look like a  
> school project instead of the powerful tool that it is.  The tricked  
> out font or whatever just usually doesn't play well with the business  
> types... they want serious-looking software.  First impressions are  
> everything.  While the fiery colors are appropriate for something  
> named Solr, you can play with that without getting silly - take a look  
> at:

couldn't agree more. current logo needs improvement, but I think it can be done
much better... In particular thinking of small icons, print,etc... 

> http://www.ascsolar.com/images/asc_solar_splash_logo.gif
> http://www.logostick.com/images/EOS_InvestmentingLogo_lg.gif
> 
> (Luckily there are many businesses that do solar energy!)
> 
> They have the same elements but with a certain simplicity and elegance.
> 
> I know probably some people don't care if it makes the boss or client  
> happy, but, these are the kinds of seemingly insignificant things that 

Indeed - the way I see it, if you don't care either way, then you should be
happy to have a professional looking one :P

B
_
{Beto|Norberto|Numard} Meijome

"Caminante no hay camino, se hace camino al andar"
   Antonio Machado

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Sum of one field

2008-08-05 Thread Norberto Meijome
On Tue, 05 Aug 2008 18:58:42 -0300
Leonardo Dias <[EMAIL PROTECTED]> wrote:

> So I'm looking for a Ferrari. CarStore says that there are 5 ads for 
> Ferrari, but one ad has 2 Ferraris being sold, the other ad has 3 
> Ferraris and all the others have 1 Ferrari each, meaning that there are 
> 5 ads and 8 Ferraris. And yes, I'm doing an example with Fibonacci 
> numbers. ;)

why not create one separate document per car? It'll make it easier (for the 
client) to manage too when one of the cars is sold but not the other 4

B
_
{Beto|Norberto|Numard} Meijome

"With sufficient thrust, pigs fly just fine. However, this is not necessarily a 
good idea. It is hard to be sure where they are going to land, and it could be 
dangerous sitting under them as they fly overhead."
   [RFC1925 - section 2, subsection 3]

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: unique key

2008-08-05 Thread Norberto Meijome
On Tue, 5 Aug 2008 14:41:08 -0300
"Scott Swan" <[EMAIL PROTECTED]> wrote:

> I currently have multiple documents that i would like to index but i would 
> like to combine two fields to produce the unique key.
> 
> the documents either have 1 or the other fields so by combining the two 
> fields i will get a unique result.
> 
> is this possible in the solr schema? 
> 

Hi Scott,
you can't do that by the schema - you need to do it when you generate your 
document, before posting it to SOLR.

btw, please don't hijack topic threads.

http://en.wikipedia.org/wiki/Thread_hijacking

thanks!!
B
_
{Beto|Norberto|Numard} Meijome

Law of Conservation of Perversity: 
  we can't make something simpler without making something else more complex

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.
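
[Editor's note: "do it when you generate your document" can look like the following sketch. The field names here ("isbn"/"issn") are invented for illustration; the point is only that the composite key is computed client-side before the document is posted to SOLR.]

```java
public class UniqueKey {
    // The document carries exactly one of two identifier fields. Prefix the
    // key with the source field's name so the two namespaces can never collide.
    static String makeKey(String isbn, String issn) {
        if (isbn != null) return "isbn:" + isbn;
        if (issn != null) return "issn:" + issn;
        throw new IllegalArgumentException("document has neither identifier");
    }

    public static void main(String[] args) {
        System.out.println(makeKey("0131872486", null)); // isbn:0131872486
        System.out.println(makeKey(null, "1144-875X"));  // issn:1144-875X
    }
}
```

The resulting string is then placed in the schema's single uniqueKey field like any other value.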


Re: Diagnostic tools

2008-08-05 Thread Norberto Meijome
On Tue, 5 Aug 2008 11:43:44 -0500
"Kashyap, Raghu" <[EMAIL PROTECTED]> wrote:

> Hi,

Hi Kashyap,
please don't hijack topic threads.

http://en.wikipedia.org/wiki/Thread_hijacking

thanks!!
B
_
{Beto|Norberto|Numard} Meijome

Software QA is like cleaning my cat's litter box: Sift out the big chunks. Stir 
in the rest. Hope it doesn't stink.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: solr 1.3 ??

2008-08-04 Thread Norberto Meijome
On Mon, 4 Aug 2008 21:13:09 -0700 (PDT)
Vicky_Dev <[EMAIL PROTECTED]> wrote:

> Can we get solr 1.3 release as soon as possible? Otherwise some interim
> release (1.2.x) containing DataImportHandler will also a good option. 
> 
> Any Thoughts?


have you tried one of the nightly builds? I've been following them every so
often... sometimes there is a problem, but hardly ever. You can find a build
you are comfortable with, and it'll be far closer to the actual 1.3 release
than 1.2 is.

B

_
{Beto|Norberto|Numard} Meijome

Quantum Logic Chicken:
  The chicken is distributed probabalistically on all sides of the
  road until you observe it on the side of your course.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Solr Logo thought

2008-08-04 Thread Norberto Meijome
On Mon, 4 Aug 2008 09:29:30 -0700
Ryan McKinley <[EMAIL PROTECTED]> wrote:

> >
> > If there is a still room for new log design for Solr and the  
> > community is
> > open for it then I can try to come up with some proposal. Doing logo  
> > for
> > Mahout was really interesting experience.
> >
> 
> In my opinion, yes  I'd love to see more effort put towards  the  
> logo.  I have stayed out of this discussion since I don't really think  
> any of the logos under consideration are complete.  (I begged some  
> friends to do two of the three logos under consideration)  I would  
> love to refine them, but time... oooh time.

+1 

If we are going to change what we have, I'd love to see some more options, or
better quality - no offence meant, but those "logos" aren't really a huge
improvement or departure from the current one. 

I think whatever we change to we'll be wanting to use it for a long time.

B
_
{Beto|Norberto|Numard} Meijome

If you find a solution and become attached to it, the solution may become your
next problem.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.

