Advice on analysis/filtering?

2008-10-16 Thread Jarek Zgoda

Hello, group.

I'm trying to create a search facility for documents in "broken"  
Polish (by broken I mean "not language rules compliant"), searchable  
by terms in "broken" Polish, where the terms are broken in different  
ways than the documents. See this example:


document text: "włatcy móch" (in proper Polish this would be "władcy  
much")
example terms that should match: "włatcy much", "wlatcy moch", "wladcy  
much"


This double brokenness rules out all Polish stemmers currently  
available for Lucene, so I am back at square one. The search results do  
not have to be 100% accurate - some missing results are acceptable,  
but "false positives" are not. Is this at all possible using the machinery  
provided by Solr (I do not hold a PhD in linguistics), or should I ask the  
business to lower their expectations?


--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
[EMAIL PROTECTED]



Re: error with delta import

2008-10-16 Thread Florian Aumeier

Noble Paul നോബിള്‍ नोब्ळ् wrote:

The delta implementation is a bit fragile in DIH for complex queries

  
that's too bad. It's a nice interface, and less complex to configure than 
going the XML /update route.



Well, when doing it the way you described below (full-import with the delta 
query), the '${dataimporter.last_index_time}' timestamp is empty:


Oct 16, 2008 10:14:53 AM org.apache.solr.handler.dataimport.DataImporter 
doFullImport

SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to 
execute query: SELECT a.id AS article_id,a.stub AS article_stub,a.ref AS 
article_ref,a.id_blogs,a.title AS article_title, a.normalized_text, 
au.url AS article_url, bu.url AS blog_url, b.title AS 
blog_title,b.subtitle AS blog_subtitle, r.rank, 
coalesce(a.updated,a.published,a.added) as ts, a.stub as article_stub 
FROM articles a join blogs b on a.id_blogs = b.id join urls au on 
a.id_urls = au.id join urls bu on b.id_urls = bu.id LEFT OUTER JOIN 
ranks r on a.id = r.id_articles WHERE b.id_urls is not null AND a.hidden 
is false AND b.hidden is false AND a.ref is not null AND b.ref is not 
null and (rankid in (SELECT rankid FROM ranks order by rankid desc limit 
1) OR rankid is null) AND coalesce(a.updated,a.published,a.added) > '' 
Processing Document # 1


Regards
Florian



I recommend you do delta-import using a full-import

it can be done as follows
define a different entity

[the data-config.xml example was stripped by the mail archive]

when you wish to do a full-import pass the request parameter
entity=articles-full

for delta-import use the request parameter
entity=articles-delta&clean=false (command has to be full-import only)
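The entity definitions Noble pasted were stripped by the mail archive. A minimal sketch of what such a data-config.xml could look like (the JDBC URL is the one quoted elsewhere in the thread; table, column, and field names are illustrative assumptions, not the original config):

```xml
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://bm02:5432/bm" user="user" password="..."/>
  <document>
    <!-- full import: invoked with entity=articles-full -->
    <entity name="articles-full" pk="article_id"
            query="SELECT a.id AS article_id, a.title AS article_title FROM articles a">
      <field column="article_id" name="id"/>
      <field column="article_title" name="title"/>
    </entity>
    <!-- delta as a full-import: invoked with entity=articles-delta and clean=false -->
    <entity name="articles-delta" pk="article_id"
            query="SELECT a.id AS article_id, a.title AS article_title FROM articles a
                   WHERE coalesce(a.updated, a.published, a.added) &gt; '${dataimporter.last_index_time}'">
      <field column="article_id" name="id"/>
      <field column="article_title" name="title"/>
    </entity>
  </document>
</dataConfig>
```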



On Wed, Oct 15, 2008 at 1:42 PM, Florian Aumeier
<[EMAIL PROTECTED]> wrote:
  

Shalin Shekhar Mangar wrote:


You are missing the "pk" field (primary key). This is used for delta
imports.

  

I added the pk field and rebuilt the index yesterday. However, when I run
the delta-import, I still have this error message in the log:

INFO: Starting delta collection.
Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Running ModifiedRowKey() for Entity: articles
Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Creating a connection for entity articles with URL:
jdbc:postgresql://bm02:5432/bm
Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Time taken for getConnection(): 43
Oct 15, 2008 9:37:36 AM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
Oct 15, 2008 9:44:51 AM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0
Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed ModifiedRowKey for Entity: articles rows obtained : 4584
Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Running DeletedRowKey() for Entity: articles
Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed DeletedRowKey for Entity: articles rows obtained : 0
Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed parentDeltaQuery for Entity: articles
Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DataImporter
doDeltaImport
SEVERE: Delta Import Failed
java.lang.NullPointerException
  at
org.apache.solr.handler.dataimport.SqlEntityProcessor.getDeltaImportQuery(SqlEntityProcessor.java:153)
  at
org.apache.solr.handler.dataimport.SqlEntityProcessor.getQuery(SqlEntityProcessor.java:125)
  at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
  at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
  at
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
  at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
  at
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
  at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
  at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
Oct 15, 2008 9:50:58 AM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0

Regards
Florian






  



--
Media Ventures GmbH 
Entwicklung Blogmonitor.de


Jabber-ID [EMAIL PROTECTED]
Telefon +49 (0) 2236 480 10 22



Re: snapcleaner >> problem solr 1.3

2008-10-16 Thread Chris Haggstrom


On Oct 16, 2008, at 3:10 AM, sunnyfr wrote:


I have a weird problem when I try to fire snapcleaner manually.  
First of all, is this correct: [EMAIL PROTECTED]:/data/solr/video#
./bin/snapcleaner -V -D-1

To remove every snapshot older than one day.


You need to change "-D -1" to "-D 1".  Otherwise, you're trying to  
remove snapshots older than -1 days, which is an invalid argument to  
pass to 'find -mtime', as shown in these lines of your debug output.



It obviously doesn't remove snapshots older than one day, and the debugger shows me:

+ logMessage cleaning up snapshots more than -1 days old
++ find /data/solr/video/data -maxdepth 1 -name 'snapshot.*' -mtime  
+-1

find: invalid argument `+-1' to `-mtime'
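The find(1) behavior Chris describes can be checked in isolation: `-mtime +1` means "strictly more than one 24-hour period old", while `+-1` is simply rejected. A throwaway-path sketch (GNU coreutils assumed for `touch -d`):

```shell
# set up two fake snapshot directories with different ages
mkdir -p /tmp/solrdata/snapshot.old /tmp/solrdata/snapshot.new
touch -d '2 days ago' /tmp/solrdata/snapshot.old
# this is essentially the find that snapcleaner runs; only snapshot.old matches
find /tmp/solrdata -maxdepth 1 -name 'snapshot.*' -mtime +1 -print
```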



-Chris


Re: error with delta import

2008-10-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Oct 16, 2008 at 2:08 PM, Florian Aumeier
<[EMAIL PROTECTED]> wrote:
> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>
>> The delta implementation is a bit fragile in DIH for complex queries
>>
>>
>
> that's too bad. It's a nice interface and less complex to configure than to
> go the XML /update way.
>
>
> Well, when doing the way you described below (full-import with the delta
> query), the '${dataimporter.last_index_time}' timestamp is empty:
I guess this was fixed post-1.3. You can probably take
dataimporthandler.jar from a nightly build (you may also need to add
slf4j.jar).
>
> [remainder of the quoted message trimmed; the full log appears earlier in this thread]

Re: error with delta import

2008-10-16 Thread Florian Aumeier

Noble Paul നോബിള്‍ नोब्ळ् wrote:

Well, when doing the way you described below (full-import with the delta
query), the '${dataimporter.last_index_time}' timestamp is empty:


I guess this was fixed post 1.3 . probably you can take
dataimporthandler.jar from a nightly build (you may also need to add
slf4j.jar)
  



I replaced
dist/apache-solr-dataimporthandler-1.3.0.jar
dist/solrj-lib/slf4j-api-1.5.3.jar
dist/solrj-lib/slf4j-jdk14-1.5.3.jar

with their counterparts from the nightly build, but that did not help. 
Then I tried hard-coding the date (now() - '12 
hours'::interval).

Everything looks fine, but there are no new documents in the index.

here is the log:

INFO: Starting Full Import
Oct 16, 2008 1:07:08 PM org.apache.solr.core.SolrCore execute
INFO: [test] webapp=/solr path=/dataimport params={command=full-import&clean=false&entity=articles-delta} status=0 QTime=0
Oct 16, 2008 1:07:08 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity articles-delta with URL: jdbc:postgresql://bm02:5432/bm
Oct 16, 2008 1:07:08 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 45

Oct 16, 2008 1:14:53 PM org.apache.solr.core.SolrCore execute
INFO: [test] webapp=/solr path=/dataimport params={} status=0 QTime=1
Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.SolrWriter persistStartTime
INFO: Wrote last indexed time to dataimport.properties
Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.DocBuilder commit
INFO: Full Import completed successfully
Oct 16, 2008 1:16:11 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true)
Oct 16, 2008 1:16:11 PM org.apache.solr.search.SolrIndexSearcher
INFO: Opening [EMAIL PROTECTED] main
Oct 16, 2008 1:16:11 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
... (autowarming)
Oct 16, 2008 1:16:12 PM org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:9:3.231



Re: Solr search not displaying all the indexed values.

2008-10-16 Thread con


Yes, something similar to:

[data-config.xml with two entities stripped by the mail archive]

But searching does not return all the results, even when there is only one
result, whereas indexing is fine.
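For reference, a sketch of what such a two-entity data-config.xml might look like (only the first SELECT comes from con's message; the second query and all field names are illustrative assumptions):

```xml
<document>
  <!-- first entity: the join query quoted in con's original message -->
  <entity name="emp_cust" pk="prod_id"
          query="SELECT * FROM EMPLOYEE, CUSTOMER WHERE EMPLOYEE.prod_id = CUSTOMER.prod_id">
    <field column="prod_id" name="id"/>
  </entity>
  <!-- second entity: illustrative only; the real second query was not shown -->
  <entity name="other" pk="id" query="SELECT id, name FROM SOME_OTHER_TABLE">
    <field column="id" name="id"/>
    <field column="name" name="name"/>
  </entity>
</document>
```

One thing worth checking with this shape: if both entities produce documents with the same uniqueKey values, later documents silently overwrite earlier ones, so fewer documents end up searchable than rows were indexed.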
Thanks
con



Noble Paul നോബിള്‍ नोब्ळ् wrote:
> 
> do you have 2 queries in 2 different entities?
> 
> 
> On Thu, Oct 16, 2008 at 3:17 PM, con <[EMAIL PROTECTED]> wrote:
>>
>> I have two queries in my data-config.xml which takes values from multiple
>> tables, like:
>> select * from EMPLOYEE, CUSTOMER where EMPLOYEE.prod_id=
>> CUSTOMER.prod_id.
>>
>> When i do a full-import it is indexing all the rows as expected.
>>
>> But when i search it with a *:* , it is not displaying all the values.
>> Do I need any extra configurations?
>>
>> Thanks
>> con
>> --
>> View this message in context:
>> http://www.nabble.com/Solr-search-not-displaying-all-the-indexed-values.-tp20010401p20010401.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> --Noble Paul
> 
> 



-- 
View this message in context: 
http://www.nabble.com/Solr-search-not-displaying-all-the-indexed-values.-tp20010401p20011033.html
Sent from the Solr - User mailing list archive at Nabble.com.



dataimport, both splitBy and dateTimeFormat

2008-10-16 Thread David Smiley @MITRE.org

I'm trying out the dataimport capability.  I have a column that is a series
of dates separated by spaces like so:
"1996-00-00 1996-04-00"
And I'm trying to import it like so:

[the <field> definition was stripped by the mail archive]
However this fails and the stack trace suggests it is first trying to apply
the dateTimeFormat before splitBy.  I think this is a bug... dataimport
should apply DateFormatTransformer and NumberFormatTransformer last.

~ David Smiley
-- 
View this message in context: 
http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20013006.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: dataimport, both splitBy and dateTimeFormat

2008-10-16 Thread Shalin Shekhar Mangar
Hi David,

I think you meant RegexTransformer instead of NumberFormatTransformer.
Anyhow, the order in which the transformers are applied is the same as the
order in which you specify them.

So make sure your entity has
transformer="RegexTransformer,DateFormatTransformer".
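As a concrete sketch (the column name and date format are assumptions, since David's original <field> line was stripped by the archive):

```xml
<entity name="item" transformer="RegexTransformer,DateFormatTransformer"
        query="SELECT id, release_dates FROM item">
  <!-- RegexTransformer splits the space-separated string first,
       then DateFormatTransformer parses each token as a date -->
  <field column="release_dates" splitBy=" " dateTimeFormat="yyyy-MM-dd"/>
</entity>
```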

On Thu, Oct 16, 2008 at 6:14 PM, David Smiley @MITRE.org
<[EMAIL PROTECTED]>wrote:

>
> I'm trying out the dataimport capability.  I have a column that is a series
> of dates separated by spaces like so:
> "1996-00-00 1996-04-00"
> And I'm trying to import it like so:
> 
>
> However this fails and the stack trace suggests it is first trying to apply
> the dateTimeFormat before splitBy.  I think this is a bug... dataimport
> should apply DateFormatTransformer and NumberFormatTransformer last.
>
> ~ David Smiley
> --
> View this message in context:
> http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20013006.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: snapcleaner >> problem solr 1.3

2008-10-16 Thread sunnyfr

Still nothing has changed:

[EMAIL PROTECTED]:/data/solr/video# ./bin/snapcleaner -V -D 1
+ [[ -z 1 ]]
+ fixUser -V -D 1
+ [[ -z '' ]]
++ whoami
+ user=root
++ whoami
+ [[ root != root ]]
++ who -m
++ cut '-d ' -f1
++ sed '-es/^.*!//'
+ oldwhoami=root
+ [[ root == '' ]]
+ [[ -z /data/solr/video/data ]]
++ echo /data/solr/video/data
++ cut -c1
+ [[ / != \/ ]]
+ setStartTime
+ [[ Linux == \S\u\n\O\S ]]
++ date +%s
+ start=1224156482
+ logMessage started by root
++ timeStamp
++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2008/10/16 13:28:02 started by root
+ [[ -n '' ]]
+ logMessage command: ./bin/snapcleaner -V -D 1
++ timeStamp
++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2008/10/16 13:28:02 command: ./bin/snapcleaner -V -D 1
+ [[ -n '' ]]
+ trap 'echo "caught INT/TERM, exiting now but partial cleanup may have
already occured";logExit aborted 13' INT TERM
+ [[ -n 1 ]]
+ find /data/solr/video/data -maxdepth 0 -name foobar
+ '[' 0 = 0 ']'
+ maxdepth='-maxdepth 1'
+ logMessage cleaning up snapshots more than 1 days old
++ timeStamp
++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2008/10/16 13:28:02 cleaning up snapshots more than 1 days old
+ [[ -n '' ]]
++ find /data/solr/video/data -maxdepth 1 -name 'snapshot.*' -mtime +1
-print
+ logExit ended 0
+ [[ Linux == \S\u\n\O\S ]]
++ date +%s
+ end=1224156482
++ expr 1224156482 - 1224156482
+ diff=0
++ timeStamp
++ date '+%Y/%m/%d %H:%M:%S'
+ echo '2008/10/16 13:28:02 ended (elapsed time: 0 sec)'
+ exit 0



Chris Haggstrom wrote:
> 
> 
> On Oct 16, 2008, at 3:10 AM, sunnyfr wrote:
>>
>> I have a weird problem when I try to fire snapcleaner manually.
>> First of all, is this correct: [EMAIL PROTECTED]:/data/solr/video#
>> ./bin/snapcleaner -V -D-1
>>
>> To remove every snapshot older than one day.
> 
> You need to change "-D -1" to "-D 1".  Otherwise, you're trying to  
> remove snapshots older than -1 days, which is an invalid argument to  
> pass to 'find -mtime' as is shown in these lines of your debug output.
> 
>> It doesn't remove older than one day obviously and debugger show me :
>>
>> + logMessage cleaning up snapshots more than -1 days old
>> ++ find /data/solr/video/data -maxdepth 1 -name 'snapshot.*' -mtime  
>> +-1
>> find: invalid argument `+-1' to `-mtime'
> 
> 
> -Chris
> 
> 

-- 
View this message in context: 
http://www.nabble.com/snapcleaner-%3E%3E-problem-solr-1.3-tp20010689p20011746.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr search not displaying all the indexed values.

2008-10-16 Thread con

I have two queries in my data-config.xml which take values from multiple
tables, like:
select * from EMPLOYEE, CUSTOMER where EMPLOYEE.prod_id = CUSTOMER.prod_id

When I do a full-import it indexes all the rows as expected.

But when I search with *:*, it does not display all the values.
Do I need any extra configuration?

Thanks
con
-- 
View this message in context: 
http://www.nabble.com/Solr-search-not-displaying-all-the-indexed-values.-tp20010401p20010401.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Advice on analysis/filtering?

2008-10-16 Thread Erick Erickson
Well, let me see. Your customers are telling you, in essence,
"for any random input, you cannot return false positives". Which
is nonsense, so I'd say you need to negotiate with your
customers. I flat guarantee that, for any algorithm you try,
you can write a counter-example in, oh, 15 seconds or so.

I think the best you can hope for is "reasonable results", but
getting your customers to agree to what is "reasonable" is...er...
often a challenge. Frequently, when confronted by "close but
not perfect", customers aren't as unforgiving as their first
position would indicate, since the inconvenience of the
not-quite-perfect results is often much less than people expect
at the outset.

FuzzySearch tries to do some of this work for you, and that may be
acceptable, as this is a common issue. But it'll never be
perfect.

You might get some joy from ngrams, but I haven't
worked with it myself, just seen it recommended by people
whose opinions I respect...
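For the n-gram suggestion, a hedged sketch of a Solr fieldType that might approximate this kind of loose matching (the gram sizes are guesses to tune, and note that ISOLatin1AccentFilter does not fold Polish ł, which lies outside Latin-1 and would need a custom mapping):

```xml
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- folds Latin-1 accents (e.g. ó -> o); ł is not covered -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="4"/>
  </analyzer>
</fieldType>
```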

Best
Erick


2008/10/16 Jarek Zgoda <[EMAIL PROTECTED]>

> Hello, group.
>
> I'm trying to create a search facility for documents in "broken" Polish (by
> broken I mean "not language rules compliant"), searchable by terms in
> "broken" Polish, but broken in many other ways than documents. See this
> example:
>
> document text: "włatcy móch" (in proper Polish this would be "władcy much")
> example terms that should match: "włatcy much", "wlatcy moch", "wladcy
> much"
>
> This double brokeness ruled out any Polish stemmers currently available for
> Lucene and now I am at point 0. The search results do not have to be 100%
> accurate - some missing results are acceptable, but "false positives" are
> not. Is it at all possible using machinery provided by Solr (I do not own
> PHD in liguistics), or should I ask the business for lowering their
> expectations?
>
> --
> We read Knuth so you don't have to. - Tim Peters
>
> Jarek Zgoda, R&D, Redefine
> [EMAIL PROTECTED]
>
>


Re: Advice on analysis/filtering?

2008-10-16 Thread Jarek Zgoda
Message written on 2008-10-16 at 15:54 by Erick  
Erickson:



Well, let me see. Your customers are telling you, in essence,
"for any random input, you cannot return false positives". Which
is nonsense, so I'd say you need to negotiate with your
customers. I flat guarantee that, for any algorithm you try,
you can write a counter-example in, oh, 15 seconds or so .


They came to such expectations after seeing Solr's own spellcheck at work:  
if it can suggest correct versions, it should be able to sanitize  
broken words in documents and search them using sanitized input. To  
me this seemed a reasonable request (assuming, of course, that it can be  
achieved by reasonably abusing Solr's spellcheck component).



FuzzySearch tries to do some of this work for you, and that may be
acceptable, as this is a common issue. But it'll never be
perfect.

You might get some joy from ngrams, but I haven't
worked with it myself, just seen it recommended by people
whose opinions I respect...


Thank you for these suggestions.




Best
Erick


2008/10/16 Jarek Zgoda <[EMAIL PROTECTED]>


Hello, group.

I'm trying to create a search facility for documents in "broken"  
Polish (by

broken I mean "not language rules compliant"), searchable by terms in
"broken" Polish, but broken in many other ways than documents. See  
this

example:

document text: "włatcy móch" (in proper Polish this would be  
"władcy much")
example terms that should match: "włatcy much", "wlatcy moch",  
"wladcy

much"

This double brokeness ruled out any Polish stemmers currently  
available for
Lucene and now I am at point 0. The search results do not have to  
be 100%
accurate - some missing results are acceptable, but "false  
positives" are
not. Is it at all possible using machinery provided by Solr (I do  
not own

PHD in liguistics), or should I ask the business for lowering their
expectations?

--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
[EMAIL PROTECTED]




--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
[EMAIL PROTECTED]



Re: Advice on analysis/filtering?

2008-10-16 Thread Erick Erickson
You're welcome. I should have pointed out that I was responding
mostly to the "false hits are not acceptable" portion, which I don't
think is achievable.

Best
Erick

2008/10/16 Jarek Zgoda <[EMAIL PROTECTED]>

> Message written on 2008-10-16 at 15:54 by Erick Erickson:
>
>  Well, let me see. Your customers are telling you, in essence,
>> "for any random input, you cannot return false positives". Which
>> is nonsense, so I'd say you need to negotiate with your
>> customers. I flat guarantee that, for any algorithm you try,
>> you can write a counter-example in, oh, 15 seconds or so .
>>
>
> They came to such expectations seeing Solr's own Spellcheck at work - if it
> can suggest correct versions, it should be able to sanitize broken words in
> documents and search them using sanitized input. For me, this seemed
> reasonable request (of course, if this can be achieved reasonably abusing
> solr's spellcheck component).
>
>  FuzzySearch tries to do some of this work for you, and that may be
>> acceptable, as this is a common issue. But it'll never be
>> perfect.
>>
>> You might get some joy from ngrams, but I haven't
>> worked with it myself, just seen it recommended by people
>> whose opinions I respect...
>>
>
> Thank you for these suggestions.
>
>
> [rest of quoted message trimmed]


Re: Solr search not displaying all the indexed values.

2008-10-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
do you have 2 queries in 2 different entities?


On Thu, Oct 16, 2008 at 3:17 PM, con <[EMAIL PROTECTED]> wrote:
>
> I have two queries in my data-config.xml which takes values from multiple
> tables, like:
> select * from EMPLOYEE, CUSTOMER where EMPLOYEE.prod_id= CUSTOMER.prod_id.
>
> When i do a full-import it is indexing all the rows as expected.
>
> But when i search it with a *:* , it is not displaying all the values.
> Do I need any extra configurations?
>
> Thanks
> con
> --
> View this message in context: 
> http://www.nabble.com/Solr-search-not-displaying-all-the-indexed-values.-tp20010401p20010401.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul


Re: Advice on analysis/filtering?

2008-10-16 Thread Grant Ingersoll


On Oct 16, 2008, at 3:07 AM, Jarek Zgoda wrote:


Hello, group.

I'm trying to create a search facility for documents in "broken"  
Polish (by broken I mean "not language rules compliant"),


Can you explain what you mean here a bit more?  I don't know Polish,  
but most spoken languages can't be pinned down to a specific set of  
rules.  In other words, the exception is the rule.  Or are you saying  
the documents are more dialog-based, i.e. more informal, as in two  
people having a conversation?


searchable by terms in "broken" Polish, but broken in many other  
ways than documents. See this example:


document text: "włatcy móch" (in proper Polish this would be  
"władcy much")
example terms that should match: "włatcy much", "wlatcy moch",  
"wladcy much"


This double brokeness ruled out any Polish stemmers currently  
available for Lucene and now I am at point 0. The search results do  
not have to be 100% accurate - some missing results are acceptable,


but "false positives" are not.



There's no such thing in any language.  In your example above, what is  
matching that shouldn't?  Is this happening across a lot of documents,  
or just a few?



Is it at all possible using machinery provided by Solr (I do not own  
PHD in liguistics), or should I ask the business for lowering their  
expectations?


Well, I think there are a couple of approaches:
1. You can write your own filter/stemmer/analyzer that you think fixes  
these issues.
2. You can protect the "broken" words and not have them filtered, or  
filter them differently.
3. You can lower expectations.

One thing to try out is Solr's analysis tool in the admin, and see if  
you can get a better handle on what is going wrong.



--
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











How to change a port?

2008-10-16 Thread Aleksey Gogolev

Hello.

Is there a way to change the port (8983) of the Solr example?
I want to run two Solr examples simultaneously.

-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey



snapshooter and spellchecker

2008-10-16 Thread Geoffrey Young
hi all :)

I was surprised to find that snapshooter didn't account for the
spellcheck dictionary.  but then again, since you can call it whatever
you want I guess it couldn't.

so, how are people distributing their dictionaries across their slaves?
 since it takes so long to generate, I can't see it being practical to
generate it on each slave, especially as they'd all have the same data
as the master anyway.

tia

--Geoff


Re: How to change a port?

2008-10-16 Thread Ryan McKinley
That will depend on your servlet container (Jetty, Resin, Tomcat,  
etc.).


If you are running jetty from the example, you can change the port by  
adding -Djetty.port=1234 to the command line.  The port is configured  
in example/etc/jetty.xml


the relevant line is:

[jetty.xml snippet stripped by the mail archive]


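The line Ryan refers to was eaten by the archive; in the jetty.xml shipped with the Solr example it looks roughly like this (connector class and defaults may differ between versions):

```xml
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.bio.SocketConnector">
      <!-- falls back to 8983 when -Djetty.port is not given -->
      <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
    </New>
  </Arg>
</Call>
```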

ryan


On Oct 16, 2008, at 10:30 AM, Aleksey Gogolev wrote:



Hello.

Is there a way to change the port (8983) of solr example?
I want to run two solr examples simultaneously.

--
Aleksey Gogolev
developer,
dev.co.ua
Aleksey





snapcleaner >> problem solr 1.3

2008-10-16 Thread sunnyfr

Hi guys,

I have a weird problem when I try to fire snapcleaner manually.
First of all, is this correct: [EMAIL PROTECTED]:/data/solr/video#
./bin/snapcleaner -V -D-1

to remove every snapshot older than one day?
It obviously doesn't remove snapshots older than one day, and the debugger shows me:

+ [[ -z -1 ]]
+ fixUser -V -D -1
+ [[ -z '' ]]
++ whoami
+ user=root
++ whoami
+ [[ root != root ]]
++ who -m
++ cut '-d ' -f1
++ sed '-es/^.*!//'
+ oldwhoami=root
+ [[ root == '' ]]
+ [[ -z /data/solr/video/data ]]
++ echo /data/solr/video/data
++ cut -c1
+ [[ / != \/ ]]
+ setStartTime
+ [[ Linux == \S\u\n\O\S ]]
++ date +%s
+ start=1224151299
+ logMessage started by root
++ timeStamp
++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2008/10/16 12:01:39 started by root
+ [[ -n '' ]]
+ logMessage command: ./bin/snapcleaner -V -D -1
++ timeStamp
++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2008/10/16 12:01:39 command: ./bin/snapcleaner -V -D -1
+ [[ -n '' ]]
+ trap 'echo "caught INT/TERM, exiting now but partial cleanup may have
already occured";logExit aborted 13' INT TERM
+ [[ -n -1 ]]
+ find /data/solr/video/data -maxdepth 0 -name foobar
+ '[' 0 = 0 ']'
+ maxdepth='-maxdepth 1'
+ logMessage cleaning up snapshots more than -1 days old
++ timeStamp
++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2008/10/16 12:01:39 cleaning up snapshots more than -1 days old
+ [[ -n '' ]]
++ find /data/solr/video/data -maxdepth 1 -name 'snapshot.*' -mtime +-1
-print
find: invalid argument `+-1' to `-mtime'
+ logExit ended 0
+ [[ Linux == \S\u\n\O\S ]]
++ date +%s
+ end=1224151299
++ expr 1224151299 - 1224151299
+ diff=0
++ timeStamp
++ date '+%Y/%m/%d %H:%M:%S'
+ echo '2008/10/16 12:01:39 ended (elapsed time: 0 sec)'
+ exit 0

Any idea why?
thanks 
-- 
View this message in context: 
http://www.nabble.com/snapcleaner-%3E%3E-problem-solr-1.3-tp20010689p20010689.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Advice on analysis/filtering?

2008-10-16 Thread Jarek Zgoda
Message written on 2008-10-16 at 16:21 by Grant Ingersoll:


I'm trying to create a search facility for documents in "broken"  
Polish (by broken I mean "not language rules compliant"),


Can you explain what you mean here a bit more?  I don't know Polish,  
but most spoken languages can't be pinned down to a specific set of  
rules.  In other words, the exception is the rule.  Or, are you  
saying the documents are more dialog-based, i.e. more informal, as  
in two people having a conversation?


Some documents (around 15% of the pile) contain texts entered by  
children from primary school, which implies many syntactic and  
orthographic errors. The text is indexed "as is" and Solr is able to  
find exact occurrences, but I'd like to be able to find documents  
that contain other variations of errors, and proper forms too. And oh,  
the system will be used by children of the same age, who tend to make  
similar errors when entering search terms.


searchable by terms in "broken" Polish, but broken in many other  
ways than documents. See this example:


document text: "włatcy móch" (in proper Polish this would be  
"władcy much")
example terms that should match: "włatcy much", "wlatcy moch",  
"wladcy much"


This double brokenness ruled out any Polish stemmers currently  
available for Lucene and now I am at point 0. The search results do  
not have to be 100% accurate - some missing results are acceptable,


but "false positives" are not.



There's no such thing in any language.  In your example above, what  
is matching that shouldn't?  Is this happening across a lot of  
documents, or just a few?


Yea, I know that. By "not acceptable" I mean "not acceptable above  
some level". Sorry for this confusion.


Taking word "włatcy" from my example, I'd like to find documents  
containing words "wlatcy" (latin-2 accentuations stripped from  
original), "władcy" (proper form of this noun) and "wladcy" (latin-2  
accents stripped from proper form). The issue #1 (stripping  
accentuations from original) seems to be resolvable outside solr - I  
can index texts with accentuations stripped already. The issue #2  
(finding proper form for word) is the most interesting for me. Issue  
#3 depends on #1 and #2.


Is it at all possible using the machinery provided by Solr (I do not  
hold a PhD in linguistics), or should I ask the business to lower  
their expectations?


Well, I think there are a couple of approaches:
1. You can write your own filter/stemmer/analyzer that you think  
fixes these issues
2. You can protect the "broken" words and not have them filtered, or  
filter them differently.

3. You can lower expectations.


One thing to try out is Solr's analysis tool in the admin, and see  
if you can get a better handle on what is going wrong.


I'll see how far I can go with the spellchecker and fuzzy searches.

--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
[EMAIL PROTECTED]
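Fuzzy search in Lucene/Solr (the term~ syntax) is driven by edit distance, so it is worth checking that the broken/proper pairs above really are close in that metric. A minimal sketch (Python, purely illustrative — Solr computes this internally in Java):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# The example pairs are within distance 1, so a fuzzy query with a
# loose enough threshold would catch them:
print(levenshtein("wlatcy", "wladcy"))  # 1
print(levenshtein("moch", "much"))      # 1
```

Keep in mind that Lucene's fuzzy matching uses a similarity threshold rather than a raw distance, and that fuzzy queries on every term can get slow on a large index.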



Re: Solr search not displaying all the indexed values.

2008-10-16 Thread con


Yes, something similar to:

 
 

   

 
 

   

But searching will not return all the results, even when there is only one
result, whereas indexing is fine.
Thanks
con


Noble Paul നോബിള്‍ नोब्ळ् wrote:
> 
> do you have 2 queries in 2 different entities?
> 
> 
> On Thu, Oct 16, 2008 at 3:17 PM, con <[EMAIL PROTECTED]> wrote:
>>
>> I have two queries in my data-config.xml which takes values from multiple
>> tables, like:
>> select * from EMPLOYEE, CUSTOMER where EMPLOYEE.prod_id=
>> CUSTOMER.prod_id.
>>
>> When i do a full-import it is indexing all the rows as expected.
>>
>> But when i search it with a *:* , it is not displaying all the values.
>> Do I need any extra configurations?
>>
>> Thanks
>> con
>> --
>> View this message in context:
>> http://www.nabble.com/Solr-search-not-displaying-all-the-indexed-values.-tp20010401p20010401.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> --Noble Paul
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Solr-search-not-displaying-all-the-indexed-values.-tp20010401p20010927.html
Sent from the Solr - User mailing list archive at Nabble.com.



How Synonyms work in Solr

2008-10-16 Thread payalsharma

Hi,

Please explain how the below-mentioned synonym patterns work in Solr
search, as there exist several separators for synonym patterns:

1.

#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings
#ignore the expand parameter in the schema.

#Examples:
i-pod, i pod => ipod, 
sea biscuit, sea biscit => seabiscuit


2.

#Equivalent synonyms may be separated with commas and give
#no explicit mapping.  In this case the mapping behavior will
#be taken from the expand parameter in the schema.  This allows
#the same synonym file to be used in different synonym handling strategies.

#Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos

3.
# If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit
mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit
mapping:
ipod, i-pod, i pod => ipod


4.
#multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
#is equivalent to
foo => foo bar, baz


5. 
Explain the meaning of this pattern:

a\=>a => b\=>b
a\,a => b\,b

Questions:

A) Among the following, which characters work as delimiters:
whitespace (" "), comma (","), "=>", "\", "/"
B) Also, please let us know whether there are other patterns
apart from the ones mentioned above.
C) In the pattern: ipod, i-pod, i pod
   how do we determine that "i pod" has to be treated as a single
word even though it contains whitespace?
-- 
View this message in context: 
http://www.nabble.com/How-Synonyms-work-in-Solr-tp20014192p20014192.html
Sent from the Solr - User mailing list archive at Nabble.com.
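To make the delimiter rules in the examples above concrete: commas separate synonyms, "=>" separates the left side of an explicit mapping from the right, and a backslash escapes the character(s) after it — plain whitespace is not a delimiter, which is why "i pod" survives as a single multi-token synonym. A rough sketch of that tokenization (Python, illustrative only; this is not Solr's actual parser):

```python
import re

def split_unescaped(s: str, sep: str) -> list:
    """Split on sep unless it is preceded by a backslash, then unescape."""
    parts = re.split(r"(?<!\\)" + re.escape(sep), s)
    return [p.replace("\\" + sep, sep).strip() for p in parts]

def parse_line(line: str):
    """Return (lhs_synonyms, rhs_synonyms); rhs is None for equivalent lists."""
    sides = split_unescaped(line, "=>")
    if len(sides) == 2:
        return split_unescaped(sides[0], ","), split_unescaped(sides[1], ",")
    return split_unescaped(sides[0], ","), None

print(parse_line("ipod, i-pod, i pod"))
# (['ipod', 'i-pod', 'i pod'], None)  -- "i pod" stays one synonym
print(parse_line(r"a\=>a => b\=>b"))
# (['a=>a'], ['b=>b'])  -- escaped "=>" is literal text, not a delimiter
```

This also answers question C: whitespace never splits a synonym; only an unescaped comma or "=>" does.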



Re[2]: How to change a port?

2008-10-16 Thread Aleksey Gogolev
Hello Ryan,

Thats exactly what I was looking for. Thanks!

RM> that will depend on your servlet container.  (jetty, resin, tomcat,  
RM> etc...)

RM> If you are running jetty from the example, you can change the port by  
RM> adding -Djetty.port=1234 to the command line.  The port is configured  
RM> in example/etc/jetty.xml

RM> the relevant line is:
RM>   <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>


RM> ryan


RM> On Oct 16, 2008, at 10:30 AM, Aleksey Gogolev wrote:

>>
>> Hello.
>>
>> Is there a way to change the port (8983) of solr example?
>> I want to run two solr examples simultaneously.
>>
>> -- 
>> Aleksey Gogolev
>> developer,
>> dev.co.ua
>> Aleksey
>>






-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey mailto:[EMAIL PROTECTED]



Re: snapshooter and spellchecker

2008-10-16 Thread Otis Gospodnetic
Geoff, maybe this will help: https://issues.apache.org/jira/browse/SOLR-433

 
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Geoffrey Young <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, October 16, 2008 10:34:40 AM
> Subject: snapshooter and spellchecker
> 
> hi all :)
> 
> I was surprised to find that snapshooter didn't account for the
> spellcheck dictionary.  but then again, since you can call it whatever
> you want I guess it couldn't.
> 
> so, how are people distributing their dictionaries across their slaves?
> since it takes so long to generate, I can't see it being practical to
> generate it on each slave, especially as they'd all have the same data
> as the master anyway.
> 
> tia
> 
> --Geoff



Re: snapcleaner >> problem solr 1.3

2008-10-16 Thread Chris Haggstrom


On Oct 16, 2008, at 4:29 AM, sunnyfr wrote:



still nothing changed :


It looks like it worked better to me, in that it resulted in a valid  
find command for any snapshots with an -mtime of +1:


++ find /data/solr/video/data -maxdepth 1 -name 'snapshot.*' -mtime +1  
-print


instead of showing an error like before:

++ find /data/solr/video/data -maxdepth 1 -name 'snapshot.*' -mtime  
+-1 -print

find: invalid argument `+-1' to `-mtime'

But it didn't find any snapshots to remove.   Do you have any  
snapshots that haven't been modified in 2+ days?  Due to the way find  
-mtime works (looking at the modification day, and ignoring fractions  
of days), for a snapshot to match, it would have to not have been  
modified for a couple days.


-Chris
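The rounding behavior Chris describes can be stated precisely: GNU find computes a file's age in whole 24-hour periods (discarding the fraction), and -mtime +n matches when that whole-day count is greater than n. A small sketch of that rule (Python, illustrative; assumes GNU find semantics):

```python
def mtime_plus_matches(age_seconds: float, n: int) -> bool:
    """GNU find -mtime +n: age in whole 24h periods (fraction dropped) > n."""
    whole_days = int(age_seconds // 86400)
    return whole_days > n

# A snapshot 1.9 days old does NOT match -mtime +1 (floor(1.9) == 1),
# which is why a snapshot must be 2+ days old before "-mtime +1" removes it.
print(mtime_plus_matches(1.9 * 86400, 1))  # False
print(mtime_plus_matches(2.5 * 86400, 1))  # True
```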


Re: How Synonyms work in Solr

2008-10-16 Thread Otis Gospodnetic
Hi,

It looks like you have not seen a pretty detailed page on Synonyms on the Solr 
wiki.  Have a look, I think you'll find answers to your questions there.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: payalsharma <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, October 16, 2008 9:55:55 AM
> Subject: How Synonyms work in Solr



Re: Advice on analysis/filtering?

2008-10-16 Thread Andrzej Bialecki

Jarek Zgoda wrote:

Message written on 2008-10-16 at 16:21 by Grant Ingersoll:

I'm trying to create a search facility for documents in "broken" 
Polish (by broken I mean "not language rules compliant"),


Can you explain what you mean here a bit more?  I don't know Polish, 


Hi guys,

I do speak Polish :) maybe I can help here a bit.


Some documents (around 15% of the pile) contain texts entered by 
children from primary school, which implies many syntactic and 
orthographic errors.


document text: "włatcy móch" (in proper Polish this would be "władcy 
much")
example terms that should match: "włatcy much", "wlatcy moch", 
"wladcy much"


These examples can be classified as "sounds like", and typically 
soundexing algorithms are used to address this problem, in order to 
generate initial suggestions. After that you can use other heuristic 
rules to select the most probable correct forms.


AFAIK, there are no (public) soundex implementations for Polish, in 
particular in Java, although there was some research work done on the 
construction of a specifically Polish soundex. You could also use the 
Daitch-Mokotoff soundex, which comes close enough.



Taking word "włatcy" from my example, I'd like to find documents 
containing words


"wlatcy" (latin-2 accentuations stripped from original), 


This step is trivial.

"władcy" (proper form of this noun) and "wladcy" (latin-2 
accents stripped from proper form).


And this one is not. It requires using something like soundexing in 
order to look up possible similar terms. However ... in this process you 
inevitably collect false positives, and you don't have any way in the 
input text to determine that they should be rejected. You can only make 
this decision based on some external knowledge of Polish, such as:


* a morpho-syntactic analyzer that will determine which combinations of 
suggestions are more correct and more probable,


* a language model that for any given soundexed phrase can generate the 
most probable original phrases.


Also, knowing the context in which a query is asked may help, but 
usually you don't have this information (queries are short).


--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
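To make the soundex idea concrete for this thread's examples: fold the Polish-specific confusions (ó/u, ł/l, voiced/voiceless consonant pairs), strip the remaining diacritics, and keep only a consonant skeleton, so that all the "sounds like" variants collapse to one indexing key. A crude sketch (Python; this illustrates the approach only — a real Polish soundex or Daitch-Mokotoff implementation is far more thorough):

```python
import unicodedata

DEVOICE = str.maketrans("bdgwz", "ptkfs")  # fold voiced/voiceless pairs

def phonetic_key(term: str) -> str:
    """Collapse common 'broken Polish' spellings onto one rough key."""
    t = term.lower().replace("ó", "u").replace("ł", "l")
    t = unicodedata.normalize("NFKD", t)                       # split off accents
    t = "".join(c for c in t if not unicodedata.combining(c))  # drop accents
    t = t.translate(DEVOICE)
    out = t[0]
    for c in t[1:]:            # keep consonant skeleton, collapse repeats
        if c not in "aeiouy" and c != out[-1]:
            out += c
    return out

# All of the thread's variants land on the same keys:
print(phonetic_key("włatcy"), phonetic_key("władcy"), phonetic_key("wladcy"))
# fltc fltc fltc
print(phonetic_key("móch"), phonetic_key("much"), phonetic_key("moch"))
# mch mch mch
```

In Solr this sort of normalization would live in a token filter applied at both index and query time; if memory serves, Solr ships a PhoneticFilterFactory wrapping the commons-codec encoders (Soundex, DoubleMetaphone), which could serve as a starting point even though they are tuned for English.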



Re: How to retrieve all field names of index of one type

2008-10-16 Thread Otis Gospodnetic
Hi,

I don't have the sources handy, but look at the Luke request handler in Solr 
sources and you'll see how it can be done.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: prerna07 <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, October 15, 2008 2:51:48 AM
> Subject: How to retrieve all field names of index of one type
> 
> 
> Hi,
> 
> I want to retrieve all field names of one index type. Is there any way Solr
> can do this?
> 
> For example: I have 3 indexes with the field name and value:
> ProductVO
> I want to retrieve all other field names present in the indexes which have
> field name as index_type and value as "ProductVO".
> 
> Please let me know if you need more details.
> 
> Thanks,
> Prerna
> -- 
> View this message in context: 
> http://www.nabble.com/How-to-retrieve-all-field-names-of-index-of-one-type-tp19987807p19987807.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: updating documents in solr 1.3.0

2008-10-16 Thread Bill Au
This is being worked on for Solr 1.4:

https://issues.apache.org/jira/browse/SOLR-139

Bill

On Wed, Oct 15, 2008 at 7:47 PM, Walter Underwood <[EMAIL PROTECTED]>wrote:

> Neither Solr nor Lucene supports partial updates. "Update" means
> "add or replace". --wunder
>
> On 10/15/08 4:23 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >   I've been trying to find a way to post partial updates, updating
> only
> > some of the fields in a set of records, via POSTed XML messages to a solr
> > 1.3.0 index. In the wiki (http://wiki.apache.org/solr/UpdateXmlMessages
> ),
> > it almost seems like there's a special  root tag which isn't
> > mentioned anywhere else. Am I correct in assuming that no such 
> tag
> > exists?
> >
> > Thanks in advance,
> >
> > Evan Kelsey
> >
> >
>
>
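Until SOLR-139 lands, the usual workaround is a read-modify-write in the client: fetch the full stored document (which requires every field to be stored), merge in the changed fields, and re-add it so the new version replaces the old. A sketch of the merge step (Python, with a plain dict standing in for the index):

```python
def partial_update(index: dict, doc_id: str, changes: dict) -> dict:
    """Emulate a partial update on a store that only supports add-or-replace."""
    doc = dict(index[doc_id])   # 1. fetch the currently stored document
    doc.update(changes)         # 2. merge in only the changed fields
    index[doc_id] = doc         # 3. re-add: the whole doc replaces the old one
    return doc

index = {"42": {"id": "42", "title": "old title", "views": 10}}
partial_update(index, "42", {"views": 11})
print(index["42"])  # {'id': '42', 'title': 'old title', 'views': 11}
```

The obvious caveat: any field that is indexed but not stored cannot be read back, so it is silently lost by this approach.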


RegexTransformer debugging (DIH)

2008-10-16 Thread Jon Baer
Is there a way to prevent this from occurring (or a way to nail down  
the doc which is causing it?):


INFO: [news] webapp=/solr path=/admin/dataimport  
params={command=status} status=0 QTime=0

Exception in thread "Thread-14" java.lang.StackOverflowError
at java.util.regex.Pattern$Single.match(Pattern.java:3313)
at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4763)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
at java.util.regex.Pattern$All.match(Pattern.java:4079)
at java.util.regex.Pattern$Branch.match(Pattern.java:4538)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4578)
at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4767)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
at java.util.regex.Pattern$All.match(Pattern.java:4079)
at java.util.regex.Pattern$Branch.match(Pattern.java:4538)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4578)
at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4767)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
at java.util.regex.Pattern$All.match(Pattern.java:4079)

Thanks.

- Jon
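That StackOverflowError comes from the JDK regex engine, which recurses once per backtracking step, so a pattern with nested quantifiers can blow the stack on one pathologically long field value. Since the trace doesn't name the document, one low-tech way to find it is to run the rows through the pattern yourself and report the primary key of the first row that fails. A sketch (Python; the failure here is a stand-in exception, and the "id"/"body" names are made up for illustration):

```python
import re

def find_bad_row(rows, pattern, field, key="id"):
    """Apply the regex row by row; return the key of the first row that fails."""
    rx = re.compile(pattern)
    for row in rows:
        try:
            rx.findall(row.get(field, ""))
        except Exception:        # stand-in for the Java StackOverflowError
            return row.get(key)
    return None

rows = [
    {"id": "a1", "body": "fine"},
    {"id": "a2", "body": None},  # a non-string value makes findall blow up
]
print(find_bad_row(rows, r"\w+", "body"))  # a2
```

In the DIH case the equivalent trick is to log each row's primary key just before the transformer runs; the last key logged before the crash is the offending document.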



Re: Tree Faceting Component

2008-10-16 Thread Jeremy Hinegardner
Erik,

After some more experiments, I can get it to perform incorrectly using the
sample solr data.

The example query from SOLR-792 ticket:
  
http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=cat&facet.tree=cat,inStock&wt=json&indent=on

Make a few alterations to the query:

1) swap the tree order - all tree facets are 0
  
http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=cat&facet.tree=inStock,cat&wt=json&indent=on

2) swap tree order and change facet.field to be the primary( inStock )
  
http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=inStock&facet.tree=inStock,cat&wt=json&indent=on

Also, can tree faceting work distributed?

enjoy,

-jeremy

On Wed, Oct 15, 2008 at 05:41:21PM -0700, Erik Hatcher wrote:
> Jeremy,
>
> What's the full request you're making to Solr?
>
> Do you get values when you facet normally on date_id and type?  
> &facet.field=date_id&facet.field=type
>
>   Erik
>
> p.s. this e-mail is not on the list (on a hotel net connection blocking 
> outgoing mail) - feel free to reply to this back on the list though.
>
> On Oct 15, 2008, at 5:29 PM, Jeremy Hinegardner wrote:
>
>> Hi all,
>>
>> I'm testing out using the Tree Faceting Component (SOLR-792) on top of 
>> Solr 1.3.
>>
>> It looks like it would do exactly what I want, but something is not 
>> working
>> correctly with my schema.  When I use the example schema, it works just 
>> fine,
>> but I swap out the example schema's and example index and then put in my 
>> index
>> and and schema,  tree facet does not work.
>>
>> Both of the fields I want to facet can be faceted individually, but when I 
>> say
>> facet.tree=date_id,type then all of the values are 0.
>>
>> Does anyone have any ideas on where I should start looking ?
>>
>> enjoy,
>>
>> -jeremy
>>
>> -- 
>> 
>> Jeremy Hinegardner  [EMAIL PROTECTED]
>

-- 

 Jeremy Hinegardner  [EMAIL PROTECTED] 



Different XML format for multi-valued fields?

2008-10-16 Thread oleg_gnatovskiy

Hello. I have an index built in Solr with several multi-value fields. When
the multi-value field has only one value for a document, the XML returned
looks like this: 

5693

However, when there are multiple values for the field, the XML looks like
this: 
<arr name="someIds">
11199
1722

Is there a reason for this difference? Also, how does faceting work with
multi-valued fields? It seems that I sometimes get facet results from
multi-valued fields, and sometimes I don't.

Thanks.
-- 
View this message in context: 
http://www.nabble.com/Different-XML-format-for-multi-valued-fields--tp20015951p20015951.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: dataimport, both splitBy and dateTimeFormat

2008-10-16 Thread David Smiley @MITRE.org

The wiki didn't mention I can specify multiple transformers.  BTW, it's
"transformer" (singular), not "transformers".  I did mean both NFT and DFT
because I was speaking of the general case, not just mine in particular.  I
thought that the built-in transformers were always in effect and so I
expected NFT,DFT to occur last.  Sorry if I wasn't clear.

Thanks for your help; it worked.

~ David


Shalin Shekhar Mangar wrote:
> 
> Hi David,
> 
> I think you meant RegexTransformer instead of NumberFormatTransformer.
> Anyhow, the order in which the transformers are applied is the same as the
> order in which you specify them.
> 
> So make sure your entity has
> transformers="RegexTransformer,DateFormatTransformer".
> 
> On Thu, Oct 16, 2008 at 6:14 PM, David Smiley @MITRE.org
> <[EMAIL PROTECTED]>wrote:
> 
>>
>> I'm trying out the dataimport capability.  I have a column that is a
>> series
>> of dates separated by spaces like so:
>> "1996-00-00 1996-04-00"
>> And I'm trying to import it like so:
>> 
>>
>> However this fails and the stack trace suggests it is first trying to
>> apply
>> the dateTimeFormat before splitBy.  I think this is a bug... dataimport
>> should apply DateFormatTransformer and NumberFormatTransformer last.
>>
>> ~ David Smiley
>> --
>> View this message in context:
>> http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20013006.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 
-- 
View this message in context: 
http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20016178.html
Sent from the Solr - User mailing list archive at Nabble.com.
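For reference, DIH applies the transformers listed in the transformer attribute strictly left to right, which is why RegexTransformer must come before DateFormatTransformer here: the dates must be split apart before each piece is parsed. A toy model of that pipeline (Python; the sample uses valid dates, since placeholder dates like "1996-00-00" would need a more lenient parser):

```python
from datetime import datetime

def apply_transformers(value, transformers):
    """Apply transformers left to right; a list result fans out like splitBy."""
    values = [value]
    for t in transformers:
        out = []
        for v in values:
            r = t(v)
            out.extend(r if isinstance(r, list) else [r])
        values = out
    return values

split_by_space = lambda v: v.split(" ")               # like splitBy=" "
to_date = lambda v: datetime.strptime(v, "%Y-%m-%d")  # like dateTimeFormat

# Correct order: split first, then parse each piece.
print(apply_transformers("1996-01-01 1996-04-01", [split_by_space, to_date]))

# Wrong order fails: the whole string is fed to the date parser first.
try:
    apply_transformers("1996-01-01 1996-04-01", [to_date, split_by_space])
except ValueError as e:
    print("wrong order:", e)
```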



Reduction of open files

2008-10-16 Thread Paul deGrandis
I have been working with Solr for a few months now.  According to some
documentation I read, segment files have only one set of all the other
linguistic-module type of stuff (normalization, frequency). Is there a
way to remove/reduce the files not associated with a segment, besides
optimizing the index?

I set my mergeFactor to 2 for sake of trying to tease out a solution.
I have tried readercycle thinking it was just stale readers.  That did
not work.

If anyone has any experience or knows of any documentation that can
get me closer to achieving this, I would greatly appreciate it.

Paul


Re: Reduction of open files

2008-10-16 Thread Grant Ingersoll

Are you using the compound file format?

-Grant

On Oct 16, 2008, at 3:28 PM, Paul deGrandis wrote:


I have been working with SOLR for a few months now.  According to some
documentation I read, segment files only have one set of all the other
linguistic module type of stuff (normalization, frequency), is there a
way to remove/reduce the files not associated with a segment besides
optimizing the index?

I set my mergeFactor to 2 for sake of trying to tease out a solution.
I have tried readercycle thinking it was just stale readers.  That did
not work.

If anyone has any experience or knows of any documentation that can
get me closer to achieving this, I would greatly appreciate it.

Paul


--
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











Re: Reduction of open files

2008-10-16 Thread Paul deGrandis
I currently am not.

The document collection is highly volatile (3000 modifications a
minute), and from what I read I thought it would be too much of a
performance penalty, but I never tested it.

What behavior in terms of file creation and open fd is seen when
useCompoundFile is set to true?

Paul


On Thu, Oct 16, 2008 at 4:16 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> Are you using the compound file format?
>
> -Grant
>
> On Oct 16, 2008, at 3:28 PM, Paul deGrandis wrote:
>
>> I have been working with SOLR for a few months now.  According to some
>> documentation I read, segment files only have one set of all the other
>> linguistic module type of stuff (normalization, frequency), is there a
>> way to remove/reduce the files not associated with a segment besides
>> optimizing the index?
>>
>> I set my mergeFactor to 2 for sake of trying to tease out a solution.
>> I have tried readercycle thinking it was just stale readers.  That did
>> not work.
>>
>> If anyone has any experience or knows of any documentation that can
>> get me closer to achieving this, I would greatly appreciate it.
>>
>> Paul
>
> --
> Grant Ingersoll
> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
> http://www.lucenebootcamp.com
>
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
>
>
>
>
>
>
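For a rough sense of scale: in Lucene 2.x each non-compound segment consists of several files (.fnm, .fdt, .fdx, .frq, .prx, .tis, .tii, .nrm, plus optional deletion and term-vector files), whereas the compound format packs them into a single .cfs per segment. A back-of-the-envelope sketch (the per-segment list is approximate and ignores .del and term vectors):

```python
# Typical per-segment extensions in Lucene 2.x, non-compound format.
SEGMENT_EXTS = ("fnm", "fdt", "fdx", "frq", "prx", "tis", "tii", "nrm")

def rough_file_count(num_segments: int, compound: bool) -> int:
    """Approximate index file count; +2 covers segments_N and segments.gen."""
    per_segment = 1 if compound else len(SEGMENT_EXTS)
    return num_segments * per_segment + 2

# mergeFactor=2 keeps the segment count low, but each extra segment still
# costs ~8 descriptors without the compound format:
print(rough_file_count(2, compound=False))  # 18
print(rough_file_count(2, compound=True))   # 4
```

Note also that files deleted from disk can stay open until stale IndexReaders are released, so on a rapidly changing index lsof can report far more than this estimate.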


Re: Reduction of open files

2008-10-16 Thread Paul deGrandis
My biggest concern is why the remaining files stay open even though my
mergeFactor is 2.

I would expect to see one or two segment files and one or two sets of
accompanying files (.nrm, .frq, etc.), based on the documentation.

Paul

On Thu, Oct 16, 2008 at 4:23 PM, Paul deGrandis
<[EMAIL PROTECTED]> wrote:
> I currently am not.
>
> The document collection is highly volatile (3000 modifications a
> minute) and from reading thought it would be too much of a performance
> penalty but never tested it.
>
> What behavior in terms of file creation and open fd is seen when
> useCompoundFile is set to true?
>
> Paul
>
>
> On Thu, Oct 16, 2008 at 4:16 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>> Are you using the compound file format?
>>
>> -Grant
>>
>> On Oct 16, 2008, at 3:28 PM, Paul deGrandis wrote:
>>
>>> I have been working with SOLR for a few months now.  According to some
>>> documentation I read, segment files only have one set of all the other
>>> linguistic module type of stuff (normalization, frequency), is there a
>>> way to remove/reduce the files not associated with a segment besides
>>> optimizing the index?
>>>
>>> I set my mergeFactor to 2 for sake of trying to tease out a solution.
>>> I have tried readercycle thinking it was just stale readers.  That did
>>> not work.
>>>
>>> If anyone has any experience or knows of any documentation that can
>>> get me closer to achieving this, I would greatly appreciate it.
>>>
>>> Paul
>>
>> --
>> Grant Ingersoll
>> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
>> http://www.lucenebootcamp.com
>>
>>
>> Lucene Helpful Hints:
>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>


RE: error with delta import

2008-10-16 Thread Lance Norskog
If you make a database view with the query, it is easy to examine the data you 
want to index. Then, your solr import query would just pull the view.  The Solr 
setup file is much simpler this way.

-Original Message-
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 15, 2008 2:46 AM
To: solr-user@lucene.apache.org
Subject: Re: error with delta import

The delta implementation is a bit fragile in DIH for complex queries

I recommend you do delta-import using a full-import

.



Synonym format not working

2008-10-16 Thread prerna07

Hi,

I am facing an issue with synonym search in Solr. The synonym.txt contains
entries in this format:

ccc => 1,2,ccc
ccc => 3

I am not getting any search results for ccc. I have created the indexes
with a string field type.

Do I need to change anything in schema.xml?

 String tag from Schema.xml : 
 
 







  


Any pointers to solve the issue?

Thanks,
Prerna


-- 
View this message in context: 
http://www.nabble.com/Synonym--format-not-working-tp20026988p20026988.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: error with delta import

2008-10-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
The last_index_time is available only from the second import onwards;
it expects a full-import to be done first.
It knows that by the presence of dataimport.properties in the config
directory. Did you check whether it is present?


On Thu, Oct 16, 2008 at 5:33 PM, Florian Aumeier
<[EMAIL PROTECTED]> wrote:
> Noble Paul നോബിള്‍ नोब्ळ् schrieb:
>>>
>>> Well, when doing the way you described below (full-import with the delta
>>> query), the '${dataimporter.last_index_time}' timestamp is empty:
>>>
>>
>> I guess this was fixed post-1.3. You can probably take
>> dataimporthandler.jar from a nightly build (you may also need to add
>> slf4j.jar)
>>
>>>
> I replaced
> dist/apache-solr-dataimporthandler-1.3.0.jar
> dist/solrj-lib/slf4j-api-1.5.3.jar
> dist/solrj-lib/slf4j-jdk14-1.5.3.jar
>
> with their counterparts from the nightly build, but it did not help. Then I
> tried to enter the date kind of hard coded (now() - '12 hours'::interval).
> Everything looks fine, but there are no new documents in the index.
>
> here is the log:
>
> INFO: Starting Full Import
> Oct 16, 2008 1:07:08 PM org.apache.solr.core.SolrCore execute
> INFO: [test] webapp=/solr path=/dataimport
> params={command=full-import&clean=false&entity=articles-delta} status=0
> QTime=0
> Oct 16, 2008 1:07:08 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
> call
> INFO: Creating a connection for entity articles-delta with URL:
> jdbc:postgresql://bm02:5432/bm
> Oct 16, 2008 1:07:08 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
> call
> INFO: Time taken for getConnection(): 45
> Oct 16, 2008 1:14:53 PM org.apache.solr.core.SolrCore execute
> INFO: [test] webapp=/solr path=/dataimport params={} status=0 QTime=1
> Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.SolrWriter
> readIndexerProperties
> INFO: Read dataimport.properties
> Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.SolrWriter
> persistStartTime
> INFO: Wrote last indexed time to dataimport.properties
> Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.DocBuilder
> commit
> INFO: Full Import completed successfully
> Oct 16, 2008 1:16:11 PM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true)
> Oct 16, 2008 1:16:11 PM org.apache.solr.search.SolrIndexSearcher
> INFO: Opening [EMAIL PROTECTED] main
> Oct 16, 2008 1:16:11 PM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: end_commit_flush
> ... (autowarming)
> Oct 16, 2008 1:16:12 PM org.apache.solr.handler.dataimport.DocBuilder
> execute
> INFO: Time taken = 0:9:3.231
>
>



-- 
--Noble Paul


Re: Tree Faceting Component

2008-10-16 Thread Jeremy Hinegardner
After a bit more investigating, it appears that any facet tree where the first
item is numerical or boolean or some non-textual type does not produce any
secondary facets.  This includes sint, sfloat, boolean and such.

For instance, on the sample index:

  facet.tree=sku,cat => works
  facet.tree=cat,sku => works
  facet.tree=manu_exact,cat => works
  facet.tree=cat,manu_exact => works
  facet.tree=popularity,inStock => fails
  facet.tree=inStock,popularity => fails
  facet.tree=manu_exact,weight => works
  facet.tree=weight,manu_exact => fails

I'm not very familiar with the Solr / Lucene Java API, so this is slow going
here.  Maybe I'm barking up the wrong tree, but is the TermQuery for the
secondary SimpleFacet messing up somehow?  I tried to dig into the code, but
was unsuccessful.  

It appears to me that the searcher never returns a docSet for any TermQuery
where the field being searched has a type that is non-textual. 

As a final test, I changed the schema and made the inStock field a 'text' field
instead of 'boolean'.  When I did that, and reindexed the sample data then the
tree facet would work correctly as either facet.tree=cat,inStock or
facet.tree=inStock,cat, whereas before it worked only in the former order.

enjoy,

-jeremy

On Thu, Oct 16, 2008 at 10:55:49AM -0600, Jeremy Hinegardner wrote:
> Erik,
> 
> After some more experiments, I can get it to perform incorrectly using the
> sample solr data.
> 
> The example query from SOLR-792 ticket:
>   
> http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=cat&facet.tree=cat,inStock&wt=json&indent=on
> 
> Make a few alterations to the query:
> 
> 1) swap the tree order - all tree facets are 0
>   
> http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=cat&facet.tree=inStock,cat&wt=json&indent=on
> 
> 2) swap tree order and change facet.field to be the primary( inStock )
>   
> http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=inStock&facet.tree=inStock,cat&wt=json&indent=on
> 
> Also, can tree faceting work distributed?
> 
> enjoy,
> 
> -jeremy
> 
> On Wed, Oct 15, 2008 at 05:41:21PM -0700, Erik Hatcher wrote:
> > Jeremy,
> >
> > What's the full request you're making to Solr?
> >
> > Do you get values when you facet normally on date_id and type?  
> > &facet.field=date_id&facet.field=type
> >
> > Erik
> >
> > p.s. this e-mail is not on the list (on a hotel net connection blocking 
> > outgoing mail) - feel free to reply to this back on the list though.
> >
> > On Oct 15, 2008, at 5:29 PM, Jeremy Hinegardner wrote:
> >
> >> Hi all,
> >>
> >> I'm testing out using the Tree Faceting Component (SOLR-792) on top of 
> >> Solr 1.3.
> >>
> >> It looks like it would do exactly what I want, but something is not 
> >> working
> >> correctly with my schema.  When I use the example schema, it works just 
> >> fine,
> >> but when I swap out the example schema and example index and put in my
> >> own index and schema, the tree facet does not work.
> >>
> >> Both of the fields I want to facet can be faceted individually, but when I 
> >> say
> >> facet.tree=date_id,type then all of the values are 0.
> >>
> >> Does anyone have any ideas on where I should start looking ?
> >>
> >> enjoy,
> >>
> >> -jeremy
> >>
> >> -- 
> >> 
> >> Jeremy Hinegardner  [EMAIL PROTECTED]
> >
> 
> -- 
> 
>  Jeremy Hinegardner  [EMAIL PROTECTED] 
> 

-- 

 Jeremy Hinegardner  [EMAIL PROTECTED] 



Re: dataimport, both splitBy and dateTimeFormat

2008-10-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
Thanks David,
I have updated the wiki documentation
http://wiki.apache.org/solr/DataImportHandler#transformer

The default transformers do not have any special privilege; they are like
any normal user-provided transformer. We just identified some commonly
found use cases and added transformers for them.

Applying a transformer is not very 'cheap': it has to do extra checks
to decide whether to apply or not.
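As a sketch, an entity with both transformers chained in the right order would look something like this in data-config.xml (the entity name, query, and column here are illustrative placeholders, not taken from David's actual config):

```xml
<entity name="item" query="select ..."
        transformer="RegexTransformer,DateFormatTransformer">
  <!-- RegexTransformer runs first and splits the column on spaces;
       DateFormatTransformer then parses each resulting piece as a date -->
  <field column="dates" splitBy=" " dateTimeFormat="yyyy-MM-dd"/>
</entity>
```

Reversing the two names in the transformer attribute reproduces the failure David saw: the date parse is attempted on the still-unsplit string.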

On Fri, Oct 17, 2008 at 12:26 AM, David Smiley @MITRE.org
<[EMAIL PROTECTED]> wrote:
>
> The wiki didn't mention I can specify multiple transformers.  BTW, it's
> "transformer" (singular), not "transformers".  I did mean both NFT and DFT
> because I was speaking of the general case, not just mine in particular.  I
> thought that the built-in transformers were always in-effect and so I
> expected NFT,DFT to occur last.  Sorry if I wasn't clear.
>
> Thanks for your help; it worked.
>
> ~ David
>
>
> Shalin Shekhar Mangar wrote:
>>
>> Hi David,
>>
>> I think you meant RegexTransformer instead of NumberFormatTransformer.
>> Anyhow, the order in which the transformers are applied is the same as the
>> order in which you specify them.
>>
>> So make sure your entity has
>> transformers="RegexTransformer,DateFormatTransformer".
>>
>> On Thu, Oct 16, 2008 at 6:14 PM, David Smiley @MITRE.org
>> <[EMAIL PROTECTED]>wrote:
>>
>>>
>>> I'm trying out the dataimport capability.  I have a column that is a
>>> series
>>> of dates separated by spaces like so:
>>> "1996-00-00 1996-04-00"
>>> And I'm trying to import it like so:
>>> 
>>>
>>> However this fails and the stack trace suggests it is first trying to
>>> apply
>>> the dateTimeFormat before splitBy.  I think this is a bug... dataimport
>>> should apply DateFormatTransformer and NumberFormatTransformer last.
>>>
>>> ~ David Smiley
>>> --
>>> View this message in context:
>>> http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20013006.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>>
> --
> View this message in context: 
> http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20016178.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul


Re: RegexTransformer debugging (DIH)

2008-10-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
If it is a normal exception, it is logged with the number of the document
where it failed, and you can put it under a debugger with start=&rows=1.

We do not catch Throwable or Error, so this one slips through.

If you are adventurous enough, wrap the RegexTransformer with your own,
apply that instead (say transformer="my.RegexWrapper"), catch
Throwable, and print out the row.
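A rough sketch of such a wrapper (the package and class name are made up, and this is written against the DIH Transformer API as shipped in Solr 1.3; it needs the Solr jars on the classpath and is untested):

```java
package my;

import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.RegexTransformer;

// Hypothetical wrapper: delegate to RegexTransformer, but catch
// Throwable (which includes StackOverflowError) so the offending
// row is printed instead of silently killing the import thread.
public class RegexWrapper extends RegexTransformer {
  @Override
  public Map<String, Object> transformRow(Map<String, Object> row, Context context) {
    try {
      return super.transformRow(row, context);
    } catch (Throwable t) {
      System.err.println("RegexTransformer failed on row: " + row);
      t.printStackTrace();
      return row; // leave the row untouched so the import can continue
    }
  }
}
```

It is referenced from the entity as transformer="my.RegexWrapper" in place of RegexTransformer.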




On Thu, Oct 16, 2008 at 9:49 PM, Jon Baer <[EMAIL PROTECTED]> wrote:
> Is there a way to prevent this from occurring (or a way to nail down the doc
> which is causing it?):
>
> INFO: [news] webapp=/solr path=/admin/dataimport params={command=status}
> status=0 QTime=0
> Exception in thread "Thread-14" java.lang.StackOverflowError
>at java.util.regex.Pattern$Single.match(Pattern.java:3313)
>at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4763)
>at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
>at java.util.regex.Pattern$All.match(Pattern.java:4079)
>at java.util.regex.Pattern$Branch.match(Pattern.java:4538)
>at java.util.regex.Pattern$GroupHead.match(Pattern.java:4578)
>at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4767)
>at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
>at java.util.regex.Pattern$All.match(Pattern.java:4079)
>at java.util.regex.Pattern$Branch.match(Pattern.java:4538)
>at java.util.regex.Pattern$GroupHead.match(Pattern.java:4578)
>at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4767)
>at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637)
>at java.util.regex.Pattern$All.match(Pattern.java:4079)
>
> Thanks.
>
> - Jon
>
>



-- 
--Noble Paul


Re: Different XML format for multi-valued fields?

2008-10-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
The component that writes out the values does not know whether the field
is multi-valued or not. So if it finds only a single value, it writes it
out as such.


On Thu, Oct 16, 2008 at 10:52 PM, oleg_gnatovskiy
<[EMAIL PROTECTED]> wrote:
>
> Hello. I have an index built in Solr with several multi-value fields. When
> the multi-value field has only one value for a document, the XML returned
> looks like this:
>
> <int name="someIds">5693</int>
>
> However, when there are multiple values for the field, the XML looks like
> this:
>
> <arr name="someIds">
>   <int>11199</int>
>   <int>1722</int>
> </arr>
> Is there a reason for this difference? Also, how does faceting work with
> multi-valued fields? It seems that I sometimes get facet results from
> multi-valued fields, and sometimes I don't.
>
> Thanks.
> --
> View this message in context: 
> http://www.nabble.com/Different-XML-format-for-multi-valued-fields--tp20015951p20015951.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul


Re: Reduction of open files

2008-10-16 Thread Otis Gospodnetic
Out of curiosity, how many files are held open when you hit the limit?  What 
does ulimit show?  And what does lsof show?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
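For reference, the usual way to answer those questions on a Linux box (here the current shell's own pid, $$, stands in for the Solr JVM's pid):

```shell
# Per-process open-file limit (Solr's JVM inherits this from its shell)
ulimit -n

# Count file descriptors a process currently holds; substitute the
# Solr JVM's pid for $$ in real use. lsof -p <pid> | wc -l gives a
# comparable answer and also shows which files they are.
ls /proc/$$/fd | wc -l
```

Comparing that count against the ulimit shows how close the process is to the "too many open files" ceiling.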



- Original Message 
> From: Paul deGrandis <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, October 16, 2008 3:28:29 PM
> Subject: Reduction of open files
> 
> I have been working with Solr for a few months now.  According to some
> documentation I read, segment files only have one set of all the other
> linguistic module type of stuff (normalization, frequency). Is there a
> way to remove/reduce the files not associated with a segment besides
> optimizing the index?
> 
> I set my mergeFactor to 2 for the sake of trying to tease out a solution.
> I have tried readercycle thinking it was just stale readers.  That did
> not work.
> 
> If anyone has any experience or knows of any documentation that can
> get me closer to achieving this, I would greatly appreciate it.
> 
> Paul



Re: Synonym format not working

2008-10-16 Thread Otis Gospodnetic
I can't see the problem at the moment.  What do you see when you use 
&debugQuery=true in the URL?  Do you see the query that includes synonyms?  Can 
you give us the actual query and actual synonyms?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: prerna07 <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, October 17, 2008 12:36:40 AM
> Subject: Synonym  format not working
> 
> 
> Hi,
> 
> I am facing an issue with synonym search in Solr. The synonym.txt contains
> entries in the format:
> 
> ccc => 1,2,ccc
> ccc => 3
> 
> I am not getting any search result for ccc. I have created indexes with
> string value.
> 
> Do i need to change anything in schema .xml ?
> 
> String tag from Schema.xml :
>
> <fieldType ... omitNorms="true">
>   <analyzer>
>     <tokenizer class="..."/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonym.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory"
>             words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="..."
>             protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
> 
> 
> Any pointers to solve the issue?
> 
> Thanks,
> Prerna
> 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Synonym--format-not-working-tp20026988p20026988.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Synonym format not working

2008-10-16 Thread prerna07

Actual synonym :
ccc => 1,2
ccc=>3

The result when I added &debugQuery=true is:

  <str name="rawquerystring">ccc</str>
  <str name="querystring">ccc</str>
  <str name="parsedquery">MultiPhraseQuery(all:" (1 ) (2 ccc ) 3")</str>
  <str name="parsedquery_toString">all:" (1 ) (2 ccc ) 3"</str>
  <str name="QParser">OldLuceneQParser</str>
  (response header and timing details omitted)


Otis Gospodnetic wrote:
> 
> I can't see the problem at the moment.  What do you see when you use
> &debugQuery=true in the URL?  Do you see the query that includes synonyms? 
> Can you give us the actual query and actual synonyms?
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
>> From: prerna07 <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Friday, October 17, 2008 12:36:40 AM
>> Subject: Synonym  format not working
>> 
>> 
>> Hi,
>> 
>> I am facing an issue with synonym search in Solr. The synonym.txt contains
>> entries in the format:
>> 
>> ccc => 1,2,ccc
>> ccc => 3
>> 
>> I am not getting any search result for ccc. I have created indexes with
>> string value.
>> 
>> Do i need to change anything in schema .xml ?
>> 
>> String tag from Schema.xml :
>>
>> <fieldType ... omitNorms="true">
>>   <analyzer>
>>     <tokenizer class="..."/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonym.txt"
>>             ignoreCase="true" expand="true"/>
>>     <filter class="solr.StopFilterFactory"
>>             words="stopwords.txt"/>
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="..."
>>             protected="protwords.txt"/>
>>   </analyzer>
>> </fieldType>
>> 
>> 
>> Any pointers to solve the issue?
>> 
>> Thanks,
>> Prerna
>> 
>> 
>> -- 
>> View this message in context: 
>> http://www.nabble.com/Synonym--format-not-working-tp20026988p20026988.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Synonym--format-not-working-tp20026988p20027720.html
Sent from the Solr - User mailing list archive at Nabble.com.