Re: FW: DIH relating multiple DataSources

2011-03-28 Thread Jeffrey Chang
I'll reply with the solution to this thread myself (from a different email
address).

I did some debugging in the 1.4.1 source code. My issue is in the data-config.xml
file: when a field's column name is stored in the Map object, it uses the DB's
column casing (e.g. ID --> id):



[data-config.xml snippet lost in the archive]

The config above does not work because in parentEntity.ID, the "ID" token
is stored as 'id' in the Map it is compared against. If I change
parentEntity.ID to parentEntity.id (lowercase id), then it works.
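Since the archive stripped the XML from the config snippets in this thread, here is a hedged reconstruction of the two-datasource data-config with the casing fix applied. The entity names, queries, and connection details come from the forwarded question further down; the dataSource names and the `driver` attribute are illustrative assumptions.

```xml
<dataConfig>
  <dataSource name="ds1" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/ebook"
              user="ebook" password="masked" batchSize="1"/>
  <dataSource name="ds2" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://tw-stntlab1:3306/test"
              user="root" password="masked" batchSize="1"/>
  <document>
    <entity name="epub" dataSource="ds1" pk="ID" query="select * from epub">
      <!-- Solr 1.4.1 stores the column under the DB's own casing, so the
           variable must be referenced as epub.id here, not epub.ID -->
      <entity name="jctest" dataSource="ds2"
              query="select TESTCOLUMN from jctest where ID='${epub.id}'">
        <field column="TESTCOLUMN" name="title"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```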

Perhaps the class *VariableResolverImpl* should consider a case-insensitive
Map get?
Thanks,
Jeff
On Mon, Mar 28, 2011 at 3:18 PM,  wrote:

>
>
> -----Original Message-----
> From: Jeffrey Chang (IS-TW)
> Sent: Saturday, March 26, 2011 9:00 PM
> To: solr-user@lucene.apache.org
> Subject: DIH relating multiple DataSources
>
> Hi All,
>
> I'm a newbie to Solr and am hoping to get some help.
>
> I was able to get DIH to work with one datasource. What I'm trying to
> achieve is using two datasources to build my document. Below is my
> data-config:
>
> 
>  url="jdbc:mysql://localhost:3306/ebook" user="ebook" password="masked"
> batchSize="1" />
>  url="jdbc:mysql://tw-stntlab1:3306/test" user="root" password="masked"
> batchSize="1" />
>
> pk="ID" query="select * from epub">
>
>
>
> query="select TESTCOLUMN from jctest where ID='${epub.ID}'">
>
>
>
>
> 
>
> Is the above possible? I can't seem to get my "title" field populated
> from the second datasource, but the fields in my rootEntity that use
> the first datasource work perfectly fine.
>
> Thanks,
> Jeff
>
>


ComplexPhraseQueryParser and wildcards

2011-03-28 Thread jmr
Hi,

I'm using ComplexPhraseQueryParser and I'm quite happy with it.
However, some queries using wildcards are not working.

Example: I want to do a proximity search between the word compiler and the
expressions 'cross linker', 'cross linking', 'cross linked', ...

("cross-linker compiler"~50 OR "cross-linking compiler"~50) is working OK
but ("cross-link* compiler"~50) is not working (returns nothing)

Is there another syntax that allows such a query?

Thanks
JMR

--
View this message in context: 
http://lucene.472066.n3.nabble.com/ComplexPhraseQueryParser-and-wildcards-tp2742244p2742244.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ComplexPhraseQueryParser and wildcards

2011-03-28 Thread Chandan Tamrakar
Did you get any exceptions?
Usually the wildcard term you mentioned would be expanded before actually
being searched.

thanks.




-- 
Chandan Tamrakar


Re: problem with snowballporterfilterfactory

2011-03-28 Thread anurag.walia
Thanks Erick for the reply.

I used "protwords.txt" to make singular and plural
words like bag and bags match.


Regards
Anurag Walia


--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-with-snowballporterfilterfactory-tp2729589p2742365.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ComplexPhraseQueryParser and wildcards

2011-03-28 Thread jmr

Chandan Tamrakar-2 wrote:
> 
> did you get any exceptions ?
> usually wild card term you mentioned would be expanded before being
> actually
> searched .
> 

No exception. Just no results returned.
JMR



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ComplexPhraseQueryParser-and-wildcards-tp2742244p2742403.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: stopwords not working in multicore setup

2011-03-28 Thread Martin Rödig
Hi,

you must encode the umlaut in the URL. In your case it must be &q=title:f%FCr;
then it should work.
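An alternative to manually percent-encoding every query (a sketch, not from the original thread): tell the servlet container itself to decode request URIs as UTF-8. For Jetty 6, the relevant system property can be set in jetty.xml; whether this applies depends on the Jetty version and how the query reaches Solr.

```xml
<!-- jetty.xml sketch: make Jetty 6 decode request URIs as UTF-8.
     Equivalent to passing -Dorg.mortbay.util.URI.charset=UTF-8
     on the command line. -->
<Configure id="Server" class="org.mortbay.jetty.Server">
  <Call class="java.lang.System" name="setProperty">
    <Arg>org.mortbay.util.URI.charset</Arg>
    <Arg>UTF-8</Arg>
  </Call>
</Configure>
```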




From: Christopher Bottaro [mailto:cjbott...@onespot.com]
Sent: Friday, 25 March 2011 18:48
To: solr-user@lucene.apache.org
Cc: Martin Rödig
Subject: Re: stopwords not working in multicore setup

Ahh, thank you for the hints, Martin... German stopwords without umlauts work
correctly.

So I'm trying to figure out where the UTF-8 chars are getting messed up.  Using 
the Solr admin web UI, I did a search for title:für and the xml (or json) 
output in the browser shows the query with the proper encoding, but the Solr 
logs show this:

INFO: [page_30d_de] webapp=/solr path=/select 
params={explainOther=&fl=*,score&indent=on&start=0&q=title:f?r&hl.fl=&qt=standard&wt=xml&fq=&version=2.2&rows=10}
 hits=76 status=0 QTime=2

Notice the title:f?r.  How do I fix that?  I'm using Jetty btw...

Thanks for the help.

On Fri, Mar 25, 2011 at 3:05 AM, Martin Rödig <r...@shi-gmbh.com> wrote:
I have some questions about your config:

Is stopwords-de.txt in the same directory as schema.xml?
Is the title field of type text?
Do you have the same problem with German stopwords without umlauts (ü, ö, ä),
like the word "denn"?

One problem could be that stopwords-de.txt is not saved as UTF-8, so the filter
cannot read the umlaut ü in the file.


Kind regards,
M.Sc. Dipl.-Inf. (FH) Martin Rödig

SHI Elektronische Medien GmbH
Internet: http://www.shi-gmbh.com

-----Original Message-----
From: Christopher Bottaro [mailto:cjbott...@onespot.com]
Sent: Friday, 25 March 2011 05:37
To: solr-user@lucene.apache.org
Subject: stopwords not working in multicore setup

Hello,

I'm running a Solr server with 5 cores.  Three are for English content and two 
are for German content.  The default stopwords setup works fine for the English 
cores, but the German stopwords aren't working.

The German stopwords file is stopwords-de.txt and resides in the same directory 
as stopwords.txt.  The German cores use a different schema (named
schema.page.de.xml) which has the following text field definition:
http://pastie.org/1711866

The stopwords-de.txt file looks like this:  http://pastie.org/1711869

The query I'm doing is this:  q => "title:für"

And it's returning documents with für in the title.  Title is a text field 
which should use the stopwords-de.txt, as seen in the aforementioned pastie.

Any ideas?  Thanks for the help.



RamBufferSize and AutoCommit

2011-03-28 Thread Isan Fulia
Hi all ,

I would like to know whether there is any relation between autoCommit and
ramBufferSizeMB.
My solrconfig.xml does not set ramBufferSizeMB, which means it is the
default (32 MB). The autoCommit settings are after 500 docs or 80 sec,
whichever comes first.
Solr starts with Xmx 2700M. Total RAM is 4 GB.
Is the RAM buffer allocated outside the heap memory (2700M)?
How is ramBufferSizeMB related to out-of-memory errors?
What is the optimal value for ramBufferSizeMB?

-- 
Thanks & Regards,
Isan Fulia.


Re: RamBufferSize and AutoCommit

2011-03-28 Thread Li Li
There are 3 conditions that will trigger an auto flush in Lucene:
1. the size of the index in RAM is larger than the RAM buffer size
2. the number of documents in memory is larger than the number set by setMaxBufferedDocs
3. the number of deleted terms is larger than the ratio set by
setMaxBufferedDeleteTerms

Auto flushing by time interval is added by Solr.

ramBufferSize uses an estimated size, and the memory actually used may be
larger than this value. So if your Xmx is 2700m, setRAMBufferSizeMB
should be set to less than that. If you set ramBufferSizeMB to 2700
and the other 3 conditions are not triggered, I think it will hit an
OOM exception.
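The two settings discussed above both live in solrconfig.xml. A sketch matching the numbers in the question (the default-sized RAM buffer made explicit, autocommit at 500 docs or 80 seconds) — element placement follows the Solr 1.4-era config layout:

```xml
<!-- solrconfig.xml sketch: explicit RAM buffer plus autocommit thresholds.
     maxTime is in milliseconds, so 80 seconds = 80000. -->
<indexDefaults>
  <ramBufferSizeMB>32</ramBufferSizeMB>
</indexDefaults>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>500</maxDocs>
    <maxTime>80000</maxTime>
  </autoCommit>
</updateHandler>
```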



Cant retrieve data

2011-03-28 Thread Merlin Morgenstern
Hi there,

I am new to Solr and have just installed it on a SUSE box with a MySQL
backend.

The install and the MySQL connector seem to be working; I can see the Solr admin
interface.
Now I tried to index a table with about 0.5 million rows. That seemed to
work as well. However, I get 0 results when querying it.
Something seems to be wrong. I also did a commit after the full import.

Here is the response from import.


0
3

data-config.xml

full-import
idle

1
404575
0
2011-03-28 12:47:36

Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.

2011-03-28 12:47:42
2011-03-28 12:47:42
0
0:0:6.141

This response format is experimental. It is likely to change in the
future.



Data-config.xml looks like this:

[data-config.xml snippet lost in the archive]
Thank you for any hint to get this running.
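Since the data-config above lost its XML tags in the archive, here is a hedged sketch of a working single-table config (table, column, and field names are illustrative; the JDBC details match the quoted reply further down). With 404575 rows fetched but 0 documents added, a common cause is that no column maps onto the schema's uniqueKey field:

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://192.168.0.109/test" user="solr" password="***"/>
  <document>
    <entity name="item" query="select id, name from item">
      <!-- Each column must map to a schema field; in particular the
           uniqueKey field (here "id") must be populated, or no
           documents are added. -->
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```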

-- 
http://www.fastmail.fm - Email service worth paying for. Try it for free



Re: Cant retrieve data

2011-03-28 Thread Upayavira
What query are you doing? 

Try q=*:*

Also, what does /solr/admin/stats.jsp report for number of docs?

Upayavira

On Mon, 28 Mar 2011 04:28 -0700, "Merlin Morgenstern"
 wrote:
> Hi there,
> 
> I am new to solr and have just installed it on a suse box with mysql
> backend.
> 
> Install and MySQL connector seem to be running. I can see the solr admin
> interface.
> Now I tried to index a table with about 0.5 Mio rows. That seemed to
> work as well. However, I do get 0 results doing a querie on it.
> Something seemes to be wrong. I also did a commit of the full import.
> 
> Here is the response from import.
> 
> 
> −
> 
> 0
> 3
> 
> −
> 
> −
> 
> data-config.xml
> 
> 
> full-import
> idle
> 
> −
> 
> 1
> 404575
> 0
> 2011-03-28 12:47:36
> −
> 
> Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
> 
> 2011-03-28 12:47:42
> 2011-03-28 12:47:42
> 0
> 0:0:6.141
> 
> −
> 
> This response format is experimental.  It is likely to change in the
> future.
> 
> 
> 
> Data-config.xml looks like this:
> 
> 
> driver="com.mysql.jdbc.Driver"
>url="jdbc:mysql://192.168.0.109/test"
>user="solr"
>password="bOgKk0Kg"/>
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Thank you for any hint to get this running.
> 
> -- 
> http://www.fastmail.fm - Email service worth paying for. Try it for free
> 
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source



Re: RamBufferSize and AutoCommit

2011-03-28 Thread Erick Erickson
Also note that making RAMBufferSize too big isn't useful. Lucid
recommends 128M as the point over which you hit diminishing
returns. But unless you're having problems speed-wise with the
default, why change it?

And are you actually getting OOMs or is this a background question?

Best
Erick



copyField destination does not exist

2011-03-28 Thread Merlin Morgenstern
Hi there,

I am trying to get Solr to index MySQL tables. It seems I have
misconfigured schema.xml:

HTTP ERROR: 500

Severe errors in solr configuration.

-
org.apache.solr.common.SolrException: copyField destination :'text' does
not exist
at

org.apache.solr.schema.IndexSchema.registerCopyField(IndexSchema.java:685)


My config looks like this:

 



 

 id
 
 phrase


What is wrong with this config? The type should be OK.
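The usual fix (a sketch, not the poster's actual schema — only 'text' as the missing destination is known from the error): either define the destination field the copyField directive expects, or remove that directive. The source field name below is an assumption:

```xml
<!-- schema.xml sketch: define the catch-all destination field that the
     copyField directive refers to, then copy into it. -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<copyField source="phrase" dest="text"/>
```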

-- 
http://www.fastmail.fm - Choose from over 50 domains or use your own



delete by query

2011-03-28 Thread Gastone Penzo
Hi,
I want to use the delete-by-query method to delete documents.
I tried, for example:

http://10.0.0.178:8983/solr/update?stream.body=
field1:value

and it works

But how can I delete documents using two filters?

http://10.0.0.178:8983/solr/update?stream.body=field1:value1
AND field2:value2

It doesn't work. I need a logical AND because I want Solr to delete documents
that have field1 with value1 and field2 with value2.
Is that possible?

thanx


-- 
Gastone Penzo
*www.solr-italia.it*
*The first italian blog about Apache Solr*


Re: delete by query

2011-03-28 Thread Gastone Penzo
I resolved it:
http://10.0.0.178:8983/solr/update?stream.body=
(field1:value1)AND(field2:value2)

Thanx
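For reference, the stream.body payload that the archive stripped from the URLs above is a small XML delete command; the two-clause form looks roughly like this (URL-encoded when passed as the stream.body parameter, and followed by a commit):

```xml
<!-- Posted to /solr/update as the stream.body parameter -->
<delete>
  <query>(field1:value1) AND (field2:value2)</query>
</delete>
```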



-- 
Gastone Penzo
*www.solr-italia.it*
*The first italian blog about Apache Solr*


Re: copyField destination does not exist

2011-03-28 Thread Geert-Jan Brits
The error says you have a copyField directive in schema.xml that wants
to copy the value of a field to a destination field 'text' that doesn't
exist (which, given your supplied fields, is indeed the case). Search your
schema.xml for 'copyField'; there is probably something configured for
copyField functionality that you don't want. Perhaps you uncommented the
copyField portion of schema.xml by accident?

hth,
Geert-Jan

2011/3/28 Merlin Morgenstern 

> Hi there,
>
> I am trying to get solr indexing mysql tables. Seems like I have
> misconfigured schema.xml:
>
> HTTP ERROR: 500
>
> Severe errors in solr configuration.
>
> -
> org.apache.solr.common.SolrException: copyField destination :'text' does
> not exist
>at
>
>  org.apache.solr.schema.IndexSchema.registerCopyField(IndexSchema.java:685)
>
>
> My config looks like this:
>
>  
>required="true"/>
>required="true"/>
>required="true"/>
>  
>
>  id
>  
>  phrase
>
>
> What is wrong within this config? The type schould be OK.
>
> --
> http://www.fastmail.fm - Choose from over 50 domains or use your own
>
>


Re: Default operator

2011-03-28 Thread Brian Lamb
Thank you both for your input. I ended up using Ahmet's way because it seems
to fit better with the rest of the application.

On Sat, Mar 26, 2011 at 6:02 AM, lboutros  wrote:

> The other way could be to extend the SolrQueryParser to read a per-field
> default operator from the Solr config file. Then it should be possible to
> override these functions:
>
> setDefaultOperator
> getDefaultOperator
>
> and these two, which use the default operator:
>
> getFieldQuery
> addClause
>
> Then you just have to declare it in the Solr config file and configure your
> default operators.
>
> Ludovic.
>
>
>
> -
> Jouve
> France.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Default-operator-tp2732237p2734931.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Cant retrieve data

2011-03-28 Thread Jeffrey Chang
I'm also new but I was able to get DIH working.

From your response you have:
...
Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
...
0

I believe your fetch (db source and query) is correct based on the response,
but perhaps your mapping isn't. I would check the required fields in your
schema.xml and see whether they are properly mapped. Also, are you getting any
exceptions?
- Jeff




problems indexing web content

2011-03-28 Thread Charles Wardell
Hi Everyone,

I set up a server and began to index my data. I have two questions I am hoping
someone can help me with. Many of my files seem to index without any problems;
with others, I get a host of different errors. I am indexing primarily web-based
content and have defined my text field as follows:
 




   









q1) Errors while indexing.

* SimplePostTool: WARNING: Unexpected response from Solr: '' does not contain '0'

* SEVERE: Error processing "legacy" update 
command:com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character ' ' 
(code 32) in content after '<' (malformed start element?). at [row,col 
{unknown-source}]: [1591,90] at 
com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)

* Although I can't find the actual error, I recall Solr giving me an error when
it came across the string &What - the error was something like "expecting a
semicolon after 'What'"


q2) If my file has 1000 documents and I submit it with post.jar, and it comes
across any of the above errors, will that break the processing of the whole
file, or just the document with the error?


Thanks in advance. 
Your help is very much appreciated.

Charlie

  

Re: problems indexing web content

2011-03-28 Thread Jan Høydahl
Hi,

I assume you are trying to post HTML files with post.jar and use
HTMLStripCharFilter to sanitize the HTML.

But you refer to "my file" as if you have multiple docs in one file? XML or 
HTML? Multiple files?
To what UpdateRequestHandler are you posting? /update/xml or /update/extract ?
For us to understand what you're trying to achieve, please describe your 
project in more detail.


To give some concrete feedback too: first off, your analyzer for "text" is
wrong. All charFilters need to come before the tokenizer. You also lack an
analyzer with type="query". If I were you I'd try the simplest case first: get
rid of MappingCharFilter, StopFilter, WordDelimiterFilter and the stemmer - just
do the most basic stuff you can and go from there.
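A minimal version of what Jan describes (a sketch, not the poster's actual schema): charFilters ahead of the tokenizer, plus a matching query-time analyzer, with the extra filters stripped out:

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- charFilters run on the raw text, before tokenization -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```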

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 28. mars 2011, at 18.52, Charles Wardell wrote:

> Hi Everyone,
> 
> I setup a server and began to index my data. I have two questions I am hoping 
> someone can help me with. Many of my files seem to index without any 
> problems. Others, I get a host of different errors. I am indexing primarily 
> web based content and have identified my text field as follows:
> 
> 
>
>
> mapping="mapping.txt"/>
>  
> words="stopwords.txt"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1" 
> catenateNumbers="1" catenateAll="0"/>
>
> protected="protwords.txt"/>
>
>
>
> 
> 
> q1) Errors while indexing.
> 
> * SimplePostTool: WARNING: Unexpected response from Solr: ' status="0">' does not contain '0'
> 
> * SEVERE: Error processing "legacy" update 
> command:com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character ' 
> ' (code 32) in content after '<' (malformed start element?). at [row,col 
> {unknown-source}]: [1591,90] at 
> com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)
> 
> * Although I can't find the actual error, I recall solr giving me an error 
> when it came across a string &What - The error was something like expecting 
> semicolon after "What"
> 
> 
> q2) If my file has 1000 documents and I submit it with post.jar, if it comes 
> across any of the above errors, will it break the processing of the whole 
> file, or just the document with the error?
> 
> 
> Thanks in advance. 
> Your help is very much appreciated.
> 
> Charlie
> 



Re: problems indexing web content

2011-03-28 Thread Charles Wardell
Jan,

Thank you for such a quick reply. I have a feed coming in that I convert to an

Here is the text type, including index and query analyzers, with the suggested changes.




   

















Here is a snippet of the file I generate.

<?xml version="1.0" encoding="UTF-8"?>


http://twitter.com/uswautis/statuses/51997364122165249
E X I T
uswautis (Hasanah Uswa)


http://twitter.com/uswautis
U
2011-03-27T13:21:52Z
2011-03-27T13:22:13Z

http://twitter.com/uswautis/statuses/51997364122165249
text/html

null
0
MICROBLOG
E X I T
text/html
zlib
mime_type: "text/html"
data: ""

[]



http://twitter.com/imsuperangelica/statuses/51997364050862080
I want the sweater i saw in mango so bad.
imsuperangelica (angelica marie)


http://twitter.com/imsuperangelica
en
2011-03-27T13:21:52Z
2011-03-27T13:22:13Z

http://twitter.com/imsuperangelica/statuses/51997364050862080
text/html

null
0
MICROBLOG
I want the sweater i saw in mango so bad.
text/html
zlib
mime_type: "text/html"
data: ""

[]














Re: problems indexing web content

2011-03-28 Thread Markus Jelsma
The analyzer order doesn't really matter here: char filters are always executed
first, regardless of their position in the analyzer. Multiple filters of the
same type, however, are affected by order. Also, your error is not caused by a
faulty analyzer; there is something wrong in your XML.

Anyway, according to your error, check row 1591, column 90 of your XML input;
there seems to be a stray space somewhere.


Re: problems indexing web content

2011-03-28 Thread Markus Jelsma
Also, don't forget to encode entities or wrap them in CDATA.
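For example, a literal ampersand in a field value must be escaped or wrapped (the field name and value here are illustrative):

```xml
<!-- Two equivalent ways to carry "&What" safely in a Solr XML update -->
<field name="title">Q&amp;What</field>
<field name="title"><![CDATA[Q&What]]></field>
```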

> Jan,
> 
> thank you for such a quick reply. I have a feed coming in that I convert to
> an  Here is the type for text including index
> and query with the changes suggested.
> 
> 
>  positionIncrementGap="100"> 
> 
> 
>  protected="protwords.txt"/>  class="solr.RemoveDuplicatesTokenFilterFactory"/>  class="solr.WhitespaceTokenizerFactory"/> 
> 
>  synonyms="synonyms.txt" ignoreCase="true" expand="true"/>  class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>  generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0"/> 
>  protected="protwords.txt"/>  class="solr.RemoveDuplicatesTokenFilterFactory"/>  class="solr.WhitespaceTokenizerFactory"/> 
> 
> 
> 
> Here is the snippet of the file I generate.
> 
> ?xml version="1.0" encoding="UTF-8"?>
> 
> 
>  name="guid">http://twitter.com/uswautis/statuses/51997364122165249
> E X I T
> uswautis (Hasanah Uswa)
> 
> 
> http://twitter.com/uswautis
> U
> 2011-03-27T13:21:52Z
> 2011-03-27T13:22:13Z
> 
>  name="feedURL">http://twitter.com/uswautis/statuses/51997364122165249 ld> text/html
> 
> null
> 0
> MICROBLOG
> E X I T
> text/html
> zlib
> mime_type: "text/html"
> data: ""
> 
> []
> 
> 
> 
>  name="guid">http://twitter.com/imsuperangelica/statuses/51997364050862080<
> /field> I want the sweater i saw in mango so
> bad. imsuperangelica (angelica
> marie)
> 
> 
> http://twitter.com/imsuperangelica
> en
> 2011-03-27T13:21:52Z
> 2011-03-27T13:22:13Z
> 
>  name="feedURL">http://twitter.com/imsuperangelica/statuses/519973640508620
> 80 text/html
> 
> null
> 0
> MICROBLOG
> I want the sweater i saw in mango so
> bad. text/html
> zlib
> mime_type: "text/html"
> data: ""
> 
> []
> 
> 
> 
> 
> On Mar 28, 2011, at 1:02 PM, Jan Høydahl wrote:
> > Hi,
> > 
> > I assume you try to post HTML files from post.jar, and use
> > HTMLStripCharFilter to sanitize the HTML.
> > 
> > But you refer to "my file" as if you have multiple docs in one file? XML
> > or HTML? Multiple files? To what UpdateRequestHandler are you posting?
> > /update/xml or /update/extract ? For us to understand what you're trying
> > to achieve, please describe your project in more detail.
> > 
> > 
> > To give some concrete feedback too: First off, your analyzer for "text"
> > is wrong. All charFilter's need to be before the tokenizer. You also
> > lack an analyzer with type="query". If I were you I'd try the simplest
> > case first, get rid of mappingCharFilter, StopFilter, WordDelimFilter
> > and Stemmer - just do the most basic stuff you can and go from there.
> > 
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> > 
> > On 28. mars 2011, at 18.52, Charles Wardell wrote:
> >> Hi Everyone,
> >> 
> >> I setup a server and began to index my data. I have two questions I am
> >> hoping someone can help me with. Many of my files seem to index without
> >> any problems. Others, I get a host of different errors. I am indexing
> >> primarily web based content and have identified my text field as
> >> follows:
> >> 
> >>  >> positionIncrementGap="100">
> >> 
> >>   
> >>   
> >>   
> >>>>   mapping="mapping.txt"/>  >>   class="solr.HTMLStripCharFilterFactory"/>  >>   class="solr.StopFilterFactory" ignoreCase="true"
> >>   words="stopwords.txt"/>  >>   class="solr.WordDelimiterFilterFactory"
> >>   generateWordParts="1" generateNumberParts="1"
> >>   catenateWords="1" catenateNumbers="1" catenateAll="0"/>
> >>   
> >>>>   protected="protwords.txt"/>  >>   class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>   
> >>   
> >>   
> >>   
> >> 
> >> q1) Errors while indexing.
> >> 
> >> * SimplePostTool: WARNING: Unexpected response from Solr: ' >> status="0">' does not contain '0'
> >> 
> >> * SEVERE: Error processing "legacy" update
> >> command:com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected
> >> character ' ' (code 32) in content after '<' (malformed start
> >> element?). at [row,col {unknown-source}]: [1591,90] at
> >> com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)
> >> 
> >> * Although I can't find the actual error, I recall solr giving me an
> >> error when it came across a string &What - The error was something like
> >> expecting semicolon after "What"
> >> 
> >> 
> >> q2) If my file has 1000 documents and I submit it with post.jar, if it
> >> comes across any of the above errors, will it break the processing of
> >> the whole file, or just the document with the error?
> >> 
> >> 
> >> Thanks in advance.
> >> Your help is very much appreciated.
> >> 
> >> Charlie


Role of the "name" in spellchecker declaration. Can there be multiple instances of it?

2011-03-28 Thread Teruhiko Kurosaka
In the spellchecker search component declaration:
http://wiki.apache.org/solr/SpellCheckComponent#Configuration

What role does the "name" play, which is "default" in this
sample?  Can this be any arbitrary name? Should this name
match with something else in the configuration files?


I came to this question because I was experimenting with
having multiple instances of the SpellChecker components,
and so far it didn't work (exceptions). I'd like to
have multiple instances so that one instance would spell-check
an English text field, another a Spanish field, etc.
Can there be more than one spellchecker search component?

T. "Kuro" Kurosaka, 415-227-9600x122, 617-386-7122(direct)
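For reference, a sketch based on the wiki page linked above (not a drop-in config; field names and paths are illustrative): the `name` is an arbitrary dictionary label, selected at query time with `spellcheck.dictionary`, defaulting to `default` when omitted; it need not match anything else in the configuration. Multiple dictionaries are declared as sibling `<lst name="spellchecker">` blocks inside one component:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">english</str>
    <str name="field">text_en</str>
    <str name="spellcheckIndexDir">./spellchecker_en</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">spanish</str>
    <str name="field">text_es</str>
    <str name="spellcheckIndexDir">./spellchecker_es</str>
  </lst>
</searchComponent>
```

A request would then pass `spellcheck.dictionary=spanish` to pick the Spanish dictionary.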







Re: problems indexing web content

2011-03-28 Thread Charles Wardell
I have about 1000 documents per XML file. I am not really doing anything with 
the data other than putting the XML tags around it.
So essentially the data is okay, with the exception of a few documents that are 
causing the errors.

Let's say document #47 in the XML file has a problem: is the whole file 
skipped when using post.jar?
I will add the CDATA to my XML generator.

Sometimes the data will come in as a string of pretty funky looking characters. 
I am assuming this is UTF-8. Is there any specialized data type I need to 
declare for this data?

One other thing I noticed is that sometimes I may get data in binary compressed 
format, like an image or something. Obviously I am not looking to index it, but 
is there a data type this can be stored as in Solr so I can retrieve and render 
it easily?


On Mar 28, 2011, at 1:38 PM, Markus Jelsma wrote:

> Also, don't forget to encode entities or wrap them in CDATA.
> 
>> Jan,
>> 
>> thank you for such a quick reply. I have a feed coming in that I convert to
>> an  Here is the type for text including index
>> and query with the changes suggested.
>> 
>> 
>>> positionIncrementGap="100"> 
>>
>>
>>> protected="protwords.txt"/> > class="solr.RemoveDuplicatesTokenFilterFactory"/> > class="solr.WhitespaceTokenizerFactory"/> 
>>
>>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/> > class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>> > generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>> catenateAll="0"/> 
>>> protected="protwords.txt"/> > class="solr.RemoveDuplicatesTokenFilterFactory"/> > class="solr.WhitespaceTokenizerFactory"/> 
>>
>> 
>> 
>> Here is the snippet of the file I generate.
>> 
>> ?xml version="1.0" encoding="UTF-8"?>
>> 
>> 
>> > name="guid">http://twitter.com/uswautis/statuses/51997364122165249
>> E X I T
>> uswautis (Hasanah Uswa)
>> 
>> 
>> http://twitter.com/uswautis
>> U
>> 2011-03-27T13:21:52Z
>> 2011-03-27T13:22:13Z
>> 
>> > name="feedURL">http://twitter.com/uswautis/statuses/51997364122165249> ld> text/html
>> 
>> null
>> 0
>> MICROBLOG
>> E X I T
>> text/html
>> zlib
>> mime_type: "text/html"
>> data: ""
>> 
>> []
>> 
>> 
>> 
>> > name="guid">http://twitter.com/imsuperangelica/statuses/51997364050862080<
>> /field> I want the sweater i saw in mango so
>> bad. imsuperangelica (angelica
>> marie)
>> 
>> 
>> http://twitter.com/imsuperangelica
>> en
>> 2011-03-27T13:21:52Z
>> 2011-03-27T13:22:13Z
>> 
>> > name="feedURL">http://twitter.com/imsuperangelica/statuses/519973640508620
>> 80 text/html
>> 
>> null
>> 0
>> MICROBLOG
>> I want the sweater i saw in mango so
>> bad. text/html
>> zlib
>> mime_type: "text/html"
>> data: ""
>> 
>> []
>> 
>> 
>> 
>> 
>> On Mar 28, 2011, at 1:02 PM, Jan Høydahl wrote:
>>> Hi,
>>> 
>>> I assume you try to post HTML files from post.jar, and use
>>> HTMLStripCharFilter to sanitize the HTML.
>>> 
>>> But you refer to "my file" as if you have multiple docs in one file? XML
>>> or HTML? Multiple files? To what UpdateRequestHandler are you posting?
>>> /update/xml or /update/extract ? For us to understand what you're trying
>>> to achieve, please describe your project in more detail.
>>> 
>>> 
>>> To give some concrete feedback too: First off, your analyzer for "text"
>>> is wrong. All charFilter's need to be before the tokenizer. You also
>>> lack an analyzer with type="query". If I were you I'd try the simplest
>>> case first, get rid of mappingCharFilter, StopFilter, WordDelimFilter
>>> and Stemmer - just do the most basic stuff you can and go from there.
>>> 
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>> 
>>> On 28. mars 2011, at 18.52, Charles Wardell wrote:
 Hi Everyone,
 
 I setup a server and began to index my data. I have two questions I am
 hoping someone can help me with. Many of my files seem to index without
 any problems. Others, I get a host of different errors. I am indexing
 primarily web based content and have identified my text field as
 follows:
 
 >>> positionIncrementGap="100">
 
  
 
  
  >>>  mapping="mapping.txt"/> >>>  class="solr.HTMLStripCharFilterFactory"/> >>>  class="solr.StopFilterFactory" ignoreCase="true"
  words="stopwords.txt"/> >>>  class="solr.WordDelimiterFilterFactory"
  generateWordParts="1" generateNumberParts="1"
  catenateWords="1" catenateNumbers="1" catenateAll="0"/>
  
  >>>  protected="protwords.txt"/> >>>  class="solr.RemoveDuplicatesTokenFilterFactory"/>
 
  
 
  
 
 q1) Errors while indexing.
 
 * SimplePostTool: WARNING: Unexpected response from Solr: '>>> status="0">' does not contain '0

Re: [WKT] Spatial Searching

2011-03-28 Thread Smiley, David W.
(This is one of those messages that I would have responded to at the time if I 
only noticed it.)

There is not yet indexing of arbitrary shapes (i.e. your data can only be 
points), but with SOLR-2155 you can query via WKT thanks to JTS.  If you want 
to index shapes then you'll have to wait a month or two for work that is 
underway right now.  It's coming; be patient.

I don't see the LGPL licensing as a problem; it's *L*GPL, not GPL, after all.  
In the SOLR-2155 patch, I take measures to download this library dynamically at 
build time and compile against it.  JTS need not ship with Solr; the user can 
get it themselves if they want this capability.  Non-JTS query shapes should 
work without the presence of JTS.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Feb 8, 2011, at 11:18 PM, Adam Estrada wrote:

> I just came across a ~nudge post over in the SIS list on what the status is 
> for that project. This got me looking more into spatial mods with Solr4.0.  
> I found this enhancement in Jira. 
> https://issues.apache.org/jira/browse/SOLR-2155. In this issue, David 
> mentions that he's already integrated JTS into Solr4.0 for querying on 
> polygons stored as WKT. 
> 
> It's relatively easy to get WKT strings into Solr, but does the Field type 
> exist yet? Is there a patch or something that I can test out? 
> 
> Here's how I would do it using GDAL/OGR and the already existing CSV update 
> handler. http://www.gdal.org/ogr/drv_csv.html
> 
> ogr2ogr -f CSV output.csv input.shp -lco GEOMETRY=AS_WKT
> This converts a shapefile to a CSV with the geometries intact in the form of 
> WKT. You can then get the data into Solr by running the following command.
> curl 
> "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,attr1,attr2,attr3,geom&stream.file=C:\tmp\output.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8";
> There are lots of flavors of geometries so I suspect that this will be a 
> daunting task, but because JTS recognizes each geometry type it should be 
> possible to work with them. 
> Does anyone know of a patch or even when this functionality might be included 
> in Solr4.0? I need to query for polygons ;-)
> Thanks,
> Adam


Re: problems indexing web content

2011-03-28 Thread Markus Jelsma

> I have about 1000 documents per xml file. I am not really doing anything
> with the data other than putting the xml tags around it. So essentially
> the data is okay with the exception of a few documents that are causing
> the errors.
> 
> Let's say document # 47 in the xml file has a problem, is the whole file
> skipped when using post.jar? I will add the CDATA to my xml generator.

I am not sure actually, I never tried, but I think it's thrown away.

> 
> Sometimes the data will come in as a string of pretty funky looking
> characters. I am assuming this is UTF-8. Is there any specialized data
> type I need to declare for this data?

Well, all data needs to be UTF-8 encoded. Anyway, wrongly encoded text data is 
just indexed as-is and won't throw an error. Except for entities, of course.

> 
> One other thing I noticed is that sometimes I may get data in binary
> compressed format. Like an image or something. Obviously I am not looking
> to index it, but is there a data type this can be stored as in Solr so I
> can retrieve and render easily?

Yes, use the binary field type [1]. You have to base64 encode the data.

[1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/BinaryField.html
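As a concrete sketch of the base64 step in Python (the field name is illustrative; the corresponding schema field would use the binary type):

```python
import base64

# Stand-in for a binary payload (e.g. a zlib-compressed HTML blob or image).
raw = bytes(range(256))

# BinaryField expects the value base64-encoded in the update message.
encoded = base64.b64encode(raw).decode('ascii')
field = '<field name="payload">%s</field>' % encoded

# On retrieval, decode the stored string back into the original bytes.
assert base64.b64decode(encoded) == raw
```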

> 
> On Mar 28, 2011, at 1:38 PM, Markus Jelsma wrote:
> > Also, don't forget to encode entities or wrap them in CDATA.
> > 
> >> Jan,
> >> 
> >> thank you for such a quick reply. I have a feed coming in that I convert
> >> to an  Here is the type for text including
> >> index and query with the changes suggested.
> >> 
> >> >> 
> >> positionIncrementGap="100"> 
> >> 
> >>
> >>
> >> >> 
> >> protected="protwords.txt"/>  >> class="solr.RemoveDuplicatesTokenFilterFactory"/>  >> class="solr.WhitespaceTokenizerFactory"/> 
> >> 
> >>
> >>
> >> >> 
> >> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>  >> class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>  >> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> >> catenateAll="0"/> 
> >> 
> >> >> 
> >> protected="protwords.txt"/>  >> class="solr.RemoveDuplicatesTokenFilterFactory"/>  >> class="solr.WhitespaceTokenizerFactory"/> 
> >> 
> >>
> >> 
> >> Here is the snippet of the file I generate.
> >> 
> >> ?xml version="1.0" encoding="UTF-8"?>
> >> 
> >> 
> >>  >> name="guid">http://twitter.com/uswautis/statuses/51997364122165249 >> d> E X I T
> >> uswautis (Hasanah Uswa)
> >> 
> >> 
> >> http://twitter.com/uswautis
> >> U
> >> 2011-03-27T13:21:52Z
> >> 2011-03-27T13:22:13Z
> >> 
> >>  >> name="feedURL">http://twitter.com/uswautis/statuses/51997364122165249 >> ie ld> text/html
> >> 
> >> null
> >> 0
> >> MICROBLOG
> >> E X I T
> >> text/html
> >> zlib
> >> mime_type: "text/html"
> >> data: ""
> >> 
> >> []
> >> 
> >> 
> >> 
> >>  >> name="guid">http://twitter.com/imsuperangelica/statuses/5199736405086208
> >> 0< /field> I want the sweater i saw in mango so
> >> bad. imsuperangelica (angelica
> >> marie)
> >> 
> >> 
> >> http://twitter.com/imsuperangelica
> >> en
> >> 2011-03-27T13:21:52Z
> >> 2011-03-27T13:22:13Z
> >> 
> >>  >> name="feedURL">http://twitter.com/imsuperangelica/statuses/5199736405086
> >> 20 80 text/html
> >> 
> >> null
> >> 0
> >> MICROBLOG
> >> I want the sweater i saw in mango so
> >> bad. text/html
> >> zlib
> >> mime_type: "text/html"
> >> data: ""
> >> 
> >> []
> >> 
> >> 
> >> 
> >> 
> >> On Mar 28, 2011, at 1:02 PM, Jan Høydahl wrote:
> >>> Hi,
> >>> 
> >>> I assume you try to post HTML files from post.jar, and use
> >>> HTMLStripCharFilter to sanitize the HTML.
> >>> 
> >>> But you refer to "my file" as if you have multiple docs in one file?
> >>> XML or HTML? Multiple files? To what UpdateRequestHandler are you
> >>> posting? /update/xml or /update/extract ? For us to understand what
> >>> you're trying to achieve, please describe your project in more detail.
> >>> 
> >>> 
> >>> To give some concrete feedback too: First off, your analyzer for "text"
> >>> is wrong. All charFilter's need to be before the tokenizer. You also
> >>> lack an analyzer with type="query". If I were you I'd try the simplest
> >>> case first, get rid of mappingCharFilter, StopFilter, WordDelimFilter
> >>> and Stemmer - just do the most basic stuff you can and go from there.
> >>> 
> >>> --
> >>> Jan Høydahl, search solution architect
> >>> Cominvent AS - www.cominvent.com
> >>> 
> >>> On 28. mars 2011, at 18.52, Charles Wardell wrote:
>  Hi Everyone,
>  
>  I setup a server and began to index my data. I have two questions I am
>  hoping someone can help me with. Many of my files seem to index
>  without any problems. Others, I get a host of different errors. I am
>  indexing primarily web based content and have identified my text
>  field as follows:
>  
>    positionIncrementGap="100">
>  
>   
>   
>   
> 

Re: Cant retrieve data

2011-03-28 Thread Gora Mohanty
On Mon, Mar 28, 2011 at 4:58 PM, Merlin Morgenstern
 wrote:
[...]

You should probably hide passwords when posting to
public lists.

>        
>        
>            
>            
>            
[...]

Your select does not seem to have the ID field.

Regards,
Gora


Re: Solrcore.properties

2011-03-28 Thread Jayendra Patil
Can you please attach the other files?
It doesn't seem to find the enable.master property, so you may want to
check that the properties file exists on the box having issues.

We have the following configuration in the core :-

Core -
- solrconfig.xml - Master & Slave

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">solrcore_slave.properties:solrcore.properties,solrconfig.xml,schema.xml</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://master_host:port/solr/corename/replication</str>
  </lst>
</requestHandler>

- solrcore.properties - Master
enable.master=true
enable.slave=false

- solrcore_slave.properties - Slave
enable.master=false
enable.slave=true

We have the default values and separate properties files for master and slave.
Replication is enabled for the solrcore.properties file.

Regards,
Jayendra

On Mon, Mar 28, 2011 at 2:06 PM, Ezequiel Calderara  wrote:
> Hi all, i'm having problems when deploying solr in the production machines.
>
> I have a master solr, and 3 slaves.
> The master replicates the schema and the solrconfig for the slaves (this
> file in the master is named like solrconfig_slave.xml).
> The solrconfig of the slaves has for example the ${data.dir} and other
> values in the solrtcore.properties
>
> I think that solr isn't recognizing that file, because i get this error:
>>
>> HTTP Status 500 - Severe errors in solr configuration. Check your log
>> files for more detailed information on what may be wrong. If you want solr
>> to continue after configuration errors, change:
>> false in null
>> -
>> org.apache.solr.common.SolrException: No system property or default value
>> specified for enable.master at
>> org.apache.solr.common.util.DOMUtil.substituteProperty(DOMUtil.java:311)
>> ... MORE STACK TRACE INFO...
>
> But here is the thing:
> org.apache.solr.common.SolrException: No system property or default value
> specified for enable.master
>
> I'm attaching the master schema, the master solr config, the solr config of
> the slaves and the solrcore.properties.
>
> If anyone has any info on this i would be more than appreciated!...
>
> Thanks
>
>
> --
> __
> Ezequiel.
>
> Http://www.ironicnet.com
>


Re: [WKT] Spatial Searching

2011-03-28 Thread Estrada Groups
Outstanding! Thanks David...I can't wait to take a look at it.

Adam

Sent from my iPhone

On Mar 28, 2011, at 2:16 PM, "Smiley, David W."  wrote:

> (This is one of those messages that I would have responded to at the time if 
> I only noticed it.)
> 
> There is not yet indexing of arbitrary shapes (i.e. your data can only be 
> points), but with SOLR-2155 you can query via WKT thanks to JTS.  If you want 
> to index shapes then you'll have to wait a month or two for work that is 
> underway right now.  It's coming; be patient.
> 
> I don't see the LGPL licensing as a problem; it's *L*GPL, not GPL, after all. 
>  In the SOLR-2155 patch, I take measures to download this library dynamically 
> at build time and compile against it.  JTS need not ship with Solr; the user 
> can get it themselves if they want this capability.  Non-JTS query shapes 
> should work without the presence of JTS.
> 
> ~ David Smiley
> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
> 
> On Feb 8, 2011, at 11:18 PM, Adam Estrada wrote:
> 
>> I just came across a ~nudge post over in the SIS list on what the status is 
>> for that project. This got me looking more in to spatial mods with Solr4.0.  
>> I found this enhancement in Jira. 
>> https://issues.apache.org/jira/browse/SOLR-2155. In this issue, David 
>> mentions that he's already integrated JTS in to Solr4.0 for querying on 
>> polygons stored as WKT. 
>> 
>> It's relatively easy to get WKT strings in to Solr but does the Field type 
>> exist yet? Is there a patch or something that I can test out? 
>> 
>> Here's how I would do it using GDAL/OGR and the already existing csv update 
>> handler. http://www.gdal.org/ogr/drv_csv.html
>> 
>> ogr2ogr -f CSV output.csv input.shp -lco GEOMETRY=AS_WKT
>> This converts a shapefile to a CSV with the geometries intact in the form 
>> of WKT. You can then get the data in to Solr by running the following 
>> command.
>> curl 
>> "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,attr1,attr2,attr3,geom&stream.file=C:\tmp\output.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8";
>> There are lots of flavors of geometries so I suspect that this will be a 
>> daunting task but because JTS recognizes each geometry type it should be 
>> possible to work with them. 
>> Does anyone know of a patch or even when this functionality might be 
>> included in to Solr4.0? I need to query for polygons ;-)
>> Thanks,
>> Adam


Re: Solr 1.4.1 and Tika 0.9 - some tests not passing

2011-03-28 Thread Andreas Kemkes
I'm still interested in what steps I could take to get to the bottom of the 
failing tests.  Is there additional information that I should provide?

Some of the output below got mangled in the email - here are the (hopefully) 
complete lines:

This has a <a href="http://www.apache.org" shape="rect">link</a>. (Tika 0.9)
This has a <a href="http://www.apache.org">link</a>. (Tika 0.4)




From: Andreas Kemkes 
To: solr-user@lucene.apache.org
Sent: Tue, March 22, 2011 10:30:57 AM
Subject: Solr 1.4.1 and Tika 0.9 - some tests not passing

Due to some PDF indexing issues with the Solr 1.4.1 distribution, we would like 
to upgrade it to Tika 0.9, as the issues are not occurring in Tika 0.9.

With the changes we made to Solr 1.4.1, we can successfully index the 
previously 

failing PDF documents.

Unfortunately we cannot get the HTML-related tests to pass.

The following asserts in ExtractingRequestHandlerTest.java are failing:

assertQ(req("title:Welcome"), "//*[@numFound='1']");
assertQ(req("+id:simple2 +t_href:[* TO *]"), "//*[@numFound='1']");
assertQ(req("t_href:http"), "//*[@numFound='2']");
assertQ(req("t_href:http"), "//doc[1]/str[.='simple3']");
assertQ(req("+id:simple4 +t_content:Solr"), "//*[@numFound='1']");
assertQ(req("defaultExtr:http\\://www.apache.org"), "//*[@numFound='1']");
assertQ(req("+id:simple2 +t_href:[* TO *]"), "//*[@numFound='1']");
assertTrue(val + " is not equal to " + "linkNews", val.equals("linkNews") == 
true); // there are two  tags, and they get collapsed

Below are the differences in output from Tika 0.4 and Tika 0.9 for simple.html.

Tika 0.9 has additional meta tags, a shape attribute, and some additional white 
space.  Is this what throws it off?  

What do we need to consider so that Solr 1.4.1 will process the Tika 0.9 output 
correctly?

Do we need to configure different filters and tokenizers?  Which ones?

Or is it something else entirely?

Thanks in advance for any help,

Andreas

$ java -jar tika-app-0.4.jar 
../../../apache-solr-1.4.1-with-tika-0.9/contrib/extraction/src/test/resources/simple.html




Welcome to Solr



  Here is some text


Here is some text in a div
This has a link'>http://www.apache.org";>link.





$ java -jar tika-app-0.9.jar 
../../../apache-solr-1.4.1-with-tika-0.9/contrib/extraction/src/test/resources/simple.html
 







Welcome to Solr



  Here is some text


Here is some text in a div

This has a link'>http://www.apache.org";>link.




Re: SOLR - problems with non-english symbols when extracting HTML

2011-03-28 Thread kushti

Grijesh wrote:
> 
> Try to send HTML data using format CDATA .
> 
Doesn't work with 


> $content = "";
> 

And my goal is not to avoid extraction, but to have no problems with
non-English characters.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-problems-with-non-english-symbols-when-extracting-HTML-tp2729126p2733858.html
Sent from the Solr - User mailing list archive at Nabble.com.


copyField at search time / multi-language support

2011-03-28 Thread Tom Mortimer
Hi,

Here's my problem: I'm indexing a corpus with text in a variety of
languages. I'm planning to detect these at index time and send the
text to one of a set of suitably-configured fields (e.g. "mytext_de" for
German, "mytext_cjk" for Chinese/Japanese/Korean, etc.)

At search time I want to search all of these fields. However, there
will be at least 12 of them, which could lead to a very long query
string. (Also I need to use the standard query parser rather than
dismax, for full query syntax.)

Therefore I was wondering if there was a way to copy fields at search
time, so I can have my mytext query in a single field and have it
copied to mytext_de, mytext_cjk etc. Something like:

   
   
  ...

If this is not currently possible, could someone give me some pointers
for hacking Solr to support it? Should I subclass solr.SearchHandler?
I know nothing about Solr internals at the moment...

thanks,
Tom
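There is no search-time copyField, so the usual workaround is to expand the query client-side into an OR across the per-language fields before sending it to the standard query parser. A hedged sketch (field names are illustrative, and real user input would additionally need Lucene query-syntax escaping):

```python
LANG_FIELDS = ['mytext_de', 'mytext_cjk', 'mytext_en']  # ... up to 12 of these

def expand(term):
    """Expand a single query term into an OR across all language fields."""
    return ' OR '.join('%s:(%s)' % (field, term) for field in LANG_FIELDS)

print(expand('solr'))
# mytext_de:(solr) OR mytext_cjk:(solr) OR mytext_en:(solr)
```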


Highlighting Problem

2011-03-28 Thread pottwal1
dear solr specialists,

my data looks like this:

j]s(dh)fjk [hf]sjkadh asdj(kfh) [skdjfh aslkfjhalwe uigfrhj bsd bsdfga sjfg 
asdlfj.

if I want to query for the first "word", the following queries must match:

j]s(dh)fjk
j]s(dhfjk
j]sdhfjk
jsdhfjk
dhf

So the matching should ignore some characters like ( ) [ ] and should match 
substrings.

So far I have the following field definition in the schema.xml:

    
  
    
    
    
    
     
  
  
    
    
      
    
     
  
    


With this definition the matching works as planned, but not the highlighting: 
there the special characters seem to move the highlighting tags to wrong positions. 
For example, searching for "jsdhfjk" misses the last 3 letters of the word (= the 3 
special characters removed by the PatternReplaceFilterFactory):

j]s(dh)fjk

Solr has so many bells and whistles - what must I do to get correctly working 
highlighting?

kind regards,
F.
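One likely culprit: PatternReplaceFilterFactory runs after tokenization and does not correct the tokens' start/end offsets, so the highlighter places its tags using positions in the already-shortened tokens. Doing the character stripping before tokenization with a charFilter keeps an offset-correction map, so highlights line up with the original text. A sketch, not a tested drop-in (the pattern covers only the bracket/paren characters described above; the original's substring-matching filters would still be layered on top):

```xml
<fieldType name="text_hl" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- strip ( ) [ ] before tokenizing; charFilters track offset
         corrections, so highlighting still maps to the original text -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[\(\)\[\]]" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```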



Question about the message Indexing failed. Rolled back all changes

2011-03-28 Thread Firdous Ali
Hi,
I'm unable to index data; it looks like the datasource is not even read by 
Solr. I even created an empty dataimport.properties file at /conf, but the problem 
persists.

Following is the response text:


0
0


/home/username/data-config.xml


full-import
debug

idle
Configuration Re-loaded sucessfully

0:0:0.0
0
0
0
0
2011-03-28 15:13:41
Indexing failed. Rolled back all changes.
2011-03-28 15:13:41


This response format is experimental.  It is likely to change in the future.



Thanks in advance,
Firdous


  

Re: Cant retrieve data

2011-03-28 Thread Walter Andreas Pucko


On Mon, 28 Mar 2011 13:12 +0100, "Upayavira"  wrote:
> What query are you doing? 
> 

/solr/select/?q=welpe%0D%0A&version=2.2&start=0&rows=10&indent=on

> Try q=*:*
> 

returns:


0
5

*:*






> Also, what does /solr/admin/stats.jsp report for number of docs?

That's a good question. Core states 0; however, rows fetched is about
400K?!
Isn't that the same? If not, what must I do to get the documents from the
rows?

Here are the stats:

Solr Statistics: (example)
snake.fritz.box
Category
[Core] [Cache] [Query] [Update] [Highlighting] [Other]
Current Time: Mon Mar 28 14:46:41 CEST 2011
Server Start Time: Mon Mar 28 12:47:27 CEST 2011

Core

name:   core  
class:   
version:1.0  
description:SolrCore  
stats:  coreName :
startTime : Mon Mar 28 12:47:27 CEST 2011
refCount : 2
aliases : []

name:   searcher  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  searcherName : Searcher@1cf662f main
caching : true
numDocs : 0
maxDoc : 0
reader :
SolrIndexReader{this=1d8f162,r=ReadOnlyDirectoryReader@1d8f162,refCnt=1,segments=0}
readerDir :
org.apache.lucene.store.NIOFSDirectory@/home/andy/sw/apache-solr-1.4.1/example/solr/data/index
indexVersion : 1301253679802
openedAt : Mon Mar 28 12:48:00 CEST 2011
registeredAt : Mon Mar 28 12:48:00 CEST 2011
warmupTime : 3

name:   Searcher@1cf662f main  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  searcherName : Searcher@1cf662f main
caching : true
numDocs : 0
maxDoc : 0
reader :
SolrIndexReader{this=1d8f162,r=ReadOnlyDirectoryReader@1d8f162,refCnt=1,segments=0}
readerDir :
org.apache.lucene.store.NIOFSDirectory@/home/andy/sw/apache-solr-1.4.1/example/solr/data/index
indexVersion : 1301253679802
openedAt : Mon Mar 28 12:48:00 CEST 2011
registeredAt : Mon Mar 28 12:48:00 CEST 2011
warmupTime : 3


Query Handlers

name:   /admin/properties  
class:  org.apache.solr.handler.admin.PropertiesRequestHandler  
version:$Revision: 790580 $  
description:Get System Properties  
stats:  handlerStart : 1301309248506
requests : 0
errors : 0
timeouts : 0
totalTime : 0
avgTimePerRequest : NaN
avgRequestsPerSecond : 0.0

name:   /update/csv  
class:  Lazy[solr.CSVRequestHandler]  
version:$Revision: 817165 $  
description:Lazy[solr.CSVRequestHandler]  
stats:  note : not initialized yet

name:   /admin/file  
class:  org.apache.solr.handler.admin.ShowFileRequestHandler  
version:$Revision: 790580 $  
description:Admin Get File -- view config files directly  
stats:  handlerStart : 1301309248509
requests : 0
errors : 0
timeouts : 0
totalTime : 0
avgTimePerRequest : NaN
avgRequestsPerSecond : 0.0

name:   org.apache.solr.handler.dataimport.DataImportHandler  
class:  org.apache.solr.handler.dataimport.DataImportHandler  
version:1.0  
description:Manage data import from databases to Solr  
stats:  Status : IDLE
Documents Processed : 0
Requests made to DataSource : 1
Rows Fetched : 404575
Documents Deleted : 0
Documents Skipped : 0
Total Documents Processed : 0
Total Requests made to DataSource : 2
Total Rows Fetched : 809150
Total Documents Deleted : 0
Total Documents Skipped : 0
handlerStart : 1301309248131
requests : 4
errors : 0
timeouts : 0
totalTime : 24
avgTimePerRequest : 6.0
avgRequestsPerSecond : 5.591581E-4

name:   org.apache.solr.handler.DumpRequestHandler  
class:  org.apache.solr.handler.DumpRequestHandler  
version:$Revision: 954340 $  
description:Dump handler (debug)  
stats:  handlerStart : 1301309248115
requests : 0
errors : 0
timeouts : 0
totalTime : 0
avgTimePerRequest : NaN
avgRequestsPerSecond : 0.0

name:   /admin/threads  
class:  org.apache.solr.handler.admin.ThreadDumpHandler  
version:$Revision: 790580 $  
description:Thread Dump  
stats:  handlerStart : 1301309248505
requests : 0
errors : 0
timeouts : 0
totalTime : 0
avgTimePerRequest : NaN
avgRequestsPerSecond : 0.0

name:   tvrh  
class:  org.apache.solr.handler.component.SearchHandler  
version:$Revision: 766412 $  
description:Search using components:
org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.TermVectorComponent,org.apache.solr.handler.component.DebugComponent,
 
stats:  handlerStart : 1301309247958
requests : 0
errors : 0
timeouts : 0
totalTime : 0
avgTimePerRequest : NaN
avgRequestsPerSecond : 0.0

name:   /debug/dump  
class:  org.apache.solr.handler.DumpRequestHandler  
version:$Revision: 954340 $  
description:Dump handler (debug)  
stats:  handlerStart : 1301309248115
requests : 0
errors : 0
timeouts :

Javabin to JSON convertor

2011-03-28 Thread paulohess
Hi guys,

I have a Javabin object (it is actually a List data structure) and I need
to convert it to a JSON object.
I am using "Gson" and pass my List to it: gson.toJson(myList); the return
is just the same string with a couple of quotes (") added at the beginning and end.

Could anybody help here? Thanks, Paulo.
PS> here is a sample of my List:
myList=

[numFound:7496, type:[Book (6997), Periodical (396), Visual Material (75),
Mixed Material (23), Archival/Manuscript Material (4), Electronic Resource (1)],
name:[Grosz, George (39), Ludorff, A. (30), Baedeker, Karl (29), Merian,
Matthaeus (18), Zeiller, Martin (16), Mehling, Marianne (15), Renger-Patzsch,
Albert (15), Schäfke, Werner (15), Schinkel, Karl Friedrich (15), Engel,
Helmut (13), Giersberg, Hans-Joach

convert TO:


{"numFound":7496,"facet_fields":{"type":{"Book":6997,"Periodical":396,
"Visual Material":75,"Mixed Material":23,"Archival/Manuscript Material":4,
"Electronic Resource":1},"topic":{"Architecture":1136,"History":1118,
"Exhibitions":734,"Art":350,"Art, German":275,"Excavations (Archaeology)":271,
"Conservation and restoration


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Javabin-to-JSON-convertor-tp2745263p2745263.html
Sent from the Solr - User mailing list archive at Nabble.com.
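Gson serializes the object graph it is handed; if that object is already a String (for example the result of List.toString()), the output is just that string re-quoted, which is the symptom described above. A self-contained sketch of the distinction (a hand-rolled writer is used here only to avoid the Gson dependency; string escaping is omitted):

```java
import java.util.*;

public class JsonSketch {
    // Minimal recursive JSON writer for Maps, Iterables, Numbers and Strings.
    // Illustration only -- use Gson/Jackson in real code (no escaping here).
    static String toJson(Object o) {
        if (o == null) return "null";
        if (o instanceof Number || o instanceof Boolean) return o.toString();
        if (o instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) o).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                sb.append('"').append(e.getKey()).append("\":").append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        if (o instanceof Iterable) {
            StringBuilder sb = new StringBuilder("[");
            boolean first = true;
            for (Object item : (Iterable<?>) o) {
                if (!first) sb.append(',');
                first = false;
                sb.append(toJson(item));
            }
            return sb.append(']').toString();
        }
        return "\"" + o + "\"";   // fallback: quote the toString() form
    }

    public static void main(String[] args) {
        Map<String, Object> result = new LinkedHashMap<>();
        result.put("numFound", 7496);
        result.put("type", Arrays.asList("Book (6997)", "Periodical (396)"));

        // Serializing the structure itself yields real JSON:
        System.out.println(toJson(result));
        // Serializing its toString() only yields one quoted string --
        // the symptom Paulo describes:
        System.out.println(toJson(result.toString()));
    }
}
```

So the fix is to pass Gson the List/Map structure itself rather than its string form, or to skip the conversion entirely by asking Solr for JSON directly with wt=json.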


Re: Broken stats.js

2011-03-28 Thread Chris Hostetter

: I can't seem to find any references to this issue anywhere except :
: https://issues.apache.org/jira/browse/SOLR-1750
: 
: (Which has more of a workaround), and it seems that the SolrInfoMBeanHandler
: is not in the 1.4.1 build.

correct, it will be in 3.1 however.

it's not so much a workaround as it is a total abandonment of stats.jsp 
in favor of something that is easier to test, maintain, and use.

: Any help would be appreciated, so I can tune the caching settings on my SOLR
: install (which so far is screaming along, but it's always good to have more
: speed).

the one thing i can suggest that should work out of the box with solr 
1.4.1 is to config solr to use JMX and then run a JMX client to query solr 
for those stats...

http://wiki.apache.org/solr/SolrJmx

...that bypasses the stupid jsp completely.

-Hoss
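For reference, wiring this up in 1.4.1 amounts to one element in solrconfig.xml plus standard JVM flags; the snippet below is a sketch based on the SolrJmx wiki page linked above:

```xml
<!-- solrconfig.xml: register all SolrInfoMBeans with the
     JVM's platform MBean server -->
<jmx />
```

Starting the example Jetty with `java -Dcom.sun.management.jmxremote -jar start.jar` then makes the cache and handler stats browsable from a local jconsole, no stats.jsp involved.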


Re: DIH relating multiple DataSources

2011-03-28 Thread Chris Hostetter

: Subject: DIH relating multiple DataSources
: In-Reply-To: <1301054278.18711.1433747...@webmail.messagingengine.com>
: References:
: 
:  
:  <1301054278.18711.1433747...@webmail.messagingengine.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.




-Hoss


Re: Broken stats.js

2011-03-28 Thread Mark Mandel
Ah cool, thanks for your help.

I'll get digging, and see what I can do.

Mark

On Tue, Mar 29, 2011 at 11:36 AM, Chris Hostetter
wrote:

>
> : I can't seem to find any references to this issue anywhere except :
> : https://issues.apache.org/jira/browse/SOLR-1750
> :
> : (Which has more of a workaround), and it seems that the
> SolrInfoMBeanHandler
> : is not in the 1.4.1 build.
>
> correct, it will be in 3.1 however.
>
> it's not so much a workaround as it is a total abandonment of stats.jsp
> in favor of something that is easier to test, maintain, and use.
>
> : Any help would be appreciated, so I can tune the caching settings on my
> SOLR
> : install (which so far is screaming along, but it's always good to have
> more
> : speed).
>
> the one thing i can suggest that should work out of the box with solr
> 1.4.1 is to config solr to use JMX and then run a JMX client to query solr
> for those stats...
>
>http://wiki.apache.org/solr/SolrJmx
>
> ...that bypasses the stupid jsp completely.
>
> -Hoss
>



-- 
E: mark.man...@gmail.com
T: http://www.twitter.com/neurotic
W: www.compoundtheory.com

cf.Objective(ANZ) - Nov 17, 18 - Melbourne Australia
http://www.cfobjective.com.au

Hands-on ColdFusion ORM Training
www.ColdFusionOrmTraining.com


Fields not being indexed?

2011-03-28 Thread Charles Wardell
Can someone take a look at this and let me know what I am doing wrong?
According to Luke, only guid, tags, and aquiDate are available.
Schema is below as well.


http://twitter.com/AshleyxArsenic/statuses/52164920388763648

[]

guid


Re: Fields not being indexed?

2011-03-28 Thread Chris Hostetter

: Subject: Fields not being indexed?
: In-Reply-To: 
: References: 
:  
:  

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss


Fields not being indexed?

2011-03-28 Thread Charles Wardell
Sorry for inadvertently hijacking the last thread. Can someone take a look at
this and let me know what I am doing wrong? According to Luke, only guid, tags,
and aquiDate are available. Schema is below as well.


http://twitter.com/AshleyxArsenic/statuses/52164920388763648

[]

guid



Re: Question about the message Indexing failed. Rolled back all changes

2011-03-28 Thread Gora Mohanty
On Mon, Mar 28, 2011 at 3:59 PM, Firdous Ali  wrote:
> Hi,
> I'm unable to index data; it looks like the datasource is not even read by
> Solr. I even created an empty dataimport.properties file at /conf, but the
> problem persists.
[...]

Look at the Solr log files, which will probably have an exception
pointing to the source of the error. If needed, post a small excerpt
from the logs, showing details of the exception. Post this here, or
preferably on pastebin.com, and send us a link.

Regards,
Gora


Re: copyField at search time / multi-language support

2011-03-28 Thread Gora Mohanty
On Mon, Mar 28, 2011 at 2:15 PM, Tom Mortimer  wrote:
> Hi,
>
> Here's my problem: I'm indexing a corpus with text in a variety of
> languages. I'm planning to detect these at index time and send the
> text to one of a suitably-configured field (e.g. "mytext_de" for
> German, "mytext_cjk" for Chinese/Japanese/Korean etc.)

>
> At search time I want to search all of these fields. However, there
> will be at least 12 of them, which could lead to a very long query
> string. (Also I need to use the standard query parser rather than
> dismax, for full query syntax.)

Sorry, unable to understand this. Are you detecting the language,
and based on that, indexing to one of mytext_de, mytext_cjk, etc.,
or does each field have mixed languages? If the former, why could
you not also detect the language at query time (or, have separate
query sources for users of different languages), and query the
appropriate field based on the known language to be searched?

> Therefore I was wondering if there was a way to copy fields at search
> time, so I can have my mytext query in a single field and have it
> copied to mytext_de, mytext_cjk etc. Something like:
>
>   <copyField source="mytext" dest="mytext_de" />
>   <copyField source="mytext" dest="mytext_cjk" />
>  ...
>
> If this is not currently possible, could someone give me some pointers
> for hacking Solr to support it? Should I subclass solr.SearchHandler?
> I know nothing about Solr internals at the moment...
[...]

This is not possible as far as I know, and would be quite inefficient.

Regards,
Gora
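Until search-time copyField (or query-time detection) is available, one workaround that keeps the standard query parser is to expand the user's query across the per-language fields client-side. A sketch, using the thread's hypothetical mytext_* field names (real input would also need query escaping):

```java
import java.util.*;

public class MultiLangQuery {
    // OR the same user query across every per-language field.
    static String expand(String userQuery, List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (String f : fields) {
            if (sb.length() > 0) sb.append(" OR ");
            sb.append(f).append(":(").append(userQuery).append(")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("solr search",
                Arrays.asList("mytext_de", "mytext_cjk", "mytext_en")));
        // mytext_de:(solr search) OR mytext_cjk:(solr search) OR mytext_en:(solr search)
    }
}
```

The query string grows linearly with the number of languages, but twelve OR clauses is well within what the standard parser handles.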


solr result problem

2011-03-28 Thread anurag.walia
Thanks in advance.

Find the screenshot of the analyzer for this problem:
http://lucene.472066.n3.nabble.com/file/n2746849/solr.jpg

I have a problem with the number of characters in the term text. I entered
"Polymer", but after SnowballPorterFilterFactory it becomes "Polym", while it
does not exist in the protwords.txt file. I want that, if a word does not exist
in protwords.txt, the term text should stay the whole word, like "Polymer".

Regards 
Anurag Walia 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-result-problem-tp2746849p2746849.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: copyField at search time / multi-language support

2011-03-28 Thread Andy
Tom,

Could you share the method you use to perform language detection? Any open 
source tools that do that?

Thanks.

--- On Mon, 3/28/11, Tom Mortimer  wrote:

> From: Tom Mortimer 
> Subject: copyField at search time / multi-language support
> To: solr-user@lucene.apache.org
> Date: Monday, March 28, 2011, 4:45 AM
> Hi,
> 
> Here's my problem: I'm indexing a corpus with text in a
> variety of
> languages. I'm planning to detect these at index time and
> send the
> text to one of a suitably-configured field (e.g.
> "mytext_de" for
> German, "mytext_cjk" for Chinese/Japanese/Korean etc.)
> 
> At search time I want to search all of these fields.
> However, there
> will be at least 12 of them, which could lead to a very
> long query
> string. (Also I need to use the standard query parser
> rather than
> dismax, for full query syntax.)
> 
> Therefore I was wondering if there was a way to copy fields
> at search
> time, so I can have my mytext query in a single field and
> have it
> copied to mytext_de, mytext_cjk etc. Something like:
> 
>   <copyField source="mytext" dest="mytext_de" />
>   <copyField source="mytext" dest="mytext_cjk" />
>   ...
> 
> If this is not currently possible, could someone give me
> some pointers
> for hacking Solr to support it? Should I subclass
> solr.SearchHandler?
> I know nothing about Solr internals at the moment...
> 
> thanks,
> Tom
> 


   


Re: solr result problem

2011-03-28 Thread Gora Mohanty
On Tue, Mar 29, 2011 at 9:46 AM, anurag.walia  wrote:
[...]
> I have a problem with number of character in Term Text  . I entered
> "Polymer" but after snowballporterfilterfactory it become "Polym" while it
> was not exist in "protwords.txt" file . I want if any word does not exist in
> "protwords.txt" Term Text should be whole world like "Polymer"
[...]

Are you misunderstanding the function of protwords.txt? The
SnowballPorterFilterFactory will act upon all words, except
those included in protwords.txt. Thus, if you do not want "polymer"
to be filtered, put it in protwords.txt.

Regards,
Gora
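For reference, protwords.txt is wired to the stemmer through the protected attribute in schema.xml; a typical analyzer chain looks like this (the fieldType name is illustrative):

```xml
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- terms listed in protwords.txt pass through unstemmed -->
    <filter class="solr.SnowballPorterFilterFactory"
            language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>
```

Everything not listed in protwords.txt gets the full Snowball treatment, which is why "polymer" becomes "polym".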


Re: solr result problem

2011-03-28 Thread anurag.walia
Hi Gora,

Thanks for the reply.

I applied SnowballPorterFilterFactory to remove the difference in results
between plural and singular forms.
If I enter "polymer" it works fine, but "polymers" again gives me "polym",
while "bag" or "bags" gives me "bag" after SnowballPorterFilterFactory.

Please find the screen shot.

Regards
Anurag Walia

http://lucene.472066.n3.nabble.com/file/n2746902/solr2.jpg solr2.jpg

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-result-problem-tp2746849p2746902.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr result problem

2011-03-28 Thread Gora Mohanty
On Tue, Mar 29, 2011 at 10:12 AM, anurag.walia  wrote:
> Hi  Gora,
>
> Thanks for relied.
>
> i applied this snowballporterfilterfactory for remove difference of result
> in case of plural or singular.
> if i entered polymer then it working fine but again polymers giving me
> "polym".
> while bag or bags giving me bag after snowballporterfilterfactory .
[...]

Um, put "polymers" in protwords.txt then?

Regards,
Gora


Re: [WKT] Spatial Searching

2011-03-28 Thread Mattmann, Chris A (388J)
LGPL licenses and Apache aren't exactly compatible, see:

http://www.apache.org/legal/3party.html#transition-examples-lgpl
http://www.apache.org/legal/resolved.html#category-x

In practice, this was the reason we started the SIS project.

Cheers,
Chris

On Mar 28, 2011, at 11:16 AM, Smiley, David W. wrote:

> (This is one of those messages that I would have responded to at the time if 
> I only noticed it.)
> 
> There is not yet indexing of arbitrary shapes (i.e. your data can only be 
> points), but with SOLR-2155 you can query via WKT thanks to JTS.  If you want 
> to index shapes then you'll have to wait a month or two for work that is 
> underway right now.  It's coming; be patient.
> 
> I don't see the LGPL licensing as a problem; it's *L*GPL, not GPL, after all. 
>  In the SOLR-2155 patch I take measures to download this library dynamically 
> at build time and compile against it.  JTS need not ship with Solr; the user 
> can get it themselves if they want this capability.  Non-JTS query shapes 
> should work without the presence of JTS.
> 
> ~ David Smiley
> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
> 
> On Feb 8, 2011, at 11:18 PM, Adam Estrada wrote:
> 
>> I just came across a ~nudge post over in the SIS list on what the status is 
>> for that project. This got me looking more in to spatial mods with Solr4.0.  
>> I found this enhancement in Jira. 
>> https://issues.apache.org/jira/browse/SOLR-2155. In this issue, David 
>> mentions that he's already integrated JTS in to Solr4.0 for querying on 
>> polygons stored as WKT. 
>> 
>> It's relatively easy to get WKT strings in to Solr but does the Field type 
>> exist yet? Is there a patch or something that I can test out? 
>> 
>> Here's how I would do it using GDAL/OGR and the already existing csv update 
>> handler. http://www.gdal.org/ogr/drv_csv.html
>> 
>> ogr2ogr -f CSV output.csv input.shp -lco GEOMETRY=AS_WKT
>> This converts a shapefile to a CSV with the geometries intact in the form 
>> of WKT. You can then get the data in to Solr by running the following 
>> command.
>> curl 
>> "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,attr1,attr2,attr3,geom&stream.file=C:\tmp\output.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8";
>> There are lots of flavors of geometries so I suspect that this will be a 
>> daunting task but because JTS recognizes each geometry type it should be 
>> possible to work with them. 
>> Does anyone know of a patch or even when this functionality might be 
>> included in to Solr4.0? I need to query for polygons ;-)
>> Thanks,
>> Adam


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: solr result problem

2011-03-28 Thread anurag.walia
It would be "polymers", but the result will still come out different for
"polymer" and "polymers" (singular/plural), and there can be more words like
"polymer".

Regards
Anurag Walia

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-result-problem-tp2746849p2746947.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr result problem

2011-03-28 Thread Gora Mohanty
On Tue, Mar 29, 2011 at 10:41 AM, anurag.walia  wrote:
> it will be polymers but result will come different in case of polymer and
> polymers (singular/plural).
> or there can be more words like polymer.
[...]

Your only alternative then is to implement a filter that works the
way you want it to.

Regards,
Gora
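Such a filter could fold only regular plurals instead of applying full Porter stemming. The rules below are a rough illustrative sketch, not a complete English stemmer, and to be used from Solr the logic would need wrapping in a Lucene TokenFilter/TokenFilterFactory pair:

```java
public class MinimalPluralStemmer {
    // Fold only regular English plurals; leave everything else untouched.
    static String stem(String term) {
        if (term.endsWith("ies") && term.length() > 4) {
            return term.substring(0, term.length() - 3) + "y";  // "parties" -> "party"
        }
        if (term.endsWith("ches") || term.endsWith("shes")
                || term.endsWith("xes") || term.endsWith("sses")) {
            return term.substring(0, term.length() - 2);        // "boxes" -> "box"
        }
        if (term.endsWith("s") && !term.endsWith("ss") && term.length() > 3) {
            return term.substring(0, term.length() - 1);        // "polymers" -> "polymer"
        }
        return term;                                            // "glass" stays "glass"
    }

    public static void main(String[] args) {
        System.out.println(stem("polymers")); // polymer
        System.out.println(stem("bags"));     // bag
        System.out.println(stem("glass"));    // glass
    }
}
```

This is essentially what light "s-stemmers" such as KStem do with much larger rule and exception sets, which is why KStem is often suggested for exactly this singular/plural case.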


Re: solr result problem

2011-03-28 Thread anurag.walia
Is there any other filter which can solve my singular/plural problem?

Anurag

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-result-problem-tp2746849p2746956.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: copyField at search time / multi-language support

2011-03-28 Thread Markus Jelsma
https://issues.apache.org/jira/browse/SOLR-1979

> Tom,
> 
> Could you share the method you use to perform language detection? Any open
> source tools that do that?
> 
> Thanks.
> 
> --- On Mon, 3/28/11, Tom Mortimer  wrote:
> > From: Tom Mortimer 
> > Subject: copyField at search time / multi-language support
> > To: solr-user@lucene.apache.org
> > Date: Monday, March 28, 2011, 4:45 AM
> > Hi,
> > 
> > Here's my problem: I'm indexing a corpus with text in a
> > variety of
> > languages. I'm planning to detect these at index time and
> > send the
> > text to one of a suitably-configured field (e.g.
> > "mytext_de" for
> > German, "mytext_cjk" for Chinese/Japanese/Korean etc.)
> > 
> > At search time I want to search all of these fields.
> > However, there
> > will be at least 12 of them, which could lead to a very
> > long query
> > string. (Also I need to use the standard query parser
> > rather than
> > dismax, for full query syntax.)
> > 
> > Therefore I was wondering if there was a way to copy fields
> > at search
> > time, so I can have my mytext query in a single field and
> > have it
> > copied to mytext_de, mytext_cjk etc. Something like:
> > 
> > > dest="mytext_de" />
> > > dest="mytext_cjk" />
> >   ...
> > 
> > If this is not currently possible, could someone give me
> > some pointers
> > for hacking Solr to support it? Should I subclass
> > solr.SearchHandler?
> > I know nothing about Solr internals at the moment...
> > 
> > thanks,
> > Tom


Re: copyField at search time / multi-language support

2011-03-28 Thread Andy
Thanks Markus.

Do you know if this patch is good enough for production use? Thanks.

Andy

--- On Tue, 3/29/11, Markus Jelsma  wrote:

> From: Markus Jelsma 
> Subject: Re: copyField at search time / multi-language support
> To: solr-user@lucene.apache.org
> Cc: "Andy" 
> Date: Tuesday, March 29, 2011, 1:29 AM
> https://issues.apache.org/jira/browse/SOLR-1979
> 
> > Tom,
> > 
> > Could you share the method you use to perform language
> detection? Any open
> > source tools that do that?
> > 
> > Thanks.
> > 
> > --- On Mon, 3/28/11, Tom Mortimer 
> wrote:
> > > From: Tom Mortimer 
> > > Subject: copyField at search time /
> multi-language support
> > > To: solr-user@lucene.apache.org
> > > Date: Monday, March 28, 2011, 4:45 AM
> > > Hi,
> > > 
> > > Here's my problem: I'm indexing a corpus with
> text in a
> > > variety of
> > > languages. I'm planning to detect these at index
> time and
> > > send the
> > > text to one of a suitably-configured field (e.g.
> > > "mytext_de" for
> > > German, "mytext_cjk" for Chinese/Japanese/Korean
> etc.)
> > > 
> > > At search time I want to search all of these
> fields.
> > > However, there
> > > will be at least 12 of them, which could lead to
> a very
> > > long query
> > > string. (Also I need to use the standard query
> parser
> > > rather than
> > > dismax, for full query syntax.)
> > > 
> > > Therefore I was wondering if there was a way to
> copy fields
> > > at search
> > > time, so I can have my mytext query in a single
> field and
> > > have it
> > > copied to mytext_de, mytext_cjk etc. Something
> like:
> > > 
> > >     > > dest="mytext_de" />
> > >     > > dest="mytext_cjk" />
> > >   ...
> > > 
> > > If this is not currently possible, could someone
> give me
> > > some pointers
> > > for hacking Solr to support it? Should I
> subclass
> > > solr.SearchHandler?
> > > I know nothing about Solr internals at the
> moment...
> > > 
> > > thanks,
> > > Tom
> 





Re: copyField at search time / multi-language support

2011-03-28 Thread Markus Jelsma
I haven't tried this as an UpdateProcessor, but it relies on Tika, and that 
LanguageIdentifier works well except for short texts.
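Identifiers like Tika's work by comparing character n-gram profiles of the input against stored per-language profiles. A toy, dependency-free sketch of the idea (the seed texts and overlap score are illustrative; real profiles are far larger, which is also why short texts are hard to classify):

```java
import java.util.*;

public class NgramLanguageGuess {
    // Character-trigram profile of a text (tiny seed texts here; real
    // identifiers ship large pre-built profiles per language).
    static Map<String, Integer> profile(String text) {
        Map<String, Integer> p = new HashMap<>();
        String t = " " + text.toLowerCase().replaceAll("[^a-zäöüß ]", "") + " ";
        for (int i = 0; i + 3 <= t.length(); i++) {
            p.merge(t.substring(i, i + 3), 1, Integer::sum);
        }
        return p;
    }

    // Crude similarity: number of trigrams shared with a language profile.
    static int overlap(Map<String, Integer> doc, Map<String, Integer> lang) {
        int score = 0;
        for (String g : doc.keySet()) if (lang.containsKey(g)) score++;
        return score;
    }

    public static void main(String[] args) {
        Map<String, Integer> en =
            profile("the quick brown fox jumps over the lazy dog and the cat");
        Map<String, Integer> de =
            profile("der schnelle braune fuchs springt über den faulen hund und die katze");
        Map<String, Integer> doc = profile("the dog and the fox");
        System.out.println(overlap(doc, en) > overlap(doc, de) ? "en" : "de");
    }
}
```

With only a few trigrams to match, a short input can easily tie or land on the wrong profile, which is the weakness Markus notes.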

> Thanks Markus.
> 
> Do you know if this patch is good enough for production use? Thanks.
> 
> Andy
> 
> --- On Tue, 3/29/11, Markus Jelsma  wrote:
> > From: Markus Jelsma 
> > Subject: Re: copyField at search time / multi-language support
> > To: solr-user@lucene.apache.org
> > Cc: "Andy" 
> > Date: Tuesday, March 29, 2011, 1:29 AM
> > https://issues.apache.org/jira/browse/SOLR-1979
> > 
> > > Tom,
> > > 
> > > Could you share the method you use to perform language
> > 
> > detection? Any open
> > 
> > > source tools that do that?
> > > 
> > > Thanks.
> > > 
> > > --- On Mon, 3/28/11, Tom Mortimer 
> > 
> > wrote:
> > > > From: Tom Mortimer 
> > > > Subject: copyField at search time /
> > 
> > multi-language support
> > 
> > > > To: solr-user@lucene.apache.org
> > > > Date: Monday, March 28, 2011, 4:45 AM
> > > > Hi,
> > > > 
> > > > Here's my problem: I'm indexing a corpus with
> > 
> > text in a
> > 
> > > > variety of
> > > > languages. I'm planning to detect these at index
> > 
> > time and
> > 
> > > > send the
> > > > text to one of a suitably-configured field (e.g.
> > > > "mytext_de" for
> > > > German, "mytext_cjk" for Chinese/Japanese/Korean
> > 
> > etc.)
> > 
> > > > At search time I want to search all of these
> > 
> > fields.
> > 
> > > > However, there
> > > > will be at least 12 of them, which could lead to
> > 
> > a very
> > 
> > > > long query
> > > > string. (Also I need to use the standard query
> > 
> > parser
> > 
> > > > rather than
> > > > dismax, for full query syntax.)
> > > > 
> > > > Therefore I was wondering if there was a way to
> > 
> > copy fields
> > 
> > > > at search
> > > > time, so I can have my mytext query in a single
> > 
> > field and
> > 
> > > > have it
> > > > copied to mytext_de, mytext_cjk etc. Something
> > 
> > like:
> > > > > > >
> > > > dest="mytext_de" />
> > > >
> > > > > > >
> > > > dest="mytext_cjk" />
> > > >
> > > >   ...
> > > >
> > > > If this is not currently possible, could someone
> > 
> > give me
> > 
> > > > some pointers
> > > > for hacking Solr to support it? Should I
> > 
> > subclass
> > 
> > > > solr.SearchHandler?
> > > > I know nothing about Solr internals at the
> > 
> > moment...
> > 
> > > > thanks,
> > > > Tom


Re: Suggest component

2011-03-28 Thread Grijesh
Have you checked with q=*:*?
You mentioned buildOnCommit=true in your config,
so have you checked that your indexing process ends with a commit?

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggest-component-tp2725438p2747100.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr result problem

2011-03-28 Thread Grijesh
Try LucidImagination's KStemmer

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-result-problem-tp2746849p2747106.html
Sent from the Solr - User mailing list archive at Nabble.com.