Re: Apache Solr Configuration Problem (Japanese Language)

2014-03-06 Thread T. Kuro Kurosaka

Andy,
I don't have a direct answer to your question but I have a question.

On 03/05/2014 07:21 AM, Andy Alexander wrote:

fq=ss_language:ja&q=製品


I am guessing you have a field called ss_language where the language code
of the document is stored, and that you have Solr documents in different
languages.


+DisjunctionMaxQuery((content:製品)~0.01)
This indicates that your default query field is "content".  What does the
analyzer for this field look like?

Does the analyzer work for all the languages that you want to support?
Many analyzers are language-dependent and won't work with multilingual
fields.
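
To make the language-dependency point concrete, here is a small Java sketch. It is not Solr or Lucene code: the class and method names are made up for illustration, and the bigram approach is only one common fallback for CJK text. It contrasts whitespace splitting, which suits space-delimited languages, with overlapping character bigrams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: toy stand-ins for two analysis strategies, not Solr analyzers.
class TokenizerDemo {
    // Split on whitespace, as a simple analyzer for space-delimited languages might.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Emit overlapping two-character tokens (counted in code points).
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        for (int i = 0; i + 1 < cps.length; i++) {
            out.add(new String(cps, i, 2));
        }
        return out;
    }

    public static void main(String[] args) {
        // A short Japanese phrase meaning "this product", written with Unicode
        // escapes so the source file does not depend on its encoding.
        String phrase = "\u3053\u306E\u88FD\u54C1";
        System.out.println(whitespaceTokens(phrase));
        System.out.println(bigrams(phrase));
    }
}
```

Whitespace splitting returns the whole phrase as a single token, so a term query for the two-character word in this thread would not match it, while the bigram list does contain that word as its own token.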


--
T. "Kuro" Kurosaka • Senior Software Engineer
Healthline - The Power of Intelligent Health
www.healthline.com  |@Healthline  | @HealthlineCorp



Apache Solr Configuration Problem (Japanese Language)

2014-03-05 Thread Andy Alexander
I am trying to pass a string of Japanese characters to an Apache Solr
query. The string in question is '製品'.

When a search is passed without any arguments, it brings up all of the
indexed information, including all of the documents that have this
particular string in them. However, when this parameter is passed in as q=製品,
only one of the items is displayed.

Furthermore, when I have the query fq=ss_language:ja&q=製品 *three* items are
shown.
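
For anyone reproducing the two requests from Java, the following is a minimal sketch of building the query string with the Japanese term percent-encoded as UTF-8. The class and method names are hypothetical; only q, fq and debugQuery are Solr parameter names. The debug output in this message shows the term arriving intact, so encoding is probably not the cause here; the sketch only rules it out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the parameter part of a Solr select URL.
class SolrQueryUrl {
    // Percent-encode one parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // q is the main query; fq is an optional filter query (may be null).
    static String buildQueryString(String q, String fq) {
        StringBuilder sb = new StringBuilder("q=").append(enc(q));
        if (fq != null) {
            sb.append("&fq=").append(enc(fq));
        }
        return sb.append("&debugQuery=true").toString();
    }

    public static void main(String[] args) {
        // The two-character search term from this thread, as Unicode escapes.
        String term = "\u88FD\u54C1";
        System.out.println(buildQueryString(term, null));
        System.out.println(buildQueryString(term, "ss_language:ja"));
    }
}
```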

What would cause this peculiar behavior? The field in question where I am
searching for this string is indexed, and my assumption is that it should
bring up all documents with this string inside of them.

Here's the debug information:


製品
製品
+DisjunctionMaxQuery((content:製品)~0.01)
+(content:製品)~0.01


0.41303736 = (MATCH) fieldWeight(content:製品 in 80), product of:
1.4142135 = tf(termFreq(content:製品)=2) 5.3405533 = idf(docFreq=3,
maxDocs=307) 0.0546875 = fieldNorm(field=content, doc=80)


0.33378458 = (MATCH) fieldWeight(content:製品 in 66), product of: 1.0 =
tf(termFreq(content:製品)=1) 5.3405533 = idf(docFreq=3, maxDocs=307)
0.0625 = fieldNorm(field=content, doc=66)


0.2529327 = (MATCH) fieldWeight(content:製品 in 46), product of:
3.4641016 = tf(termFreq(content:製品)=12) 5.3405533 = idf(docFreq=3,
maxDocs=307) 0.013671875 = fieldNorm(field=content, doc=46)


ExtendedDismaxQParser



ss_language:ja


ss_language:ja


1.0
0.0  0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.0  0.0 0.0 0.0 0.0 0.0 0.0 1.0
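
The three explain entries in this message can be checked by hand. The sketch below assumes Lucene's classic TF-IDF similarity, where tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)); the class name is made up. It recomputes each fieldWeight from the termFreq, docFreq, maxDocs and fieldNorm values printed in the debug output.

```java
// Recomputes fieldWeight = tf * idf * fieldNorm from the numbers in the explain output.
class ExplainCheck {
    // tf: square root of how often the term occurs in the field.
    static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    // idf: 1 + ln(maxDocs / (docFreq + 1)).
    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    static double fieldWeight(int termFreq, int docFreq, int maxDocs, double fieldNorm) {
        return tf(termFreq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        // Values for docs 80, 66 and 46 as reported in the debug output.
        System.out.println(fieldWeight(2, 3, 307, 0.0546875));
        System.out.println(fieldWeight(1, 3, 307, 0.0625));
        System.out.println(fieldWeight(12, 3, 307, 0.013671875));
    }
}
```

The results agree with the reported scores to within float rounding. Note that docFreq=3 means only three documents in the index contain the term as a token in the content field, which is consistent with the three hits returned.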






Re: Configuration problem

2014-03-03 Thread Thomas Fischer
On 03.03.2014 at 22:43, Shawn Heisey wrote:

> On 3/3/2014 9:02 AM, Thomas Fischer wrote:
>> The setting is
>> solr directories (I use different solr versions at the same time):
>> /srv/solr/solr4.6.1 is the solr home, in solr home is a file solr.xml of the 
>> new "discovery type" (no cores), and inside the core directories are empty 
>> files core.properties and symbolic links to the universal conf directory.
>>  solr webapps (I use very different webapps simultaneously):
>> /srv/www/webapps/solr/solr4.6.1 is the solr webapp
>> 
>> I tried to convey this information to the tomcat server by putting a file 
>> solr4.6.1.xml into the Catalina/localhost folder with the contents
>> 
>> > crossContext="true">
>>  > value="/srv/solr/solr4.6.1" override="true"/>
>> 
> 
> Your message is buried deep in another message thread about NoSQL, because 
> you replied to an existing message rather than starting a new message to 
> solr-user@lucene.apache.org.  On list-mirroring forums like Nabble, nobody 
> will even see your message (or this reply) unless they actually open that 
> other thread.  This is what it looks like on a threading mail reader 
> (Thunderbird):
> 
> https://www.dropbox.com/s/87ilv7jls7y5gym/solr-reply-thread.png

Yes, I'm sorry, I only realized afterwards that my question inherited the 
thread from the e-mail I was reading and using as a template for the answer.

Meanwhile I figured out that I had overlooked the third place to define solr home 
for Tomcat (after JAVA_OPTS and JNDI): web.xml in WEB-INF of the given webapp.
This overrode the other definitions and created the impression that I couldn't 
set solr home.
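
A rough sketch of how such a lookup order can hide a setting follows. It assumes, based on my understanding of Solr 4.x rather than anything stated in this thread, that the JNDI entry solr/home (which is also where a web.xml env-entry ends up) is consulted first, then the solr.solr.home system property, then a relative default of solr/. The class and method names are hypothetical.

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;

// Hypothetical model of a solr home lookup; the order is an assumption, not Solr source.
class SolrHomeLookup {
    // Returns the JNDI value, or null when there is no naming context or no entry.
    static String fromJndi() {
        try {
            Object value = new InitialContext().lookup("java:comp/env/solr/home");
            return value == null ? null : value.toString();
        } catch (NamingException e) {
            return null;
        }
    }

    // First non-null source wins; the relative default depends on the working directory.
    static String choose(String jndiValue, String systemProperty) {
        if (jndiValue != null) {
            return jndiValue;
        }
        if (systemProperty != null) {
            return systemProperty;
        }
        return "solr/";
    }

    public static void main(String[] args) {
        System.out.println(choose(fromJndi(), System.getProperty("solr.solr.home")));
    }
}
```

Under that assumption, a value defined in web.xml wins over -Dsolr.solr.home, which would look exactly like the system property being ignored.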

But now I get the message
"Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml"
for the core "geo".
In the solr wiki I read (http://wiki.apache.org/solr/ConfiguringSolr):
"In each core, Solr will look for a conf/solrconfig.xml file" and expected solr 
to look for
/srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml (which exists), but obviously 
it doesn't.
Why? My misunderstanding?
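
This does not answer the question, but a quick way to verify what is actually on disk, including through the symbolic links, is sketched below. It assumes the discovery layout described in this thread (a core.properties file marks a core directory) and uses made-up class and method names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Diagnostic sketch: lists discovered core directories and whether each has a config.
class CoreConfCheck {
    // Every directory below solrHome that contains a core.properties file.
    static List<Path> findCoreDirs(Path solrHome) {
        try (Stream<Path> paths = Files.walk(solrHome)) {
            return paths
                .filter(p -> p.getFileName() != null
                        && p.getFileName().toString().equals("core.properties"))
                .map(Path::getParent)
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // isRegularFile follows symbolic links by default, so a linked conf directory counts.
    static boolean hasConfig(Path coreDir) {
        return Files.isRegularFile(coreDir.resolve("conf").resolve("solrconfig.xml"));
    }

    public static void main(String[] args) {
        Path home = Paths.get(args.length > 0 ? args[0] : "/srv/solr/solr4.6.1");
        for (Path core : findCoreDirs(home)) {
            System.out.println(core + " has conf/solrconfig.xml: " + hasConfig(core));
        }
    }
}
```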

Best
Thomas





Re: Configuration problem

2014-03-03 Thread Shawn Heisey

On 3/3/2014 9:02 AM, Thomas Fischer wrote:

The setting is
solr directories (I use different solr versions at the same time):
/srv/solr/solr4.6.1 is the solr home, in solr home is a file solr.xml of the new 
"discovery type" (no cores), and inside the core directories are empty files 
core.properties and symbolic links to the universal conf directory.
  
solr webapps (I use very different webapps simultaneously):

/srv/www/webapps/solr/solr4.6.1 is the solr webapp

I tried to convey this information to the tomcat server by putting a file 
solr4.6.1.xml into the Catalina/localhost folder with the contents






Your message is buried deep in another message thread about NoSQL, 
because you replied to an existing message rather than starting a new 
message to solr-user@lucene.apache.org.  On list-mirroring forums like 
Nabble, nobody will even see your message (or this reply) unless they 
actually open that other thread.  This is what it looks like on a 
threading mail reader (Thunderbird):


https://www.dropbox.com/s/87ilv7jls7y5gym/solr-reply-thread.png

I don't use Tomcat, so I can't even begin to comment on that.  I can 
talk about your solr home setting and what Solr is going to do with that.


You probably do not have /srv/solr/solr4.6.1/solr.xml on your system.  
Solr will look for solr.xml in your solr home, and if it cannot find it, 
it assumes that you are not running multicore, so it looks for things 
like collection1/conf/solrconfig.xml instead.
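
The fallback described in the paragraph above can be written down as a few lines of Java. This mirrors the description in this message only; it is not taken from Solr's source, and the class and method names are invented.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the described behaviour: solr.xml present means multicore, else collection1.
class SolrXmlCheck {
    // Returns the file Solr would be expected to read first for this solr home.
    static Path expectedConfig(Path solrHome) {
        Path solrXml = solrHome.resolve("solr.xml");
        if (Files.isRegularFile(solrXml)) {
            return solrXml;
        }
        // No solr.xml: fall back to the single-core default layout.
        return solrHome.resolve("collection1").resolve("conf").resolve("solrconfig.xml");
    }

    public static void main(String[] args) {
        Path home = Paths.get(args.length > 0 ? args[0] : "/srv/solr/solr4.6.1");
        System.out.println(expectedConfig(home));
    }
}
```

If this prints a collection1 path for the directory you believe is your solr home, either solr.xml is missing there or the webapp is using a different solr home than you think.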


There is a solr.xml in the example.  Use that, changing as necessary, or 
create a solr.xml file with just the following line in it.  It will 
probably start working:




You *might* need the following instead, but since Solr uses standard XML 
parsing libraries, I would guess that the above line will work.





Thanks,
Shawn



Configuration problem

2014-03-03 Thread Thomas Fischer
Hello,

for some reason I am having problems getting my local solr system to run (MacBook, 
tomcat 6.0.35).

The setting is
solr directories (I use different solr versions at the same time):
/srv/solr/solr4.6.1 is the solr home, in solr home is a file solr.xml of the 
new "discovery type" (no cores), and inside the core directories are empty 
files core.properties and symbolic links to the universal conf directory.
 
solr webapps (I use very different webapps simultaneously):
/srv/www/webapps/solr/solr4.6.1 is the solr webapp

I tried to convey this information to the tomcat server by putting a file 
solr4.6.1.xml into the Catalina/localhost folder with the contents





The Tomcat Manager shows solr4.6.1 as started, but following the given link 
gives an error with the message:
"SolrCore 'collection1' is not available due to init failure: Could not load 
config file /srv/solr4.6.1/collection1/solrconfig.xml"
which is plausible, since
1. there is no folder /srv/solr4.6.1/collection1 and
2. for the actual cores solrconfig.xml is inside of 
/srv/solr4.6.1/cores/geo/conf/

But why does Tomcat try to find a solrconfig.xml there?
The problem persists if I start tomcat with 
-Dsolr.solr.home=/srv/solr/solr4.6.1, it seems that the system just ignores the 
solr home setting.

Can somebody give me a hint what I'm doing wrong?

Best regards
Thomas

P.S.: Is there a way to stop Tomcat from throwing these errors into my face 
threefold: once as heading (!), once as message and once as description?




Re: SolrEntityProcessor Configuration Problem

2012-04-06 Thread Lance Norskog
The SolrEntityProcessor resolves all of its parameters at start time,
not for each query. This technique cannot work. I filed it:

https://issues.apache.org/jira/browse/SOLR-3336
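
To illustrate the difference between resolving a parameter once and resolving it for each row, here is a self-contained Java sketch. It is a toy model, not DataImportHandler code: the class, the method names and the row values are invented, and only the placeholder syntax is taken from the configuration quoted below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of ${...} substitution; not how DIH is implemented.
class PlaceholderDemo {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${name} with the row's value for name (empty string if absent).
    static String resolve(String template, Map<String, String> row) {
        Matcher m = VAR.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = row.getOrDefault(m.group(1), "");
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Resolved once at start: every row ends up with the first row's query.
    static List<String> resolvedOnce(String template, List<Map<String, String>> rows) {
        String fixed = rows.isEmpty() ? template : resolve(template, rows.get(0));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            out.add(fixed);
        }
        return out;
    }

    // Resolved per row: each parent row produces its own query.
    static List<String> resolvedPerRow(String template, List<Map<String, String>> rows) {
        List<String> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            out.add(resolve(template, row));
        }
        return out;
    }
}
```

With a template such as entryClassPK:${V_MARKET_STUDIES.DL_FILE_ENTRY_ID}, the first method yields the same filter for all rows while the second yields one filter per row, which is the behaviour the configuration relies on.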

On Fri, Apr 6, 2012 at 11:13 AM,   wrote:
> Dear all,
> I'm facing a problem with SolrEntityProcessor, when having it configured
> under a JDBC Datasource.
> My configuration looks like this:
>
> 
>
>                        
>                        
>                        
>
>                        
>
>                         clob="true"/>
>                        
>                         name="extended_keywords"                clob="true"/>
>                         name="publication_date"/>
>
>                        
>
>                         name="dl_file_entry_id" />
>                         name="dl_file_version_id" />
>                         />
>                        
>                        
>                        
>
>
>                                 fl="content" url="http://vmcenter120:8983/solr/"
> query="folderId:${V_MARKET_STUDIES.DL_FOLDER_ID}"
> fq="entryClassPK:${V_MARKET_STUDIES.DL_FILE_ENTRY_ID}">
>                
>
>                
>        
>
> I have 6 rows in the Oracle database, but only the first row is processed
> correctly, meaning that the 2nd Solr is queried
> and the results go into the document. The remaining 5 rows were processed
> without querying the 2nd Solr and therefore
> didn't have the content field filled.
>
> Any suggestions?
> Did I configure something wrong, or misunderstand something?
> Thanks for your help
>
>
> Best regards
> Michael



-- 
Lance Norskog
goks...@gmail.com


SolrEntityProcessor Configuration Problem

2012-04-06 Thread michael . kroh
Dear all,
I'm facing a problem with SolrEntityProcessor, when having it configured 
under a JDBC Datasource.
My configuration looks like this:

url="http://vmcenter120:8983/solr/" 
query="folderId:${V_MARKET_STUDIES.DL_FOLDER_ID}" 
fq="entryClassPK:${V_MARKET_STUDIES.DL_FILE_ENTRY_ID}">

 



I have 6 rows in the Oracle database, but only the first row is processed 
correctly, meaning that the 2nd Solr is queried
and the results go into the document. The remaining 5 rows were processed 
without querying the 2nd Solr and therefore
didn't have the content field filled.

Any suggestions?
Did I configure something wrong, or misunderstand something?
Thanks for your help


Best regards
Michael

Re: HTMLStripCharFilterFactory configuration problem

2010-04-17 Thread Ahmet Arslan


> Actually, I am using the SolrJ client.
> Is there any way to do the same using SolrJ?
> 
> thanks

If you are using Java, life is easier. You can use this static function before 
adding a field to SolrInputDocument.

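// Package names as of Solr 1.4 / Lucene 2.9 (an assumption; later releases moved these classes).
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.CharReader;
import org.apache.solr.analysis.HTMLStripCharFilter;

// Returns value with the HTML markup removed, or null if reading fails.
static String stripHTMLX(String value) {
    StringBuilder out = new StringBuilder();
    StringReader strReader = new StringReader(value);
    try {
        HTMLStripCharFilter html = new HTMLStripCharFilter(
            CharReader.get(strReader.markSupported() ? strReader : new BufferedReader(strReader)));
        char[] cbuf = new char[1024 * 10];
        while (true) {
            int count = html.read(cbuf);
            if (count == -1)
                break; // read() returns -1 at end of stream
            if (count > 0)
                out.append(cbuf, 0, count);
        }
        html.close();
    } catch (IOException e) {
        // Stripping failed for this value; report it and signal failure to the caller.
        e.printStackTrace();
        return null;
    }
    return out.toString();
}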


  


Re: HTMLStripCharFilterFactory configuration problem

2010-04-17 Thread Ranveer Kumar
thanks..

Actually, I am using the SolrJ client.
Is there any way to do the same using SolrJ?

thanks

On Sat, Apr 17, 2010 at 8:06 PM, Ahmet Arslan  wrote:

>
>
> > Thanks for the reply,
> > but how will I get the stored value instead of the indexed
> > value?
> > Where do I need to configure this to get the stored instead of the indexed
> > value?
> > Please help...
> >
>
> You need to remove the HTML tags before the analysis (charfilter, tokenizer,
> tokenfilter) phase. For example, if you are using DIH to index, you can use
> HTMLStripTransformer[1]. How are you indexing your data?
>
> [1]http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer
>
>
>
>


Re: HTMLStripCharFilterFactory configuration problem

2010-04-17 Thread Ahmet Arslan


> Thanks for the reply,
> but how will I get the stored value instead of the indexed
> value?
> Where do I need to configure this to get the stored instead of the indexed
> value?
> Please help...
> 

You need to remove the HTML tags before the analysis (charfilter, tokenizer, 
tokenfilter) phase. For example, if you are using DIH to index, you can use 
HTMLStripTransformer[1]. How are you indexing your data?

[1]http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer


  


Re: HTMLStripCharFilterFactory configuration problem

2010-04-17 Thread Ranveer Kumar
Hi Sven,

Thanks for the reply,
but how will I get the stored value instead of the indexed value?
Where do I need to configure this to get the stored instead of the indexed value?
Please help...

thanks
with regards


On Wed, Apr 14, 2010 at 3:16 PM, Sven Maurmann wrote:

> Hi,
>
> please note that you get the stored value of the field as a result and
> not the indexed one.
>
> Cheers,
>   Sven
>
>
> --On Wednesday, April 14, 2010 02:54:52 PM +0530 Ranveer Kumar <
> ranveer.s...@gmail.com> wrote:
>
>  Hi all,
>>
>> I am having a problem configuring HTMLStripCharFilterFactory.
>> The following is the schema:
>>   > positionIncrementGap="100">   
>> 
>>
>>
>>>ignoreCase="true"
>>words="stopwords.txt"
>>enablePositionIncrements="true"
>>/>
>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>
>>> language="English" protected="protwords.txt"/>
>>  
>>  
>>  
>>
>>
>>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>>ignoreCase="true"
>>words="stopwords.txt"
>>enablePositionIncrements="true"
>>/>
>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>
>>> language="English" protected="protwords.txt"/>
>>  
>>
>>
>> When I check with analysis.jsp it gives the expected result, but in
>> my query results I am still getting HTML tags.
>> I am using the SolrJ client.
>>
>> please help me
>>
>
>
>
> --
> kippdata informationstechnologie GmbH
> Sven Maurmann   Tel: 0228 98549 -12
> Bornheimer Str. 33a Fax: 0228 98549 -50
> D-53111 Bonn        sven.maurm...@kippdata.de
>
> HRB 8018 Amtsgericht Bonn / USt.-IdNr. DE 196 457 417
> Geschäftsführer: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann
>


Re: HTMLStripCharFilterFactory configuration problem

2010-04-14 Thread Sven Maurmann

Hi,

please note that you get the stored value of the field as a result and
not the indexed one.
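
The distinction can be pictured with a toy Java model. Nothing here is Solr code: the class is invented, and the crude regular expression only stands in for what a char filter such as HTMLStripCharFilterFactory does during analysis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model: a field keeps the raw input as its stored value and analyzed tokens as its index.
class StoredVsIndexed {
    final String stored;        // returned verbatim in search results
    final List<String> indexed; // what queries are matched against

    StoredVsIndexed(String raw) {
        this.stored = raw;
        this.indexed = analyze(raw);
    }

    // Crude stand-in for an HTML-stripping char filter.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Strip tags, lowercase, and split on whitespace.
    static List<String> analyze(String raw) {
        String text = stripTags(raw).trim().toLowerCase(Locale.ROOT);
        if (text.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(text.split("\\s+")));
    }

    public static void main(String[] args) {
        StoredVsIndexed field = new StoredVsIndexed("<p>Hello <b>World</b></p>");
        System.out.println("stored:  " + field.stored);
        System.out.println("indexed: " + field.indexed);
    }
}
```

The analysis chain only changes the indexed side, so the tags still appear in results unless the value is cleaned before it is sent to Solr.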

Cheers,
   Sven

--On Wednesday, April 14, 2010 02:54:52 PM +0530 Ranveer Kumar 
 wrote:



Hi all,

I am having a problem configuring HTMLStripCharFilterFactory.
The following is the schema:
  







  
  
  







  


When I check with analysis.jsp it gives the expected result, but in
my query results I am still getting HTML tags.
I am using the SolrJ client.

please help me




--
kippdata informationstechnologie GmbH
Sven Maurmann   Tel: 0228 98549 -12
Bornheimer Str. 33a Fax: 0228 98549 -50
D-53111 Bonn        sven.maurm...@kippdata.de

HRB 8018 Amtsgericht Bonn / USt.-IdNr. DE 196 457 417
Geschäftsführer: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann


HTMLStripCharFilterFactory configuration problem

2010-04-14 Thread Ranveer Kumar
Hi all,

I am having a problem configuring HTMLStripCharFilterFactory.
The following is the schema:
   
  







  
  
  


   





  


When I check with analysis.jsp it gives the expected result, but in my query
results I am still getting HTML tags.
I am using the SolrJ client.

please help me


Re: Tomcat JNDI and CWD Configuration problem with multiple solrs

2008-04-19 Thread Albert Ramstedt
OMG!

I found the error. I can't believe how much time I spent on this, and it turns
out I should pay more attention. (Or really, I can believe it, because it
happens more frequently than I wish it would.)

Anyway, for those having the same issue in the future:

I am using acts_as_solr, a rails plugin for solr searching. To set up
solr for my rails app I blindly copied the solrconfig.xml from the plugin to
my solr/conf dir, installed the jndi context into tomcat and expected it
to work. What I experienced was that it then proceeded to create a
solr/index dir under cwd. This is because the solrconfig.xml in acts_as_solr
has this line in the config:

 <dataDir>${solr.data.dir:./solr/data}</dataDir>

It seems this creates the data dir under cwd, which is probably not wanted
when you install it on a systemwide tomcat app server. The problem was solved
when I commented that line out.
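
Why that line follows the working directory can be shown with a short Java sketch. It models the ${name:default} syntax with a hand-written regular expression and resolves the result against a given directory; it is an illustration with invented names, not Solr's own property substitution code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of ${name:default} substitution followed by relative path resolution.
class DataDirDemo {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    // Use the property if it is set, otherwise the default after the colon.
    static String substitute(String text, Properties props) {
        Matcher m = VAR.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String fallback = m.group(2) == null ? "" : m.group(2);
            String value = props.getProperty(m.group(1), fallback);
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // A relative dataDir lands under whatever directory the server was started from.
    static Path effectiveDir(String dataDir, Path workingDir) {
        return workingDir.resolve(dataDir).normalize();
    }

    public static void main(String[] args) {
        String dataDir = substitute("${solr.data.dir:./solr/data}", System.getProperties());
        Path cwd = Paths.get("").toAbsolutePath();
        System.out.println(effectiveDir(dataDir, cwd));
    }
}
```

With no solr.data.dir property set, the default ./solr/data is relative, so two webapps started from the same directory would end up pointing at the same data directory. Setting the property to an absolute path per webapp, or removing the line as described above, avoids that.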

Hopefully this will save future acts_as_solr users some pain.

Albert

On Sat, Apr 19, 2008 at 12:05 AM, Chris Hostetter <[EMAIL PROTECTED]>
wrote:

>
> : The apps seem to work fine, only for some reason, when I start tomcat it
> : creates a solr dir in the cwd. So naturally, depending on where I do the
>
> Solr does not ever attempt to create a directory named "solr" (the only
> directories Solr tries to create if they don't already exist are inside
> of the data dir)
>
> : restart, it won't work. If I cd to some dir where I have write access, the
> : apps go up fine, and it even says the solr/home is where it should be.
>
> so what gets put in this solr dir that is created for you?  my guess is
> it's the expanded war file -- there is probably a tomcat setting for where
> these should go, and your tomcat configs have it as "."
>
> : (The dir I defined in the xml file, NOT cwd). But under statistics, both
> : separate solr apps seem to use an IndexReader under CWD. Ideally I would
>
> can you be more explicit about what exactly you are seeing (ie: cut+paste
> the log messages from Solr startup about solr home and JNDI, cut+paste
> exactly what you see on the statistics page, etc..., cut+paste the shell
> commands you are running -- starting with a call to pwd so we know what
> the current directory is, cut+paste the directory listings of each
> directory you are referring to.)
>
> : my own install of tomcat 6. (Although I don't understand why the example has
> : "f:/" stuff in the directory paths, since that notation throws errors at me.
>
> That's just an example of a windows path.
>
>
> -Hoss
>
>


Re: Tomcat JNDI and CWD Configuration problem with multiple solrs

2008-04-18 Thread Chris Hostetter

: The apps seem to work fine, only for some reason, when I start tomcat it
: creates a solr dir in the cwd. So naturally, depending on where I do the

Solr does not ever attempt to create a directory named "solr" (the only 
directories Solr tries to create if they don't already exist are inside 
of the data dir)

: restart, it won't work. If I cd to some dir where I have write access, the
: apps go up fine, and it even says the solr/home is where it should be.

so what gets put in this solr dir that is created for you?  my guess is 
it's the expanded war file -- there is probably a tomcat setting for where 
these should go, and your tomcat configs have it as "."

: (The dir I defined in the xml file, NOT cwd). But under statistics, both
: separate solr apps seem to use an IndexReader under CWD. Ideally I would

can you be more explicit about what exactly you are seeing (ie: cut+paste 
the log messages from Solr startup about solr home and JNDI, cut+paste 
exactly what you see on the statistics page, etc..., cut+paste the shell 
commands you are running -- starting with a call to pwd so we know what 
the current directory is, cut+paste the directory listings of each 
directory you are referring to.)

: my own install of tomcat 6. (Although I don't understand why the example has
: "f:/" stuff in the directory paths, since that notation throws errors at me.

That's just an example of a windows path.


-Hoss



Tomcat JNDI and CWD Configuration problem with multiple solrs

2008-04-17 Thread Albert Ramstedt
Hello List!

I am not an expert at configuring Tomcat, so I must be doing something
wrong, but for the life of me, I cannot find anything that would explain
this:

I want to have two separate solr apps running on one tomcat. I use the exact
configuration suggested here:

http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac

- Under multiple solr webapps.

The apps seem to work fine, only for some reason, when I start tomcat it
creates a solr dir in the cwd. So naturally, depending on where I do the
restart, it won't work. If I cd to some dir where I have write access, the
apps go up fine, and it even says the solr/home is where it should be
(the dir I defined in the xml file, NOT cwd). But under statistics, both
separate solr apps seem to use an IndexReader under CWD. Ideally I would
want to be able to configure this, so I know where the reader keeps its
files. And must both apps share this directory? I suspect this sharing is
why I cannot reindex both apps at the same time, since it touches some .lock
file during reindexing in the reader dir.

I use the exact same xml files under /Catalina/localhost/solr1.xml etc. as
the wiki says. Same behaviour in tomcat 5.5 that ships with Ubuntu 7.10 and
my own install of tomcat 6. (Although I don't understand why the example has
"f:/" stuff in the directory paths, since that notation throws errors at me.)

Does anyone know if I am doing something wrong, and how I can have separate
IndexReader folders?

Albert