Re: searching using the CJKAnalyzer

2004-10-12 Thread Daan Hoogland
Jon Schuster wrote:

I didn't need to make any changes to Entities to get Japanese searches working. Are 
you using the CJKAnalyzer when you perform the search, not only when building the 
index?
  

Yes, I use CJKAnalyzer all around. When searching I translate 
character-entities in order to find anything. When displaying search 
results, I don't see anything that looks as being part of an eastern 
character set. instead I see accented latin - and mathematical symbols.

When I don't pass entities by the way things get really nasty:
query passed: ??
 char(, LATIN_1_SUPPLEMENT)  char(?, LATIN_1_SUPPLEMENT) token found : 
  length: 1
 char(?, LATIN_1_SUPPLEMENT)  char(, LATIN_1_SUPPLEMENT)  char(, 
LATIN_1_SUPPLEMENT) token found :  length: 1
 char(, LATIN_1_SUPPLEMENT) searching contents: 

This was a query for two japanese characters.

-Original Message-
From: Daan Hoogland [mailto:[EMAIL PROTECTED] 
Sent: Sunday, October 10, 2004 10:48 PM
To: Lucene Users List
Subject: Re: searching using the CJKAnalyzer
Importance: Low


Che Dong wrote:

  

Seem not Analyser problem but html parser charset detecting error.

Could you show me the detail of the problem?



Thank Che,
I got it working by making the decode() from the Entities in demo 
public. I wrote a scanner to tranlate any entities in the query.
I want to translate back to entities in the results, but I'm not sure 
what the criteria should be. It seems to be just binary data.
How to conclude that 04?03?04 means ?

  

Thanks

Che Dong

Daan Hoogland wrote:



LS,
in
http://issues.apache.org/eyebrowse/ReadMsg?listId=30msgNo=8980
Jon Schuster explains how to get a Japanese search system working. I 
followed his advice and got a index that luke shows as what I 
expected it to be.
I don't know how to enter a search so that it gets passed to the 
engine properly. It works in luke but not in weblucene or in my own app.


  

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]







  




-- 
The information contained in this communication and any attachments is confidential 
and may be privileged, and is for the sole use of the intended recipient(s). Any 
unauthorized review, use, disclosure or distribution is prohibited. If you are not the 
intended recipient, please notify the sender immediately by replying to this message 
and destroy all copies of this message and any attachments. ASML is neither liable for 
the proper and complete transmission of the information contained in this 
communication, nor for any delay in its receipt.


Re: searching using the CJKAnalyzer

2004-10-12 Thread Che Dong
CJKAnalyser not support single byte-stream, front end interface and 
backend indexing process need to transform source into double byte 
charactor-stream properly before search/index.

Please tell me know the output of
http://www.chedong.com/tech/HelloUnicode.java
with javac -encoding=gb2312 and javac -encoding=iso-8859-1
Regards
Che Dong
Daan Hoogland wrote:
Jon Schuster wrote:

I didn't need to make any changes to Entities to get Japanese searches working. Are 
you using the CJKAnalyzer when you perform the search, not only when building the 
index?

Yes, I use CJKAnalyzer all around. When searching I translate 
character-entities in order to find anything. When displaying search 
results, I don't see anything that looks as being part of an eastern 
character set. instead I see accented latin - and mathematical symbols.

When I don't pass entities by the way things get really nasty:
query passed: ??
 char(, LATIN_1_SUPPLEMENT)  char(?, LATIN_1_SUPPLEMENT) token found : 
  length: 1
 char(?, LATIN_1_SUPPLEMENT)  char(, LATIN_1_SUPPLEMENT)  char(, 
LATIN_1_SUPPLEMENT) token found :  length: 1
 char(, LATIN_1_SUPPLEMENT) searching contents: 

This was a query for two japanese characters.

-Original Message-
From: Daan Hoogland [mailto:[EMAIL PROTECTED] 
Sent: Sunday, October 10, 2004 10:48 PM
To: Lucene Users List
Subject: Re: searching using the CJKAnalyzer
Importance: Low

Che Dong wrote:


Seem not Analyser problem but html parser charset detecting error.
Could you show me the detail of the problem?
  

Thank Che,
I got it working by making the decode() from the Entities in demo 
public. I wrote a scanner to tranlate any entities in the query.
I want to translate back to entities in the results, but I'm not sure 
what the criteria should be. It seems to be just binary data.
How to conclude that 04?03?04 means ?



Thanks
Che Dong
Daan Hoogland wrote:
  


LS,
in
http://issues.apache.org/eyebrowse/ReadMsg?listId=30msgNo=8980
Jon Schuster explains how to get a Japanese search system working. I 
followed his advice and got a index that luke shows as what I 
expected it to be.
I don't know how to enter a search so that it gets passed to the 
engine properly. It works in luke but not in weblucene or in my own app.



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
  







-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: searching using the CJKAnalyzer

2004-10-11 Thread Jon Schuster
I didn't need to make any changes to Entities to get Japanese searches working. Are 
you using the CJKAnalyzer when you perform the search, not only when building the 
index?

-Original Message-
From: Daan Hoogland [mailto:[EMAIL PROTECTED] 
Sent: Sunday, October 10, 2004 10:48 PM
To: Lucene Users List
Subject: Re: searching using the CJKAnalyzer
Importance: Low


Che Dong wrote:

 Seem not Analyser problem but html parser charset detecting error.

 Could you show me the detail of the problem?

Thank Che,
I got it working by making the decode() from the Entities in demo 
public. I wrote a scanner to tranlate any entities in the query.
I want to translate back to entities in the results, but I'm not sure 
what the criteria should be. It seems to be just binary data.
How to conclude that 04?03?04 means ?


 Thanks

 Che Dong

 Daan Hoogland wrote:

 LS,
 in
 http://issues.apache.org/eyebrowse/ReadMsg?listId=30msgNo=8980
 Jon Schuster explains how to get a Japanese search system working. I 
 followed his advice and got a index that luke shows as what I 
 expected it to be.
 I don't know how to enter a search so that it gets passed to the 
 engine properly. It works in luke but not in weblucene or in my own app.




 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]





-- 
The information contained in this communication and any attachments is confidential 
and may be privileged, and is for the sole use of the intended recipient(s). Any 
unauthorized review, use, disclosure or distribution is prohibited. If you are not the 
intended recipient, please notify the sender immediately by replying to this message 
and destroy all copies of this message and any attachments. ASML is neither liable for 
the proper and complete transmission of the information contained in this 
communication, nor for any delay in its receipt.


Re: searching using the CJKAnalyzer

2004-10-10 Thread Daan Hoogland
Che Dong wrote:

 Seem not Analyser problem but html parser charset detecting error.

 Could you show me the detail of the problem?

Thank Che,
I got it working by making the decode() from the Entities in demo 
public. I wrote a scanner to tranlate any entities in the query.
I want to translate back to entities in the results, but I'm not sure 
what the criteria should be. It seems to be just binary data.
How to conclude that 0Š4?0†3¨¦?0„4 means ÓÐÒ°?


 Thanks

 Che Dong

 Daan Hoogland wrote:

 LS,
 in
 http://issues.apache.org/eyebrowse/ReadMsg?listId=30msgNo=8980
 Jon Schuster explains how to get a Japanese search system working. I 
 followed his advice and got a index that luke shows as what I 
 expected it to be.
 I don't know how to enter a search so that it gets passed to the 
 engine properly. It works in luke but not in weblucene or in my own app.




 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]





-- 
The information contained in this communication and any attachments is confidential 
and may be privileged, and is for the sole use of the intended recipient(s). Any 
unauthorized review, use, disclosure or distribution is prohibited. If you are not the 
intended recipient, please notify the sender immediately by replying to this message 
and destroy all copies of this message and any attachments. ASML is neither liable for 
the proper and complete transmission of the information contained in this 
communication, nor for any delay in its receipt.


searching using the CJKAnalyzer

2004-10-08 Thread Daan Hoogland
LS,
in
http://issues.apache.org/eyebrowse/ReadMsg?listId=30msgNo=8980
Jon Schuster explains how to get a Japanese search system working. I 
followed his advice and got a index that luke shows as what I expected 
it to be.
I don't know how to enter a search so that it gets passed to the engine 
properly. It works in luke but not in weblucene or in my own app.


-- 
The information contained in this communication and any attachments is confidential 
and may be privileged, and is for the sole use of the intended recipient(s). Any 
unauthorized review, use, disclosure or distribution is prohibited. If you are not the 
intended recipient, please notify the sender immediately by replying to this message 
and destroy all copies of this message and any attachments. ASML is neither liable for 
the proper and complete transmission of the information contained in this 
communication, nor for any delay in its receipt.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]