char mapping in lucene-icu

2014-02-14 Thread alxsss

Hello,

I am trying to use the lucene-icu library in solr-4.6.1 and need to change a
char mapping in lucene-icu. I have made changes to

lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt

and built the jar file using ant, but it did not help.

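I took a look at lucene/analysis/icu/build.xml and saw these lines

[build.xml excerpt; the XML markup was lost in archiving. Surviving fragments:]

  value="nfc.txt nfkc.txt nfkc_cf.txt BasicFoldings.txt DiacriticFolding.txt
         DingbatFolding.txt HanRadicalFolding.txt NativeDigitFolding.txt"/>
  value="${resources.dir}/org/apache/lucene/analysis/icu/utr30.nrm"/>

  Note that the gennorm2 and icupkg tools must be on your PATH. These tools
  are part of the ICU4C package. See http://site.icu-project.org/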

It looks like ant does not execute gennorm2. If I build the utr30.nrm file
manually with gennorm2 and replace utr30.nrm in the jar file, then starting
Solr gives the following error:

Caused by: java.lang.RuntimeException: java.io.IOException: ICU data file error: Header authentication failed, please check if you have a valid ICU data file
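
One way to narrow down a header error like this is to look at the first bytes of the generated file. The sketch below prints a few header fields of an ICU binary data file. The byte offsets follow the ICU common data header layout as I understand it and should be treated as an assumption to verify against the ICU version in use; `icu_header` is a hypothetical helper name, not part of ICU or Lucene.

```shell
#!/bin/sh
# Print selected header fields of an ICU binary data file such as utr30.nrm.
# Assumed offsets (verify against your ICU version):
#   bytes 2-3    magic bytes, expected "da27"
#   byte  8      isBigEndian flag, expected 01 for data read by ICU4J
#   bytes 12-15  dataFormat, expected "Nrm2" for Normalizer2 data
#   byte  16     major formatVersion
icu_header() {
    icu_file=$1
    magic=$(od -An -tx1 -j2 -N2 "$icu_file" | tr -d ' \n')
    big_endian=$(od -An -tx1 -j8 -N1 "$icu_file" | tr -d ' \n')
    data_format=$(od -An -c -j12 -N4 "$icu_file" | tr -d ' \n')
    format_major=$(od -An -tu1 -j16 -N1 "$icu_file" | tr -d ' \n')
    printf 'magic=%s bigEndian=%s dataFormat=%s formatVersion=%s\n' \
        "$magic" "$big_endian" "$data_format" "$format_major"
}

# Usage: icu_header path/to/utr30.nrm
```

Comparing the output for the utr30.nrm shipped in the original jar with the output for the manually built file may show where they differ. A different formatVersion, for example, could mean that the installed ICU4C tools are newer than the ICU4J version bundled with Lucene, but that is only one possible cause.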

My questions are:
 1. If the above code in the build file does not get executed, how is the
    utr30 file generated?
 2. How can I change a character mapping?
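
Before editing a mapping, it can help to see how the code point in question is currently defined. The sketch below assumes the utr30 source files use the gennorm2 text syntax, where a one-way mapping is written as `source>target` and a round-trip mapping as `source=target`, with code points in hex. `show_mapping` is a hypothetical helper, and the code point 00E4 in the usage line is only an illustration.

```shell
#!/bin/sh
# Print the lines in a gennorm2 input file that define a mapping for one
# code point.
# $1: path to the input file, $2: code point in hex (for example 00E4).
show_mapping() {
    grep -i -E "^$2[[:space:]]*[>=]" "$1"
}

# Usage:
#   show_mapping lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt 00E4
```

If the helper prints nothing, the code point has no mapping line of that form in the given file and may be handled by one of the other input files instead.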


Thanks.
Alex.



Re: char mapping in lucene-icu

2014-02-14 Thread Jack Krupansky

Do you get the exception if you run ant before changing the data files?

"Header authentication failed, please check if you have a valid ICU data 
file"


Check with the ICU project as to the proper format for THEIR files. I mean, 
this doesn't sound like a Lucene issue.


Maybe it could be as simple as whether the data file should have DOS or UNIX 
or Mac line endings (CRLF vs. LF vs. CR). Be sure to use an editor that 
satisfies the requirements of ICU.
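
A quick check for the line-ending theory, as a minimal sketch: the function below reports whether a file contains any carriage return bytes. Whether gennorm2 is actually sensitive to line endings is not established in this thread, and `has_cr` is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 if the file contains at least one carriage return (CR) byte,
# which would indicate DOS (CRLF) or old Mac (CR) line endings.
has_cr() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Usage:
#   if has_cr DiacriticFolding.txt; then echo "file contains CR bytes"; fi
```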


To be clear, Lucene itself does not have a published API for modifying the 
mappings of ICU.


-- Jack Krupansky

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: char mapping in lucene-icu

2014-02-14 Thread alxsss


Hi Jack,

I do not get the exception before changing the data files, and I do not get it
after changing the data files and creating lucene-icu...jar with ant either.
But changing the data files and running ant does not change the output.

So I decided to create the .nrm file manually, using the steps outlined in the
build.xml file

[build.xml excerpt; the XML markup was lost in archiving. Surviving text:]

  Note that the gennorm2 and icupkg tools must be on your PATH. These tools
  are part of the ICU4C package. See http://site.icu-project.org/


namely

  gennorm2 -v -s src/data/utr30 nfc.txt nfkc.txt nfkc_cf.txt BasicFoldings.txt \
    DiacriticFolding.txt DingbatFolding.txt HanRadicalFolding.txt \
    NativeDigitFolding.txt -o utr30.tmp

  icupkg -tb utr30.tmp utr30.nrm
 
Then I unpacked the lucene-icu...jar file, replaced the .nrm file, and created
a new jar file using jar cf.
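
The manual steps above can be sketched as one script. This is a sketch under assumptions rather than the Lucene build procedure: it takes the entry path inside the jar to be org/apache/lucene/analysis/icu/utr30.nrm (the path that appears in the build.xml fragment), assumes gennorm2 accepts a path for `-o`, and updates the jar in place with `jar uf` instead of repacking it. `require_tools` and `rebuild_nrm` are hypothetical helper names.

```shell
#!/bin/sh
# Fail early when a required tool is missing from PATH.
require_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            return 1
        fi
    done
}

# Rebuild utr30.nrm from the gennorm2 inputs and update it inside a jar.
# $1: directory holding the gennorm2 input files, $2: jar file to update.
rebuild_nrm() {
    src_dir=$1
    jar_file=$2
    entry_dir=org/apache/lucene/analysis/icu   # assumed location in the jar
    require_tools gennorm2 icupkg jar || return 1
    work=$(mktemp -d) || return 1
    mkdir -p "$work/$entry_dir"
    # Compile the text inputs, convert to big-endian, then update the jar.
    gennorm2 -v -s "$src_dir" nfc.txt nfkc.txt nfkc_cf.txt BasicFoldings.txt \
        DiacriticFolding.txt DingbatFolding.txt HanRadicalFolding.txt \
        NativeDigitFolding.txt -o "$work/utr30.tmp" &&
    icupkg -tb "$work/utr30.tmp" "$work/$entry_dir/utr30.nrm" &&
    jar uf "$jar_file" -C "$work" "$entry_dir/utr30.nrm"
}

# Usage: rebuild_nrm src/data/utr30 path/to/lucene-icu.jar
```

Updating the existing jar keeps its manifest and other entries untouched, which removes one possible source of differences compared with unpacking and running jar cf by hand.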

Solr gives an error if I use this new .jar file.

What I noticed is that the ant task does not actually run the gennorm2 task.

If I delete the gennorm2 entry from the build.xml file, utr30.nrm still gets
created by the ant task. I have even deleted these lines

[build.xml lines; content lost in archiving]

and it still gets created. So I wonder how ant creates it.
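
One possible explanation, which is not confirmed anywhere in this thread, is that a prebuilt utr30.nrm already sits in the resources directory of the source tree and is packaged into the jar as is, so the default build never regenerates it. If that is the case, the compiled file would be older than the edited input files. The sketch below checks for that; `newer_inputs` is a hypothetical helper name and the paths in the usage line are assumptions based on the paths mentioned above.

```shell
#!/bin/sh
# List gennorm2 input files that were modified after the compiled .nrm file.
# $1: compiled .nrm file, $2: directory with the .txt input files.
newer_inputs() {
    find "$2" -type f -name '*.txt' -newer "$1"
}

# Usage (paths are assumptions, relative to lucene/analysis/icu):
#   newer_inputs src/resources/org/apache/lucene/analysis/icu/utr30.nrm \
#       src/data/utr30
```

Any file names printed would suggest that the .nrm in the tree was not rebuilt after the edit. In that case the generation step probably has to be invoked explicitly, for example as its own ant target if the build file defines one.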


The ICU support team wrote that they do not have any such mappings, that is,
mappings between letters with diacritics and plain Latin letters.

Thanks.
Alex.



 
