Re: Forrest PDF non-Latin-1 support [was: RE: prototype Solr 1.3 RC 1]
Thank you, Steve! See, I knew you'd nail it. I don't want to complicate the lives of others just because of one little diacritic.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: Forrest PDF non-Latin-1 support [was: RE: prototype Solr 1.3 RC 1]
On Fri, 2008-08-29 at 17:57 -0400, Steven A Rowe wrote:
> On 08/29/2008 at 3:24 PM, Chris Hostetter wrote:
> > I suspect the PDF formatter just doesn't play nicely with the non-trivial UTF-8 characters.
> ...
> There's an open Forrest bug for this problem: https://issues.apache.org/jira/browse/FOR-132, and the discussion there includes a link to the Cocoon documentation for embedding fonts in PDF files: http://cocoon.apache.org/2.1/userdocs/pdf-serializer.html#FOP+and+Embedding+Fonts. This looks kinda complicated, and AFAICT would require modifications to the Forrest installation wherever the site is built.

I just saw the thread; I will have a look. Which version of Forrest is currently recommended? I ask because changes have been made to the PDF plugin lately (and some are still underway). I will let you know about my findings.

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: prototype Solr 1.3 RC 1
Seems like all issues have been closed. What is the plan for the release now?

On Fri, Aug 29, 2008 at 8:46 PM, Grant Ingersoll [EMAIL PROTECTED] wrote:
> I created a Hudson task to do the building/archival tasks for the release candidates. It is an on-demand task (i.e. not scheduled).
>
> See http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/ for the job in general.
>
> The artifacts (including Maven) are at: http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/lastSuccessfulBuild/artifact/
>
> The web site (including javadocs): http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/lastSuccessfulBuild/artifact/release-candidate/build/docs/index.html
>
> I haven't gone through it with a fine-tooth comb yet, hence the "prototype" in the subject line, but from my preliminary skimming it seems to be on track. I will cover it more later today. In the meantime, feedback is appreciated.
>
> Cheers,
> Grant

--
Regards,
Shalin Shekhar Mangar.
Re: prototype Solr 1.3 RC 1
On Wed, Sep 3, 2008 at 2:57 PM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote:
> Seems like all issues have been closed. What is the plan for the release now?

I need to update the lucene libs again first... MikeM found+fixed a lucene bug today.

-Yonik
RE: prototype Solr 1.3 RC 1
Random nit: in release-candidate/build/docs/who.pdf, Otis's name is spelled Otis Gospodneti# (final character is a hash mark).

- Steve
Re: prototype Solr 1.3 RC 1
On Aug 29, 2008, at 12:05 PM, Steven A Rowe wrote:
> Random nit: in release-candidate/build/docs/who.pdf, Otis's name is spelled Otis Gospodneti# (final character is a hash mark).
>
> - Steve

Hmm, that's weird. It's also that way on the current site.

--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Re: prototype Solr 1.3 RC 1
Grant, here is what it's supposed to be: Gospodnetić

If Forrest and friends don't like that diacritic, I suppose I can live with Gospodnetic -- damn i18n! ;)

This is what I see locally:

    $ ffxg Gospod
    ./src/site/src/documentation/content/xdocs/who.xml: <li>Otis Gospodneti&#263;</li>
    $ find . -name \*html | xargs grep Gospod
    ./site/who.html:<li>Otis Gospodnetić</li>

So the HTML looks OK, but apparently the PDF does not. I don't know how else to specify that c with a diacritic other than with that &#263;.

$10 that Steve knows!

Thank you,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
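For anyone unsure what that reference stands for: `&#263;` is a decimal numeric character reference, so it names Unicode code point 263 (U+0107). Below is a minimal sketch in Python (an arbitrary choice; the thread itself only uses shell and XML) that resolves the reference and builds it back from the character. The helper name `to_numeric_reference` is made up for illustration.

```python
import html

# The numeric character reference as it appears in who.xml.
reference = "&#263;"

# html.unescape resolves numeric references to the character they name.
char = html.unescape(reference)


def to_numeric_reference(ch: str) -> str:
    """Return the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


print(char, hex(ord(char)))                      # ć 0x107
print(to_numeric_reference(char))                # &#263;
print(html.unescape("Otis Gospodneti&#263;"))    # Otis Gospodnetić
```

So the source file is already saying the right thing; whatever goes wrong happens later in the pipeline.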
Re: prototype Solr 1.3 RC 1
Maybe this is the answer: http://forrest.apache.org/docs_0_90/faq.html#encoding

And this is what we've got:

    $ head -1 src/site/src/documentation/content/xdocs/who.xml
    <?xml version="1.0"?>

Sounds like something that would be good to fix in general, but I don't have Forrest set up to try it :(

Otis
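A rough way to check what the first line of a file such as who.xml declares is sketched below in Python. The regular expression and the function name `declared_encoding` are illustrative assumptions, not something Forrest provides, and the pattern only handles the common `version` then `encoding` layout. Note that when an XML file declares no encoding and has no byte-order mark, parsers assume UTF-8, so a missing declaration matters mainly if the file is saved in some other encoding.

```python
import re

# Matches an XML declaration and captures the optional encoding pseudo-attribute.
XML_DECL = re.compile(
    r"""^<\?xml\s+version\s*=\s*["'][^"']+["']"""
    r"""(?:\s+encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["'])?"""
)


def declared_encoding(first_line: str):
    """Return the declared encoding, or None if the declaration names none."""
    match = XML_DECL.match(first_line)
    return match.group(1) if match else None


print(declared_encoding('<?xml version="1.0"?>'))                   # None
print(declared_encoding('<?xml version="1.0" encoding="UTF-8"?>'))  # UTF-8
```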
Re: prototype Solr 1.3 RC 1
: If Forrest and friends don't like that diacritic, I suppose I can live
: with Gospodnetic -- damn i18n! ;)

I seem to recall that we had this problem with Forrest and the Lucene-Java "who" page as well ... over there you are listed in lowly ASCII, without your I18N goodness.

-Hoss
Re: prototype Solr 1.3 RC 1
: Maybe this is the answer:
: http://forrest.apache.org/docs_0_90/faq.html#encoding

My reading of that is that setting an encoding will let you use the literal UTF-8 character (which is what I would expect), but note the end of the answer...

    Another option is to use character entities such as &ouml; (ö) or the numeric form &#246; (ö).

...which is what we are doing.

I suspect the PDF formatter just doesn't play nicely with the non-trivial UTF-8 characters.

-Hoss
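The point that a numeric reference and a literal UTF-8 character are interchangeable can be checked outside Forrest. The sketch below uses Python's standard `xml.etree.ElementTree` as a stand-in for whatever parser Forrest actually uses (an assumption), and the two small documents are made up for the test.

```python
import xml.etree.ElementTree as ET

# The same list item written two ways:
# a numeric character reference with no encoding declaration ...
with_reference = b'<?xml version="1.0"?><li>Otis Gospodneti&#263;</li>'
# ... and the literal character as UTF-8 bytes with a declared encoding.
with_literal = (
    '<?xml version="1.0" encoding="UTF-8"?><li>Otis Gospodneti\u0107</li>'
).encode("utf-8")

text_a = ET.fromstring(with_reference).text
text_b = ET.fromstring(with_literal).text

# After parsing, both forms yield the identical string.
print(text_a == text_b)  # True
```

If that holds for Forrest's parser too, changing the encoding declaration would not alter what reaches the PDF formatter, which is consistent with the suspicion that the problem lies in the PDF step.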
Re: prototype Solr 1.3 RC 1
On Aug 29, 2008, at 3:18 PM, Otis Gospodnetic wrote:
> Maybe this is the answer: http://forrest.apache.org/docs_0_90/faq.html#encoding
>
> And this is what we've got:
>
>     $ head -1 src/site/src/documentation/content/xdocs/who.xml
>     <?xml version="1.0"?>
>
> Sounds like something that would be good to fix in general, but I don't have forrest set up to try it :(

It's easy to set up ;-) http://forrest.apache.org.
Re: prototype Solr 1.3 RC 1
: I haven't gone through with a fine tooth comb yet, hence the prototype in
: the subject line, but my preliminary skimming of it seems like it is on track.
: I will cover it more later today. In the meantime, feedback is appreciated.

I've done some testing with both the Solr 1.2 example and the configs from my ApacheCon demo last year; the only problem that jumped out at me was SOLR-740.

FWIW: I've also committed some usage/javadoc fixes.

-Hoss
Forrest PDF non-Latin-1 support [was: RE: prototype Solr 1.3 RC 1]
On 08/29/2008 at 3:24 PM, Chris Hostetter wrote:
> I suspect the PDF formatter just doesn't play nicely with the non-trivial UTF-8 characters.

This is an Apache FOP FAQ; from http://xmlgraphics.apache.org/fop/faq.html#pdf-characters:

    6.2. Some characters are not displayed, or displayed incorrectly, or displayed as #.

    This usually means the selected font doesn't have a glyph for the character.

    The standard text fonts supplied with Acrobat Reader have mostly glyphs for characters from the ISO Latin 1 character set. [...]

    If you use your own fonts, the font must have a glyph for the desired character. Furthermore the font must be available on the machine where the PDF is viewed or it must have been embedded in the PDF file. [...]

There's an open Forrest bug for this problem: https://issues.apache.org/jira/browse/FOR-132, and the discussion there includes a link to the Cocoon documentation for embedding fonts in PDF files: http://cocoon.apache.org/2.1/userdocs/pdf-serializer.html#FOP+and+Embedding+Fonts. This looks kinda complicated, and AFAICT would require modifications to the Forrest installation wherever the site is built.

I suspect that almost nobody looks at the PDF version of the Who we are page (and I sure am sorry now that I brought this up...)

If things are left as-is, Otis's last name would be displayed properly in the HTML, and garbled in the PDF; if the diacritic is removed, then it will be displayed improperly in both places :)

Steve
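One rough way to spot text that is likely to run into the limitation described in the FOP FAQ is to flag characters that fall outside ISO Latin 1. The Python sketch below does only that. It is a proxy, not a real glyph check: actual coverage depends on the font FOP selects, and the FAQ itself says "mostly". The function name `outside_latin1` is hypothetical.

```python
def outside_latin1(text: str) -> list[str]:
    """Return the distinct characters in text that ISO-8859-1 cannot encode.

    ISO-8859-1 maps exactly to code points 0 through 255, so anything above
    0xFF is a candidate for a missing glyph in a Latin-1-only font.
    """
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return [ch for ch in dict.fromkeys(text) if ord(ch) > 0xFF]


print(outside_latin1("Otis Gospodnetić"))   # ['ć']
print(outside_latin1("Gospodnetic"))        # []
print(outside_latin1("ö"))                  # [] (U+00F6 is inside Latin 1)
```

This matches what the thread observed: ć (U+0107) is past the Latin 1 range and shows up as # in the PDF, while a character such as ö would not be flagged.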