Re: Forrest PDF non-Latin-1 support [was: RE: prototype Solr 1.3 RC 1]

2008-09-04 Thread Otis Gospodnetic
Thank you Steve!  See, I knew you'd nail it.  I don't want to complicate lives 
of others just because of one little diacritic.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Steven A Rowe [EMAIL PROTECTED]
 To: solr-dev@lucene.apache.org
 Sent: Friday, August 29, 2008 5:57:31 PM
 Subject: Forrest PDF non-Latin-1 support [was: RE: prototype Solr 1.3 RC 1]
 
 On 08/29/2008 at 3:24 PM, Chris Hostetter wrote:
  I suspect the PDF formatter just doesn't play nicely with the
  non-trivial UTF-8 characters.
 
 This is an Apache FOP FAQ; from 
 http://xmlgraphics.apache.org/fop/faq.html#pdf-characters:
 
6.2. Some characters are not displayed, or displayed
 incorrectly, or displayed as #.
 
This usually means the selected font doesn't have a
glyph for the character.
 
The standard text fonts supplied with Acrobat Reader have
mostly glyphs for characters from the ISO Latin 1 character
set. [...]
 
If you use your own fonts, the font must have a glyph for the
desired character. Furthermore the font must be available on
the machine where the PDF is viewed or it must have been
embedded in the PDF file. [...]
 
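 There's an open Forrest bug for this problem: 
 https://issues.apache.org/jira/browse/FOR-132, and the discussion there 
 includes a link to the Cocoon documentation for embedding fonts in PDF files: 
 http://cocoon.apache.org/2.1/userdocs/pdf-serializer.html#FOP+and+Embedding+Fonts.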
 
 This looks kinda complicated, and AFAICT would require modifications to the 
 Forrest installation wherever the site is built.
 
 I suspect that almost nobody looks at the PDF version of the Who we are 
 page 
 (and I sure am sorry now that I brought this up...)
 
 If things are left as-is, Otis's last name would be displayed properly in the 
 HTML, and garbled in the PDF; if the diacritic is removed, then it will be 
 displayed improperly in both places :)
 
 Steve



Re: Forrest PDF non-Latin-1 support [was: RE: prototype Solr 1.3 RC 1]

2008-09-04 Thread Thorsten Scherler
On Fri, 2008-08-29 at 17:57 -0400, Steven A Rowe wrote:
 On 08/29/2008 at 3:24 PM, Chris Hostetter wrote:
  I suspect the PDF formatter just doesn't play nicely with the
  non-trivial UTF-8 characters.
...
 
 There's an open Forrest bug for this problem: 
 https://issues.apache.org/jira/browse/FOR-132, and the discussion there 
 includes a link to the Cocoon documentation for embedding fonts in PDF files: 
 http://cocoon.apache.org/2.1/userdocs/pdf-serializer.html#FOP+and+Embedding+Fonts.
 
 This looks kinda complicated, and AFAICT would require modifications to the 
 Forrest installation wherever the site is built.

I just saw the thread, I will have a look.

Which version of Forrest is currently recommended? I ask because changes
have been made (and some are still underway) to the PDF plugin lately.

Will let you know about my findings.

salu2
-- 
Thorsten Scherler thorsten.at.apache.org
Open Source Java  consulting, training and solutions



Re: prototype Solr 1.3 RC 1

2008-09-03 Thread Shalin Shekhar Mangar
Seems like all issues have been closed. What is the plan for the release
now?

On Fri, Aug 29, 2008 at 8:46 PM, Grant Ingersoll [EMAIL PROTECTED] wrote:

 I created a Hudson task to do the building/archival tasks for the release
 candidates. It is an on-demand task (i.e. not scheduled)

 See
 http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/ for
 the job in general.

 The artifacts (including Maven) are at:

 http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/lastSuccessfulBuild/artifact/

 The web site (including javadocs):

 http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/lastSuccessfulBuild/artifact/release-candidate/build/docs/index.html

 I haven't gone through with a fine tooth comb yet, hence the prototype in
 the subject line, but my preliminary skimming of it seems like it is on
 track.   I will cover it more later today.  In the meantime, feedback is
 appreciated.

 Cheers,
 Grant




-- 
Regards,
Shalin Shekhar Mangar.


Re: prototype Solr 1.3 RC 1

2008-09-03 Thread Yonik Seeley
On Wed, Sep 3, 2008 at 2:57 PM, Shalin Shekhar Mangar
[EMAIL PROTECTED] wrote:
 Seems like all issues have been closed. What is the plan for the release
 now?

I need to update the lucene libs again first... MikeM found+fixed a
lucene bug today.

-Yonik


RE: prototype Solr 1.3 RC 1

2008-08-29 Thread Steven A Rowe
Random nit: in release-candidate/build/docs/who.pdf, Otis's name is spelled 
Otis Gospodneti# (final character is a hash mark). - Steve

On 08/29/2008 at 11:16 AM, Grant Ingersoll wrote:
 I created a Hudson task to do the building/archival tasks for the
 release candidates. It is an on-demand task (i.e. not scheduled)
 
 See http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/
 for the job in general.
 
 The artifacts (including Maven) are at:
 http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/lastSuccessfulBuild/artifact/
 
 The web site (including javadocs):
 http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/lastSuccessfulBuild/artifact/release-candidate/build/docs/index.html
 
 I haven't gone through with a fine tooth comb yet, hence the
 prototype in the subject line, but my preliminary skimming of it
 seems like it is on track.   I will cover it more later today.
 In the meantime, feedback is appreciated.
 
 Cheers,
 Grant


 



Re: prototype Solr 1.3 RC 1

2008-08-29 Thread Grant Ingersoll


On Aug 29, 2008, at 12:05 PM, Steven A Rowe wrote:

Random nit: in release-candidate/build/docs/who.pdf, Otis's name is  
spelled Otis Gospodneti# (final character is a hash mark). - Steve


Hmm, that's weird.  It's also that way on the current site.




On 08/29/2008 at 11:16 AM, Grant Ingersoll wrote:

I created a Hudson task to do the building/archival tasks for the
release candidates. It is an on-demand task (i.e. not scheduled)

See http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/
for the job in general.

The artifacts (including Maven) are at:
http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/lastSuccessfulBuild/artifact/

The web site (including javadocs):
http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/lastSuccessfulBuild/artifact/release-candidate/build/docs/index.html

I haven't gone through with a fine tooth comb yet, hence the
prototype in the subject line, but my preliminary skimming of it
seems like it is on track.   I will cover it more later today.
In the meantime, feedback is appreciated.

Cheers,
Grant







--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Re: prototype Solr 1.3 RC 1

2008-08-29 Thread Otis Gospodnetic
Grant, here is what it's supposed to be:  Gospodnetić

 
If Forrest and friends don't like that diacritic, I suppose I can live with 
Gospodnetic -- damn i18n! ;)

This is what I see locally:

$ ffxg Gospod 
./src/site/src/documentation/content/xdocs/who.xml:  <li>Otis Gospodneti&#263;</li>

$ find . -name \*html | xargs grep Gospod 
./site/who.html:<li>Otis Gospodnetić</li>

So the HTML looks OK, but apparently the PDF does not.  I don't know how else 
to specify that c with a diacritic other than with that &#263; .  $10 that 
Steve knows!
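
[Editorial sketch, not part of the original message.] To see which character the numeric reference in who.xml stands for, it can be decoded and looked up in the Unicode database. The helper name `describe_reference` is hypothetical; `html.unescape` and `unicodedata.name` are Python standard-library calls.

```python
import html
import unicodedata


def describe_reference(ref: str) -> str:
    """Decode one character reference and report its code point and name."""
    ch = html.unescape(ref)  # "&#263;" becomes the single character it refers to
    return f"{ch} U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe_reference("&#263;"))
```

The reference resolves to U+0107, LATIN SMALL LETTER C WITH ACUTE, which lies above U+00FF and so is not part of ISO Latin 1.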

Thank you,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






Re: prototype Solr 1.3 RC 1

2008-08-29 Thread Otis Gospodnetic
Maybe this is the answer:
  http://forrest.apache.org/docs_0_90/faq.html#encoding

And this is what we've got:

$ head -1 src/site/src/documentation/content/xdocs/who.xml 
<?xml version="1.0"?>

Sounds like something that would be good to fix in general, but I don't have 
forrest set up to try it :(


Otis







Re: prototype Solr 1.3 RC 1

2008-08-29 Thread Chris Hostetter

: If Forrest and friends don't like that diacritic, I suppose I can live 
: with Gospodnetic -- damn i18n! ;)

I seem to recall that we had this problem with forrest and the Lucene-Java 
who page as well ... over there you are listed in lowly ASCII, without 
your I18N goodness.




-Hoss



Re: prototype Solr 1.3 RC 1

2008-08-29 Thread Chris Hostetter

: Maybe this is the answer:
:  http://forrest.apache.org/docs_0_90/faq.html#encoding

my reading of that is that setting an encoding will let you use the 
literal UTF-8 character (which is what I would expect), but note the end of 
the answer...

 Another option is to use character entities such as &ouml;  (ö) or 
 the numeric form &#246;  (ö).

...which is what we are doing.  I suspect the PDF formatter just doesn't 
play nicely with the non-trivial UTF-8 characters.
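
[Editorial sketch, not part of the original message.] The point that the numeric form and the literal character are equivalent once the XML is parsed can be checked with a standard XML parser. The two one-line documents below are simplified stand-ins for who.xml (assumptions, not the real file), and `parsed_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified stand-ins for the who.xml list item (not the real file).
WITH_REFERENCE = b'<?xml version="1.0"?><li>Otis Gospodneti&#263;</li>'
WITH_LITERAL = (
    '<?xml version="1.0" encoding="UTF-8"?><li>Otis Gospodneti\u0107</li>'
).encode("utf-8")


def parsed_text(doc: bytes) -> str:
    """Return the text content of the root element after XML parsing."""
    return ET.fromstring(doc).text


# Both spellings yield the same string, so the source encoding is not
# what separates the HTML output from the PDF output.
print(parsed_text(WITH_REFERENCE) == parsed_text(WITH_LITERAL))
```

If both forms parse to the same text, the difference between the HTML and PDF output has to arise later in the pipeline, which is consistent with the suspicion about the PDF formatter.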


-Hoss


Re: prototype Solr 1.3 RC 1

2008-08-29 Thread Grant Ingersoll


On Aug 29, 2008, at 3:18 PM, Otis Gospodnetic wrote:


Maybe this is the answer:
 http://forrest.apache.org/docs_0_90/faq.html#encoding

And this is what we've got:

$ head -1 src/site/src/documentation/content/xdocs/who.xml
<?xml version="1.0"?>

Sounds like something that would be good to fix in general, but I  
don't have forrest set up to try it :(


It's easy to set up ;-)  http://forrest.apache.org. 
 


Re: prototype Solr 1.3 RC 1

2008-08-29 Thread Chris Hostetter

: I haven't gone through with a fine tooth comb yet, hence the prototype in
: the subject line, but my preliminary skimming of it seems like it is on track.
: I will cover it more later today.  In the meantime, feedback is appreciated.

I've done some testing with both the Solr 1.2 example and the configs from 
my apachecon demo last year, the only problem that jumped out at me was 
SOLR-740.

FWIW: I've also committed some usage/javadoc fixes.


-Hoss



Forrest PDF non-Latin-1 support [was: RE: prototype Solr 1.3 RC 1]

2008-08-29 Thread Steven A Rowe
On 08/29/2008 at 3:24 PM, Chris Hostetter wrote:
 I suspect the PDF formatter just doesn't play nicely with the
 non-trivial UTF-8 characters.

This is an Apache FOP FAQ; from 
http://xmlgraphics.apache.org/fop/faq.html#pdf-characters:

   6.2. Some characters are not displayed, or displayed
incorrectly, or displayed as #.

   This usually means the selected font doesn't have a
   glyph for the character.

   The standard text fonts supplied with Acrobat Reader have
   mostly glyphs for characters from the ISO Latin 1 character
   set. [...]

   If you use your own fonts, the font must have a glyph for the
   desired character. Furthermore the font must be available on
   the machine where the PDF is viewed or it must have been
   embedded in the PDF file. [...]
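
[Editorial sketch, not part of the original message.] Since the FAQ says the standard fonts mostly cover ISO Latin 1, a rough way to predict which characters are at risk is to list those above U+00FF. This is only a proxy under that assumption; it does not inspect any font's actual glyph table. The function name `outside_latin1` is hypothetical.

```python
def outside_latin1(text: str) -> list:
    """Return the characters in text that ISO Latin 1 cannot represent."""
    # ISO Latin 1 (ISO-8859-1) maps exactly to code points U+0000..U+00FF.
    return [ch for ch in text if ord(ch) > 0xFF]


print(outside_latin1("Otis Gospodneti\u0107"))
```

For the name discussed in this thread, the only character flagged is the final c with acute (U+0107).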

There's an open Forrest bug for this problem: 
https://issues.apache.org/jira/browse/FOR-132, and the discussion there 
includes a link to the Cocoon documentation for embedding fonts in PDF files: 
http://cocoon.apache.org/2.1/userdocs/pdf-serializer.html#FOP+and+Embedding+Fonts.

This looks kinda complicated, and AFAICT would require modifications to the 
Forrest installation wherever the site is built.

I suspect that almost nobody looks at the PDF version of the Who we are page 
(and I sure am sorry now that I brought this up...)

If things are left as-is, Otis's last name would be displayed properly in the 
HTML, and garbled in the PDF; if the diacritic is removed, then it will be 
displayed improperly in both places :)
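
[Editorial sketch, not part of the original message.] The "diacritic is removed" fallback can be illustrated with Unicode decomposition: split each character into a base letter plus combining marks, then drop the marks. The function name `strip_diacritics` is hypothetical, and this approach only handles characters that have a canonical decomposition.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks such as U+0301.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Gospodneti\u0107"))
```

The result is plain ASCII, which any PDF font can render, at the cost of misspelling the name everywhere.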

Steve