Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

2013-09-28 Thread Steve Rowe
The TODO list is now empty (except for a shelved item), so that clears up the 
stuff I found.

Steve

On Sep 26, 2013, at 1:28 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

 
 Awesome work Steve!
 
 I collected all of this up into a scratch page, let's see how many we can 
 burn through easily and then post another RC...
 
 https://cwiki.apache.org/confluence/display/solr/Internal+-+TODO+List
 
 
 -Hoss
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 





Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

2013-09-27 Thread Otis Gospodnetic
I have just 3 chars to contribute: WOW

Otis




Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

2013-09-26 Thread Varun Thacker
Hi,

SOLR-3076 went into this release, but the documentation for how to
support Block Join in Solr is not present.

In the ref guide there is a section called Other Parsers (
https://cwiki.apache.org/confluence/display/solr/Other+Parsers) . We should
add BlockJoinChildQParser and BlockJoinParentQParser.

Also we should add an example on how to index childDocs in XML to make use
of BlockJoin in Solr.

I can document them right now, but where should I post it? If someone can
give me access to Confluence I could add it there. My Confluence
username is [varunthacker]


On Thu, Sep 26, 2013 at 1:06 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 Please vote to release the following artifacts as the Apache Solr
 Reference Guide for 4.5...

 https://dist.apache.org/repos/dist/dev/lucene/solr/ref-guide/apache-solr-ref-guide-4.5-RC0/

 $ cat apache-solr-ref-guide-4.5-RC0/apache-solr-ref-guide-4.5.pdf.sha1
 ee40215d30f264d663f723ea2196b72b8cc5effc  apache-solr-ref-guide-4.5.pdf
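
 A reviewer who wants to check a downloaded PDF against a .sha1 file like the
 one above could use something along these lines. This is only a sketch: the
 function names are made up for illustration, and it assumes the .sha1 file
 holds a single line of the form "<hex digest>  <file name>", as shown.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha1_file(pdf_path, sha1_path):
    """Compare a file's digest with the first field of a .sha1 file.

    Assumption: the .sha1 file looks like "<hex digest>  <file name>".
    """
    expected = Path(sha1_path).read_text().split()[0].lower()
    return sha1_of_file(pdf_path) == expected
```

 Calling matches_sha1_file("apache-solr-ref-guide-4.5.pdf",
 "apache-solr-ref-guide-4.5.pdf.sha1") in the download directory should return
 True for an intact copy.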

 (When reviewing the PDF, please don't hesitate to point out any typos or
 formatting glitches or any other problems of subject matter. Re-spinning a
 new RC is trivial, so in my opinion the bar is very low in terms of what
 things are worth fixing before release.)





 -Hoss





-- 


Regards,
Varun Thacker
http://www.vthacker.in/


Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

2013-09-26 Thread Steve Rowe
Hi Varun,

Thanks, good catch!

Permission to edit the Reference Guide directly is only granted to Lucene/Solr 
committers - see 
https://cwiki.apache.org/confluence/display/solr/Internal+-+Maintaining+Documentation#Internal-MaintainingDocumentation-WhoCanEditThisDocumentation.
 

For small additions/corrections, non-committers can add a comment on a page in 
the section that is closest to where the content should go, and then a 
committer can put the content where it belongs.  But for larger stuff, it's 
better to create a JIRA issue, and attach the content there.

Steve






Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

2013-09-26 Thread Varun Thacker
Hi Steve,

No problems.

I've created SOLR-5275 for this.






-- 


Regards,
Varun Thacker
http://www.vthacker.in/


Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

2013-09-26 Thread Steve Rowe
Except for #1/#34 - internal links to beginning-of-page sections point one page 
earlier than they should - and #8/#41 - missing Thai and Polish chars - which I 
don't know how to fix, I'll try to address the other items on this (um, very 
long) list of mostly minor stuff I found:

0. All examples in the exported PDF have an extra blank line at the top.  I was 
able to eliminate these from this page 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=32604227 
(What is an analyzer?) by eliminating the newline between the initial {code 
…} line and the first line of the examples.  This doesn't have any apparent 
effect on the layout of the page on the wiki, but the PDF export of that page 
no longer has the extra blank lines.  Any objections to switching all {code} 
examples in the guide like this?

1. Pg 2: The section links from the TOC all take you to the previous page, 
rather than to the top of the page where the section starts.  (Same behavior on 
OS X Preview, and under Windows, on Firefox's built-in PDF viewer and on Adobe 
Reader.)  This looks like a general problem - see e.g. #34.

2. Pg 68: Stray asterisks in the analyzer tags in the fieldType example 
under Analysis Phases, apparently to make the surrounded text bold (which 
also didn't happen).

3. Pg 69: The solr.KeywordTokenizerFactory example is missing one quotation 
mark from each of the left and right hand sides.

4. Pg 70: Under solr.TokenizerFactory, there is a bogus StandardTokenizer 
link in the sentence "There aren't any filters that use StandardTokenizer's 
types" - the link is to the non-existent StandardTokenizer page on the Solr 
wiki.  (It might be useful to systematically link stuff like this to the 
corresponding Lucene or Solr javadocs, but this should probably be templated or 
scripted, so that the version-specific links are handled properly.)

5. Pg 71: Under Standard Tokenizer, the email addresses recognition claim is 
false, and Internet domain name recognition isn't validation per se, e.g. 
google.supercomputername will be tokenized as a single token along with 
google.com.  The Out example output needs fixup accordingly.  I see that 
the Classic Tokenizer section on pg 72 has the verbatim email/domain text; 
for ClassicTokenizer, the email claim is true, but it has the same issue with 
internet domain names as StandardTokenizer.

6. Pg 74: The NGram Tokenizer example output should be (bicy, bicyc, 
icyc, icycl, cycl, cycle, ycle) instead of all of the 4-grams before 
the 5-grams (I think this class's behavior was changed in 4.4 by LUCENE-5042).

7. Pg 75: The ICU tokenizer rulefiles argument is missing.

8. Pg 75: The ICU Tokenizer's In input and Out output are completely 
missing the Thai text that's visible on the wiki.

9. Pg 75: Missing spaces in the Regular Expression Pattern Tokenizer's group 
attribute description, at the boundaries between the first two sentences: 
"token(s).The" and "tokens.Non-negative".

10. Pg 72, 76, 77, etc.: Many analysis components' factory class names should 
be styled with a fixed-width font.

11. Pg 77: UAX29 URL Email Tokenizer recognizes not only .com Internet domain 
names, but also domain names including any other valid top-level domain (i.e., 
unlike StandardTokenizer and ClassicTokenizer, domain names are validated 
against the white list drawn from the IANA Root Zone database 
http://www.internic.net/zones/root.zone as of the last time ant gen-tld was 
performed and the tokenizer was generated.)

12. Pg 77: UAX29 tokenizer: file::// should be file://

13. Pg 77: UAX29 tokenizer's URL and EMAIL type names are missing angle 
brackets.

14. Pg 77: UAX29 tokenizer's maxTokenLength attribute name should be styled 
with a fixed-width font.

15. Pg 78: In the example demonstrating how arguments can be given to filter 
elements via attributes, there is a stray asterisk, apparently intended to bold 
the surrounding text, which also didn't work: *min=2 max=7/

16. Pg 79: The ASCII Folding Filter's Out output should have the accent 
stripped from the á, giving a, and the ASCII character value adjusted to match 
(ASCII character 97).

17. Pg 81: The Edge N-gram Filter's 4-6 gram size example Out should be 
(four, scor, score, twen, twent, twenty) - some of these are 
missing.

18. Pg 83: The ICU Normalizer 2 Filter example should include the name and 
mode attributes in the filter element.

19. Pg 87: Stray asterisks in both of the N-Gram Filter examples: 
*minGramSize=...

20. Pg 87: The N-Gram Filter 3-5 gram size example Out output should be 
(fou, four, our, sco, scor, score, cor, core, ore) - rather 
than ordering by gram size, output is now ordered first by position and then by 
gram size.

21. Pg 88: Stray asterisk in the first occurrence only example of the Pattern 
Replace Filter: *replace=first.

22. Pg 89: encoder argument to the Phonetic Filter has surrounding double 
curly brackets instead of being styled with a fixed-width font. 

23. Pg 90: It should be mentioned on Porter Stem 
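
The expected outputs in items 6, 17 and 20 above can be reproduced with a 
short plain-Python sketch. This is not Lucene's implementation and it ignores 
positions and offsets; the function names are made up for illustration. It 
only shows the ordering described in item 20: grams are emitted by start 
position first, then by gram size.

```python
def ngrams(text, min_size, max_size):
    """All n-grams of text, ordered by start position, then by size."""
    grams = []
    for start in range(len(text)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


def ngram_filter(tokens, min_size, max_size):
    """Apply ngrams() to each token in turn, keeping token order."""
    return [g for token in tokens for g in ngrams(token, min_size, max_size)]


def edge_ngram_filter(tokens, min_size, max_size):
    """Only the grams anchored at the start of each token."""
    grams = []
    for token in tokens:
        for size in range(min_size, max_size + 1):
            if size <= len(token):
                grams.append(token[:size])
    return grams


# Item 6: 4-5 grams of "bicycle"
print(ngrams("bicycle", 4, 5))
# ['bicy', 'bicyc', 'icyc', 'icycl', 'cycl', 'cycle', 'ycle']

# Item 20: 3-5 grams of the tokens "four", "score"
print(ngram_filter(["four", "score"], 3, 5))
# ['fou', 'four', 'our', 'sco', 'scor', 'score', 'cor', 'core', 'ore']

# Item 17: 4-6 edge grams of "four", "score", "and", "twenty"
print(edge_ngram_filter(["four", "score", "and", "twenty"], 4, 6))
# ['four', 'scor', 'score', 'twen', 'twent', 'twenty']
```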

Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

2013-09-26 Thread Cassandra Targett
Thanks Steve.

I'll only address a couple of your specific issues inline. We can
split the rest of the list if you'd like, but I think a lot of them
are on the same page in the wiki (although multiple pages in the PDF)
- let me know.

Cassandra

On Thu, Sep 26, 2013 at 7:29 AM, Steve Rowe sar...@gmail.com wrote:
 0. All examples in the exported PDF have an extra blank line at the top.  I 
 was able to eliminate these from this page 
 https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=32604227 
 (What is an analyzer?) by eliminating the newline between the initial {code 
 …} line and the first line of the examples.  This doesn't have any apparent 
 effect on the layout of the page on the wiki, but the PDF export of that page 
 no longer has the extra blank lines.  Any objections to switching all {code} 
 examples in the guide like this?

CT: is it that horrible? There are dozens and dozens of code examples,
and it will take a while for someone to fix all of them. Since I edit
in wiki markup mode, I've always found it easier to add the line break
so my eyes can find the samples faster. That said, ease of use for
users is more important than my convenience, so if you think it's
badly distracting, then it's worth trying to fix it.

An alternative might be to try to change the CSS that produces the
code examples - the problem is that the default styling for the PDF
includes some padding, and then puts in the newline. Fiddling with the
CSS is painful though - we can't see the interim HTML and it's
essentially trial & error over & over.

So, it's essentially one of two annoying choices: edit all the code
examples by hand, or generate the PDF x-dozen times to maybe find out
the CSS approach won't work.
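
For the first choice, the hand edit could in principle be scripted. The 
sketch below assumes the wiki markup of a page is available as plain text 
(nothing in this thread says whether it can be pulled out in bulk), that 
{code} openers and closers strictly alternate, and that the function name is 
purely illustrative. It removes the single line break that follows each 
opening {code ...} tag and leaves closing tags alone.

```python
import re

# Matches both "{code}" and "{code:...}" tags in wiki markup.
CODE_TAG = re.compile(r"\{code(?::[^}]*)?\}")


def strip_newline_after_code_openers(markup):
    """Drop the one line break right after each opening {code} tag.

    Assumption: tags alternate opener, closer, opener, closer, ...
    """
    pieces = []
    pos = 0
    inside = False
    for match in CODE_TAG.finditer(markup):
        pieces.append(markup[pos:match.end()])
        pos = match.end()
        if not inside:
            # Opening tag: skip a single "\r\n" or "\n" that follows it.
            if markup.startswith("\r\n", pos):
                pos += 2
            elif markup.startswith("\n", pos):
                pos += 1
        inside = not inside
    pieces.append(markup[pos:])
    return "".join(pieces)
```

Running it twice gives the same result as running it once, since an opener 
that is already followed directly by content is left untouched.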


 1. Pg 2: The section links from the TOC all take you to the previous page, 
 rather than to the top of the page where the section starts.  (Same behavior 
 on OS X Preview, and under Windows, on Firefox's built-in PDF viewer and on 
 Adobe Reader.)  This looks like a general problem - see e.g. #34.

CT: This is essentially a known problem (see my comment:
https://issues.apache.org/jira/browse/SOLR-4886?focusedCommentId=13703660#comment-13703660,
last bullet point). The way the PDF is created is that Confluence
creates the entire document in an HTML page, which includes bookmark
tags right before the different heading levels. When the PDF is then
generated, a rule is applied to insert a page-break before all h2
headings. That leaves the bookmark orphaned on the previous page. I
have never found a solution to this problem - you can't edit the HTML
and you don't have any control over where the bookmark tags in the
HTML are put before the HTML is converted to PDF. The only solution is
to never have page breaks, which I think severely diminishes
readability.


 2. Pg 68: Stray asterisks in the analyzer tags in the fieldType example 
 under Analysis Phases, apparently to make the surrounded text bold (which 
 also didn't happen).

CT: BTW, it never will - code examples are rendered verbatim, without
any of the styling normally applied.

 43. Pg 106: Language-Specific Factories: Catalan, Danish, Irish and Romanian 
 are missing from the covered languages; Catalan and Irish should include 
 ElisionFilterFactory in their examples - there are articles lists in Lucene's 
 {Catalan,Irish}Analyzer.

CT: A general note about the languages and examples - there used to be
examples that were incorrect and so were removed, which might account for
some of the gaps. There's an open issue you'll want to look at before
diving in: https://issues.apache.org/jira/browse/SOLR-5031.




Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

2013-09-26 Thread Steve Rowe
Hi Cassandra,

On Sep 26, 2013, at 9:15 AM, Cassandra Targett casstarg...@gmail.com wrote:
 I'll only address a couple of your specific issues inline. We can
 split the rest of the list if you'd like, but I think a lot of them
 are on the same page in the wiki (although multiple pages in the PDF)
 - let me know.

I'll try to do them all myself, but if it looks like it's going to take more 
than one day, I'll ask for help. 

 On Thu, Sep 26, 2013 at 7:29 AM, Steve Rowe sar...@gmail.com wrote:
 0. All examples in the exported PDF have an extra blank line at the top.  I 
 was able to eliminate these from this page 
 https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=32604227 
 (What is an analyzer?) by eliminating the newline between the initial 
 {code …} line and the first line of the examples.  This doesn't have any 
 apparent effect on the layout of the page on the wiki, but the PDF export of 
 that page no longer has the extra blank lines.  Any objections to switching 
 all {code} examples in the guide like this?
 
 CT: is it that horrible? There are dozens and dozens of code examples,
 and it will take a while for someone to fix all of them. Since I edit
 in wiki markup mode, I've always found it easier to add the line break
 so my eyes can find the samples faster. That said, ease of use for
 users is more important than my convenience, so if you think it's
 badly distracting, then it's worth trying to fix it.

For me it's somewhere between annoying and badly distracting, but this will of 
course depend on the viewer.

 An alternative might be to try to change the CSS that produces the
 code examples - the problem is that the default styling for the PDF
 includes some padding, and then puts in the newline. Fiddling with the
 CSS is painful though - we can't see the interim HTML and it's
 essentially trial & error over & over.

I'll take a look at the CSS - this is the one, right?: 
https://cwiki.apache.org/confluence/spaces/flyingpdf/viewpdfstyleconfig.action?key=solr

About the interim HTML, I found this description of how to get it: 
https://confluence.atlassian.com/display/CONF35/Exporting+Confluence+Pages+and+Spaces+to+HTML.

 1. Pg 2: The section links from the TOC all take you to the previous page, 
 rather than to the top of the page where the section starts.  (Same behavior 
 on OS X Preview, and under Windows, on Firefox's built-in PDF viewer and on 
 Adobe Reader.)  This looks like a general problem - see e.g. #34.
 
 CT: This is essentially a known problem (see my comment:
 https://issues.apache.org/jira/browse/SOLR-4886?focusedCommentId=13703660#comment-13703660,
 last bullet point). The way the PDF is created is that Confluence
 creates the entire document in an HTML page, which includes bookmark
 tags right before the different heading levels. When the PDF is then
 generated, a rule is applied to insert a page-break before all h2
 headings. That leaves the bookmark orphaned on the previous page. I
 have never found a solution to this problem - you can't edit the HTML
 and you don't have any control over where the bookmark tags in the
 HTML are put before the HTML is converted to PDF. The only solution is
 to never have page breaks, which I think severely diminishes
 readability.

Thanks for the explanation. I agree about page breaks being more important than 
off-by-one-page link targets.  I wonder if there is some CSS trick to put the 
page break before the target <a> instead of the <h2> section.

 2. Pg 68: Stray asterisks in the analyzer tags in the fieldType example 
 under Analysis Phases, apparently to make the surrounded text bold (which 
 also didn't happen).
 
 CT: BTW, it never will - code examples are rendered verbatim, without
 any of the styling normally applied.

Hmm, so there's no way to apply any formatting at all?  That's too bad.

 
 43. Pg 106: Language-Specific Factories: Catalan, Danish, Irish and Romanian 
 are missing from the covered languages; Catalan and Irish should include 
 ElisionFilterFactory in their examples - there are articles lists in 
 Lucene's {Catalan,Irish}Analyzer.
 
 CT: A general note about the languages and examples - there used to be
 examples that were incorrect so were removed so that might account for
 some of the gaps. There's an open issue you'll want to look at before
 diving in: https://issues.apache.org/jira/browse/SOLR-5031.

Thanks for the pointer.

Steve





Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

2013-09-26 Thread Cassandra Targett
On Thu, Sep 26, 2013 at 8:59 AM, Steve Rowe sar...@gmail.com wrote:

 I'll try to do them all myself, but if it looks like it's going to take more 
 than one day, I'll ask for help.


OK, let me know.


 I'll take a look at the CSS - this is the one, right?: 
 https://cwiki.apache.org/confluence/spaces/flyingpdf/viewpdfstyleconfig.action?key=solr

 About the interim HTML, I found this description of how to get it: 
 https://confluence.atlassian.com/display/CONF35/Exporting+Confluence+Pages+and+Spaces+to+HTML.

My first reaction was that it wouldn't work: The HTML export exports
the selected pages into a .zip file of HTML files (one file for each
wiki page). The interim-HTML for the PDF is one big single HTML file.
They're different exports, using different stylesheets. However, it
would make sense if the HTML was similar, so I took a look with my own
Confluence instance and the two exports use many of the same divs for
the same elements. It's not 1:1, but you could at least figure out
what the right divs are. The big difference will be heading levels -
the PDF flattens them all depending on the page hierarchy.

There are also stylesheets in place that you don't see and default rules that
are applied if you haven't overridden them. And then I also think
there are some styles put into the HTML itself that would override
anything in the CSS. A few weeks ago I was working on a number of
possible changes to the PDF, the formatting of code samples being one
of them, but after two days working on it, I gave up for now. It
really isn't fun to work on.


 1. Pg 2: The section links from the TOC all take you to the previous page, 
 rather than to the top of the page where the section starts.  (Same 
 behavior on OS X Preview, and under Windows, on Firefox's built-in PDF 
 viewer and on Adobe Reader.)  This looks like a general problem - see e.g. 
 #34.

 CT: This is essentially a known problem (see my comment:
 https://issues.apache.org/jira/browse/SOLR-4886?focusedCommentId=13703660#comment-13703660,
 last bullet point). The way the PDF is created is that Confluence
 creates the entire document in an HTML page, which includes bookmark
 tags right before the different heading levels. When the PDF is then
 generated, a rule is applied to insert a page-break before all h2
 headings. That leaves the bookmark orphaned on the previous page. I
 have never found a solution to this problem - you can't edit the HTML
 and you don't have any control over where the bookmark tags in the
 HTML are put before the HTML is converted to PDF. The only solution is
 to never have page breaks, which I think severely diminishes
 readability.

 Thanks for the explanation. I agree about page breaks being more important 
 than off-by-one-page link targets.  I wonder if there is some CSS trick to 
 put the page break before the target <a> instead of the <h2> section.

 2. Pg 68: Stray asterisks in the analyzer tags in the fieldType example 
 under Analysis Phases, apparently to make the surrounded text bold (which 
 also didn't happen).

 CT: BTW, it never will - code examples are rendered verbatim, without
 any of the styling normally applied.

 Hmm, so there's no way to apply any formatting at all?  That's too bad.

You can apply syntax formatting based on the language of the example,
but not inline formatting to highlight specific lines - one way I've
gotten around that in other places is to enable line numbers to
display in the example, and then call out the line numbers in the
text.

Cassandra




Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

2013-09-26 Thread Yonik Seeley
On Thu, Sep 26, 2013 at 5:48 AM, Varun Thacker
varunthacker1...@gmail.com wrote:
 SOLR-3076 went into this release, but the documentation for how to
 support Block Join in Solr is not present.

IMO, it's a work in progress / experimental.  It doesn't necessarily
need to be in the normal ref guide at this point, but if anything gets
added it should probably be marked as experimental and potentially
subject to change.

-Yonik
http://lucidworks.com




Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

2013-09-26 Thread Steve Rowe
Cassandra,

On Sep 26, 2013, at 10:39 AM, Cassandra Targett casstarg...@gmail.com wrote:
 I'll take a look at the CSS - this is the one, right?: 
 https://cwiki.apache.org/confluence/spaces/flyingpdf/viewpdfstyleconfig.action?key=solr
 
 About the interim HTML, I found this description of how to get it: 
 https://confluence.atlassian.com/display/CONF35/Exporting+Confluence+Pages+and+Spaces+to+HTML.
 
 My first reaction was that it wouldn't work: The HTML export exports
 the selected pages into a .zip file of HTML files (one file for each
 wiki page). The interim-HTML for the PDF is one big single HTML file.
 They're different exports, using different stylesheets. However, it
 would make sense if the HTML was similar, so I took a look with my own
 Confluence instance and the two exports use many of the same divs for
 the same elements. It's not 1:1, but you could at least figure out
 what the right divs are. The big difference will be heading levels -
 the PDF flattens them all depending on the page hierarchy.
 
 There are also stylesheets in place that you don't see and default rules that
 are applied if you haven't overridden them. And then I also think
 there are some styles put into the HTML itself that would override
 anything in the CSS. A few weeks ago I was working on a number of
 possible changes to the PDF, the formatting of code samples being one
 of them, but after two days working on it, I gave up for now. It
 really isn't fun to work on.

I added the following to the PDF stylesheet:

   /* trim leading blank line from pre-formatted code blocks */
   div.codeContent > pre {
     margin-top: -6px;
   }

and it seems to do the trick - the top and bottom vertical whitespace look 
balanced to me now on two individual pages I exported.  I'll export the whole 
thing now and look at every box to make sure this isn't doing the wrong thing 
somewhere.

Steve





Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

2013-09-26 Thread Chris Hostetter

Awesome work Steve!

I collected all of this up into a scratch page, let's see how many we can 
burn through easily and then post another RC...

https://cwiki.apache.org/confluence/display/solr/Internal+-+TODO+List


-Hoss
