[ https://issues.apache.org/jira/browse/LUCENE-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667837#action_12667837 ]
Luis Alves commented on LUCENE-1528:
------------------------------------
Hi Michael,
I checked the book "Generating Parsers with JavaCC" and the grammar
documentation on the JavaCC website
(https://javacc.dev.java.net/doc/javaccgrm.html).
Here is the syntax for a character list:
character_list ::= [ "~" ] "[" [ character_descriptor ( "," character_descriptor )* ] "]"
character_descriptor ::= java_string_literal [ "-" java_string_literal ]
Also, the '|' character in JavaCC syntax is used like an XOR, and there is no
OR or AND operator in the JavaCC syntax that I'm aware of.
So the expression <_WHITESPACE> | [ "+", ... ] would have to look like
~(<_WHITESPACE> & [ "+", ... ]), but this is not possible in the JavaCC grammar.
So I think the best option for now is to keep the current syntax.
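
To make the constraint concrete, here is a rough Java model of what the negated character list expresses. It is only a sketch and not Lucene code: the class and field names (TermStartModel, WHITESPACE, SPECIALS, isAllowed) are made up for illustration, and only the "+" entry that appears in this thread is included. In Java the two sets can be kept separate and combined in a predicate; in a JavaCC character_list every entry must be a string literal (per the grammar above), so the whitespace characters have to be written out again inside the negated list.

```java
import java.util.Set;

// Hypothetical model for illustration only; none of these names exist in Lucene.
final class TermStartModel {
    // The four characters from the <#_WHITESPACE> definition quoted in this thread.
    static final Set<Character> WHITESPACE = Set.of(' ', '\t', '\n', '\r');

    // Only "+" is shown in the thread; the rest of the list is elided there
    // and is deliberately not filled in here.
    static final Set<Character> SPECIALS = Set.of('+');

    // Models a negated list over both groups: true when c is in neither set.
    static boolean isAllowed(char c) {
        return !WHITESPACE.contains(c) && !SPECIALS.contains(c);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed('a')); // true
        System.out.println(isAllowed(' ')); // false
        System.out.println(isAllowed('+')); // false
    }
}
```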
If you like, I can change
<#_WHITESPACE: ( " " | "\t" | "\n" | "\r") >
to a character_list to make it more consistent:
<#_WHITESPACE: [ " ", "\t", "\n", "\r" ] >
but that would not help to remove the duplicated list of characters.
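
As an analogy only, the two _WHITESPACE forms can be compared with java.util.regex standing in for the JavaCC token definitions. The class name WhitespaceForms and its members are hypothetical. The sketch suggests that the choice form and the character-list form accept the same single characters, and that neither form, as quoted above, accepts U+3000.

```java
import java.util.regex.Pattern;

// Analogy only: java.util.regex stands in for the JavaCC token definitions.
final class WhitespaceForms {
    // Mirrors the choice form: one alternative per one-character string.
    static final Pattern CHOICE = Pattern.compile("( |\t|\n|\r)");

    // Mirrors the character-list form with the same four characters.
    static final Pattern CHAR_LIST = Pattern.compile("[ \t\n\r]");

    // True when both forms agree on whether c is whitespace.
    static boolean sameResult(char c) {
        String s = String.valueOf(c);
        return CHOICE.matcher(s).matches() == CHAR_LIST.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (char c : new char[] {' ', '\t', '\n', '\r', 'a', '\u3000'}) {
            boolean ws = CHAR_LIST.matcher(String.valueOf(c)).matches();
            System.out.println((int) c + " same=" + sameResult(c) + " whitespace=" + ws);
        }
    }
}
```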
> Add support for Ideographic Space to the queryparser - also known as fullwidth
> space and wide-space
> -------------------------------------------------------------------------------------------------
>
> Key: LUCENE-1528
> URL: https://issues.apache.org/jira/browse/LUCENE-1528
> Project: Lucene - Java
> Issue Type: Improvement
> Components: QueryParser
> Affects Versions: 2.4.1
> Reporter: Luis Alves
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.4.1
>
> Attachments: lucene_wide_space_v1_src.patch
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> The Ideographic Space is a space character that is as wide as a normal CJK
> character cell.
> It is also known as wide-space or fullwidth space. This type of space is used
> in CJK languages.
> This patch adds support for the wide space, making the queryparser component
> friendlier to queries that contain CJK text.
> Reference:
> 'http://en.wikipedia.org/wiki/Space_(punctuation)' - see Table of spaces,
> char U+3000.
> I also added a new test case that fails before the patch.
> After the patch is applied, all JUnit tests pass.