Would any language besides Khmer benefit from knowing what kind of
whitespace (a normal space, a line break, etc.) comes before a token?
Or would someone be willing to help write a Java rule for Khmer, based
on my XML mock-up? I don't think it would be too difficult: it would
hold a list of words and check that each of them is preceded by a real
space, not just a zero-width space.
I think it would be fairly close to the current
WhitespaceBeforePunctuationRule.java, but I am not sure.
Nathan
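The Java rule proposed here (a word list plus a check for a real space before each word) might look roughly like the minimal sketch below. It deliberately does not use LanguageTool's rule API. The class `ConjunctionSpaceSketch` and the method `findMissingSpaces` are hypothetical names, and the sketch assumes it runs on the raw sentence text, where any zero-width space (U+200B) is still present.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not LanguageTool's rule API: scans raw sentence text
// for the given words and reports each occurrence that is not directly
// preceded by an ordinary space (U+0020).
final class ConjunctionSpaceSketch {

    static final char SPACE = ' '; // U+0020, the only separator accepted

    /** Returns the sorted start offsets of occurrences lacking a real space before them. */
    static List<Integer> findMissingSpaces(String text, List<String> words) {
        List<Integer> offsets = new ArrayList<>();
        for (String word : words) {
            int from = 0;
            int idx;
            while ((idx = text.indexOf(word, from)) >= 0) {
                // A word at the very start of the text needs no space before it.
                // Any other preceding character, including U+200B ZERO WIDTH SPACE,
                // counts as a missing space.
                if (idx > 0 && text.charAt(idx - 1) != SPACE) {
                    offsets.add(idx);
                }
                from = idx + word.length();
            }
        }
        Collections.sort(offsets);
        return offsets;
    }
}
```

A real rule would still have to turn each offset into a rule match with a suggestion; the plain-string version only isolates the test that a yes/no `spacebefore` attribute cannot express.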
On Thu, May 23, 2013 at 6:28 PM, Ruud Baars <[email protected]> wrote:
> Might changing the spacebefore detection be an option, for example
> specifying the space type instead of just yes or no?
>
> Ruud
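Ruud's idea of reporting the kind of space rather than a yes/no flag could be modeled along these lines. This is only an illustration, assuming the original text and the token's start offset are available; `SpaceType` and `SpaceBeforeSketch.classify` are hypothetical names, not part of LanguageTool.

```java
// Hypothetical illustration: instead of a yes/no "spacebefore" flag,
// report which kind of whitespace precedes a token.
enum SpaceType { NONE, SPACE, ZERO_WIDTH_SPACE, LINE_BREAK, OTHER_WHITESPACE }

final class SpaceBeforeSketch {

    /** Classifies the character immediately before tokenStart in text. */
    static SpaceType classify(String text, int tokenStart) {
        if (tokenStart <= 0) {
            return SpaceType.NONE; // nothing precedes the first token
        }
        char prev = text.charAt(tokenStart - 1);
        switch (prev) {
            case ' ':
                return SpaceType.SPACE; // U+0020
            case '\u200B':
                return SpaceType.ZERO_WIDTH_SPACE;
            case '\n':
            case '\r':
                return SpaceType.LINE_BREAK;
            default:
                // Other whitespace, e.g. a tab or NO-BREAK SPACE (U+00A0).
                if (Character.isWhitespace(prev) || Character.isSpaceChar(prev)) {
                    return SpaceType.OTHER_WHITESPACE;
                }
                return SpaceType.NONE; // token directly follows a non-space character
        }
    }
}
```

A Khmer rule could then require `SpaceType.SPACE` before the listed conjunctions while still treating `ZERO_WIDTH_SPACE` as an ordinary word boundary everywhere else.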
>
> On 23-05-13 12:58, Marcin Miłkowski wrote:
> > On 2013-05-23 11:32, Nathan Wells wrote:
> >> So I just confirmed with some further testing that spacebefore treats
> >> a zero-width space as "spacebefore", so my rule will never match.
> >>
> >> Is a Java rule the only way to proceed, then? Or is there some way to
> >> add an exception so that "spacebefore" matches all spaces except a
> >> zero-width space (U+200B)?
> > No. Unfortunately, no.
> >
> >> The reason is that in Khmer, certain conjunctions should always have
> >> a "real" space before them, not just a zero-width space, so I am
> >> trying to create a rule to detect this.
> > I'm afraid that in this particular case, a Java rule would be needed.
> >
> > Best,
> > Marcin
> >
> >> Thanks,
> >> Nathan
> >>
> >>
> >> On Wed, May 22, 2013 at 10:35 PM, Nathan Wells <[email protected]> wrote:
> >>
> >> Yes, I used the "spacebefore" detection (and I may well have used it
> >> the wrong way, so that could also be the problem!). But in my test a
> >> zero-width space seemed to count as a "space", so "spacebefore" was
> >> true. (Khmer words have a zero-width space between them; I am trying
> >> to detect a normal space before a word.) Am I correct that
> >> "spacebefore" treats a zero-width space the same as a normal space?
> >>
> >> Is there any way to detect specifically a normal space and ignore a
> >> zero-width space?
> >>
> >> Thanks,
> >> Nathan
> >>
> >>
> >>
> >>
> >> On Wed, May 22, 2013 at 10:27 PM, Marcin Miłkowski <[email protected]> wrote:
> >>
> >> On 2013-05-22 16:00, Nathan Wells wrote:
> >> > Hello Again,
> >> >
> >> > I am writing a rule for Khmer that tries to detect a space (U+0020)
> >> > before certain tokens and, if it is missing (or only a zero-width
> >> > space, U+200B, is present), suggests adding a space before the word.
> >> >
> >> > But it looks like rules in grammar.xml might not be able to tell a
> >> > zero-width space apart from a normal space. Does that have to be
> >> > done in a Java rule?
> >> >
> >>
> >> No. See here:
> >>
> >> http://wiki.languagetool.org/tips-and-tricks#toc13
> >>
> >> Best regards,
> >> Marcin
> >>
> >> > I don't really know Java, so I would rather keep things in
> >> > grammar.xml for Khmer.
> >> >
> >> > I tried this, but it didn't work:
> >> >
> >> > <rule id="CONJUNCTION_SPACE" name="Add space before certain conjunctions">
> >> >   <pattern>
> >> >     <marker>
> >> >       <token spacebefore="no" regexp="yes">(ដើម្បី|ពីព្រោះ|ហើយនិង)</token>
> >> >     </marker>
> >> >   </pattern>
> >> >   <message>Add a full space before this word.
> >> >     <suggestion><match no="1" regexp_match="(ដើម្បី|ពីព្រោះ|ហើយនិង)" regexp_replace=" $1"></match></suggestion>
> >> >   </message>
> >> >   <short>Add a full space before this word.</short>
> >> >   <example type="correct">
> >> >     គាត់បានទៅ<marker> ដើម្បី</marker>មើល។
> >> >   </example>
> >> >   <example type="incorrect" correction=" ដើម្បី">
> >> >     គាត់បានទៅ<marker>ដើម្បី</marker>មើល។
> >> >   </example>
> >> > </rule>
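For comparison, the check this mock-up is aiming at can be written as a plain Java regular expression over the raw sentence text instead of a `spacebefore` token attribute. This is a sketch under assumptions: `ConjunctionSpaceRegex` is a hypothetical class, the three conjunctions (ដើម្បី, ពីព្រោះ, ហើយនិង) are spelled with Unicode escapes to keep the source ASCII, and because the pattern checks no word boundaries, a listed word embedded inside a longer word would also be flagged.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the mock-up's check as a Java regex over raw text,
// outside LanguageTool.
final class ConjunctionSpaceRegex {

    // The three conjunctions from the mock-up, as Unicode escapes (capture group 1).
    static final String WORDS =
        "(\u178A\u17BE\u1798\u17D2\u1794\u17B8"
        + "|\u1796\u17B8\u1796\u17D2\u179A\u17C4\u17C7"
        + "|\u17A0\u17BE\u1799\u1793\u17B7\u1784)";

    // Lookbehind: some character other than an ordinary space must precede
    // the match, so a conjunction at the very start of the text is not flagged.
    // Optional zero-width space: consumed so the replacement swaps it for a real space.
    static final Pattern MISSING_SPACE =
        Pattern.compile("(?<=[^\\u0020])\\u200B?" + WORDS);

    /** True if some listed conjunction lacks a real space before it. */
    static boolean hasError(String text) {
        return MISSING_SPACE.matcher(text).find();
    }

    /** Puts a real space before each listed conjunction that lacks one. */
    static String fix(String text) {
        return MISSING_SPACE.matcher(text).replaceAll(" $1");
    }
}
```

Consuming the optional zero-width space in the match means the fix replaces it rather than leaving a zero-width space next to the new real space.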
> >> >
> >> > Any help would be much appreciated. Thanks!
> >> > Nathan
> >> >
> >> >
> >> >
> >>
>
> ------------------------------------------------------------------------------
> >> > Try New Relic Now & We'll Send You this Cool Shirt
> >> > New Relic is the only SaaS-based application performance
> >> monitoring service
> >> > that delivers powerful full stack analytics. Optimize and
> >> monitor your
> >> > browser, app, & servers with just a few lines of code. Try
> >> New Relic
> >> > and get this awesome Nerd Life shirt!
> >> http://p.sf.net/sfu/newrelic_d2d_may
> >> >
> >> >
> >> >
> >> > _______________________________________________
> >> > Languagetool-devel mailing list
> >> > [email protected]
> >> <mailto:[email protected]>
> >> >
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
> >> >
> >>
> >
>