Re: Parsing a user-entered localized datetime

2013-04-12 Thread Denis Steckelmacher

On 11/04/2013 19:17, John Layt wrote:


We do support a FancyDate parsing style in QLocale::readDate(), but
it is limited to things like "Yesterday" and "Monday".  There are
no plans to extend our fancy date support at this time, as it would be
very hard to get right in a generic way; besides, kdelibs is
frozen until KF5.  In the future (Qt5/KF5) we may move localization to
ICU, which doesn't offer any such feature, so we would need a new
class for this.

A new class for parsing relative dates, separate from the existing
date parsing code, would make the most sense.  This would just take
strings and guess a rough time period.  I do think it will be very
hard to write generic code that works for every language that we
support, so you should talk to the translators about this, especially
Chusslove.  I know the Fuzzy Clock tried hard to find a way to output
dates in a similar way, but it ended up requiring lots of manual work
for each new language.

As Kevin mentions, we store our default locale settings in the
entry.desktop files at
http://quickgit.kde.org/?p=kde-runtime.git&a=tree&f=l10n .  You
can have a default value for a setting that is used by all languages,
but then also language-specific versions of each setting if needed.
Alternatively you can use the standard i18n() calls.

Good luck :-)

John.


I've looked at KCalendarSystem and it seems that every calendar system
is built around some sort of days, months and years. That simplifies
things a bit: it would have been difficult to handle things like
"two seasons ago" in special calendar systems.

I like your idea of a dedicated relative date class. In fact, I thought
about a HumanDateParser class that reads locale-specific parser rules
and uses them to parse strings. I imagined the rules stored in XML
files, as they are very easy to read using Qt, and something richer
than i18nc calls is required, unless you want translators to have to
translate things like
"day(s)[1],week(s)[7],month(s)[31:(January,...)]".
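
As a rough illustration of the XML idea, here is a minimal sketch in
Python (chosen only for brevity; the real class would be C++/Qt). The
schema, the element and attribute names, and load_periods() are all
hypothetical and not an existing KDE format. It maps each period word to
an approximate length in days, mirroring the day/week/month example
above.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file for English; this schema is invented for the sketch.
RULES_XML = """
<rules lang="en">
  <period days="1"><name>day</name><name>days</name></period>
  <period days="7"><name>week</name><name>weeks</name></period>
  <period days="31"><name>month</name><name>months</name></period>
</rules>
"""

def load_periods(xml_text):
    """Return a dict mapping each period word to its approximate length in days."""
    root = ET.fromstring(xml_text)
    periods = {}
    for period in root.findall("period"):
        days = int(period.get("days"))
        for name in period.findall("name"):
            periods[name.text.strip().lower()] = days
    return periods

PERIODS = load_periods(RULES_XML)
```

A translator would then only edit the words inside the name elements,
while the numeric clues stay with the calendar system.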

Yesterday, I tried to note down the strings that I think a parser
should be able to parse. If <period> is any of the words day, week,
month, year or their plural forms, and <day of week> is the name of a
day of the week, it should be feasible to parse "<number> <period> ago"
("3 weeks ago"), "next <period>" ("next week"), "last <period>|<day of
week>" ("last week", "last year", "last Monday"), or something more
fancy like "first Thursday of May". Shortcuts can be given, for
instance "tomorrow". I don't know if these rules have to be regular
expressions, as some languages may separate words differently or use
complex expression rules.
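
For English at least, the patterns listed above can be sketched with
plain regular expressions. This is only an illustration in Python under
stated assumptions: the period lengths are rough guesses, the names
(parse_relative, PERIOD_DAYS) are made up for the example, and
"first Thursday of May" is left out.

```python
import re
from datetime import date, timedelta

PERIOD_DAYS = {"day": 1, "week": 7, "month": 31, "year": 365}  # rough lengths
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

AGO = re.compile(r"\b(\d+)\s+(day|week|month|year)s?\s+ago\b", re.I)
NEXT = re.compile(r"\bnext\s+(day|week|month|year)\b", re.I)
LAST = re.compile(r"\blast\s+(day|week|month|year|" + "|".join(WEEKDAYS) + r")\b", re.I)

def parse_relative(text, today):
    """Return a rough date for the first recognized expression, or None."""
    m = AGO.search(text)
    if m:
        return today - timedelta(days=int(m.group(1)) * PERIOD_DAYS[m.group(2).lower()])
    m = NEXT.search(text)
    if m:
        return today + timedelta(days=PERIOD_DAYS[m.group(1).lower()])
    m = LAST.search(text)
    if m:
        word = m.group(1).lower()
        if word in PERIOD_DAYS:
            return today - timedelta(days=PERIOD_DAYS[word])
        # Most recent occurrence of that weekday strictly before today.
        delta = (today.weekday() - WEEKDAYS.index(word) - 1) % 7 + 1
        return today - timedelta(days=delta)
    if re.search(r"\btomorrow\b", text, re.I):
        return today + timedelta(days=1)
    return None
```

For example, parse_relative("3 weeks ago", date(2013, 4, 12)) gives
2013-03-22. A language that does not separate words with spaces would
need something other than \s+ and \b, which is exactly the open question.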


The parser rules will list the rules recognized by a given language in
a given calendar system, and provide parsing clues. For instance, some
sentences typically refer to a future event ("next Friday", or even
"in May"), while others can be understood as past or future, depending
on the application's context (Dolphin is used to search for files that
exist, not files that will exist in two weeks).
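
The past/future clue could be as simple as a flag passed in by the
application. A minimal Python sketch, assuming a hypothetical
resolve_weekday() helper and a prefer_past flag (neither exists in
kdelibs), for an ambiguous bare weekday name:

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def resolve_weekday(name, today, prefer_past):
    """Resolve a bare weekday name to the nearest such day before or after today."""
    target = WEEKDAYS.index(name.lower())
    if prefer_past:
        # 1..7 days back: a file search cannot mean a day in the future.
        delta = (today.weekday() - target - 1) % 7 + 1
        return today - timedelta(days=delta)
    # 1..7 days ahead: a calendar application would rather mean the next one.
    delta = (target - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=delta)
```

With today being Friday 2013-04-12, "Friday" resolves to 2013-04-05 when
prefer_past is set (the Dolphin case) and to 2013-04-19 otherwise.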


Finally, the parsing would consist of finding parts of the string that
match one rule. The first match would be taken. When a date has been
found, its matching portion of the string is removed and a time is
looked for. I hope this could make it possible to parse strings like
"Last Monday on 8 pm" without having to worry about the "on" word,
which every user will place differently or replace with a comma or
anything else.
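
The "match, remove, then look for a time" step can be sketched as
follows. This is a Python illustration only; the two regular expressions
are deliberately tiny, English-only stand-ins for real locale rules, and
split_date_and_time() is a hypothetical name.

```python
import re

# Stand-in rules: a real parser would load these per locale.
DATE_RULE = re.compile(r"\b(?:last|next)\s+\w+\b|\b(?:yesterday|today|tomorrow)\b", re.I)
TIME_RULE = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)?\b", re.I)

def split_date_and_time(text):
    """Return (date_phrase, (hour, minute)), with None for any part not found."""
    date_phrase = None
    m = DATE_RULE.search(text)
    if m:
        date_phrase = m.group(0)
        # Remove the matched date so its words cannot confuse the time rule.
        text = text[:m.start()] + " " + text[m.end():]
    time_of_day = None
    m = TIME_RULE.search(text)
    if m:
        hour = int(m.group(1))
        minute = int(m.group(2) or 0)
        suffix = (m.group(3) or "").lower()
        if suffix:
            hour = hour % 12 + (12 if suffix == "pm" else 0)
        time_of_day = (hour, minute)
    return date_phrase, time_of_day
```

Because each rule only searches for its own fragment, the filler between
them ("on", a comma, or nothing) never has to be described by a rule.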

Denis Steckelmacher.

(On a side note, I have already written a parser that matches only
parts of human-written content. It extracted quantity information from
strings like "2 bottles of 1 l of milk" and was able to guess nearly
90% of the quantities. Humans like to write valuable information in
recognizable ways, even if there are words in between. For instance,
the "2" in my example is the only number not followed by a unit, and
"1 l" can only mean one liter. So the algorithm found 2x 1 liter.)
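
The heuristic in this side note (a number followed by a known unit is an
amount, a number followed by anything else is a count) might look
roughly like the Python sketch below. It is not the parser mentioned
above, only an illustration; the unit table and extract_quantity() are
assumptions made for the example.

```python
import re

# Small illustrative unit table; a real one would be much larger.
UNITS = {"l": "liter", "ml": "milliliter", "g": "gram", "kg": "kilogram"}
NUMBER_WORD = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]+)?", re.I)

def extract_quantity(text):
    """Return (count, amount, unit_name); count defaults to 1."""
    count, amount, unit = 1, None, None
    for number, word in NUMBER_WORD.findall(text):
        key = word.lower()
        if key in UNITS and amount is None:
            # A number followed by a known unit is the amount per item.
            amount, unit = float(number), UNITS[key]
        elif key not in UNITS:
            # A number followed by any other word (or nothing) is the count.
            count = int(float(number))
    return count, amount, unit
```

On "2 bottles of 1 l of milk" this returns (2, 1.0, "liter"), i.e.
2x 1 liter; the words "bottles", "of" and "milk" are never interpreted.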


Re: Parsing a user-entered localized datetime

2013-04-11 Thread Kevin Krammer
Hi Denis,

On Wednesday, 2013-04-10, Denis Steckelmacher wrote:

 to other calendar systems. Does KDE store its locale-specific settings
 in files
 that can be easily edited by translators ?

Can't help you with the rest, but one example of locale-specific settings is
(street-)address formatting rules.
You can find those files in kde-runtime/l10n; look for the config key
AddressFormat in the entry.desktop files for various locales.

Cheers,
Kevin

-- 
Kevin Krammer, KDE developer, xdg-utils developer
KDE user support, developer mentoring




Parsing a user-entered localized datetime

2013-04-10 Thread Denis Steckelmacher

Hi,

I am a student applying for this year's Google Summer of Code, to work
on Nepomuk. The project that interests me is a real query parser for
user-entered search queries. For instance, the user can use Dolphin to
find documents created one year ago, containing KDE and openSuSE, or
having a certain tag.

I already have some spare time to work on this project and I want to
begin investigating some of the problems I will face during the summer.
One of them is parsing a user-entered datetime, like in the example
above. A quick search of the KDE and Qt documentation pointed me to
KCalendarSystem::readDate. Unfortunately, this method does not seem to
be able to parse difficult datetimes, for instance "last monday on
8:45" or "this morning".

Is there any class that I have missed that does something like that? If
there isn't, where could I contribute this kind of parser (in a
separate class in KDE Libs, or by extending KCalendarSystem::readDate)?
I already have some ideas about how I could implement it, but I am
still thinking about how to allow translators to provide
locale-specific parsing rules. These can be a simple translation of
"X days ago" into another language, or even parsing rules adapted to
other calendar systems. Does KDE store its locale-specific settings in
files that can be easily edited by translators?

My native language is French, so I can ensure that the parser will be
able to parse English and French dates, but I don't know any calendar
system other than the Gregorian one.

Denis Steckelmacher.