[SMW-devel] Date range patch

2008-03-06 Thread Louis Gerbarg
I was getting errors trying to store dates in the mid 2290s... here is
a patch that remedies this. It was tested under php 5.2.5 on Mac OS X
10.5.2 (32 bit). The patch should run correctly on php >= 5.1.3,
though I suspect it will not actually give people the extended ranges
unless you are on more recent releases due to fixes in the DateTime
class. If I did everything correctly it should still accept all the
same input formats it previously did, as well as store everything in
the database the same way, so no data conversion should be necessary,
but I did not test that extensively.

This patch should not be necessary if you are running a php 5.2
snapshot build on a 64 bit platform, but I think it is still
worthwhile since limiting extended ranges to 64 bit machines is
probably not optimal ;-)

There was a bit of patch fuzz because of changes in
exportData/exportRDF after I ran the tests, but it looks like it should
be a non-issue.

Louis


DateTime.patch
Description: Binary data
-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel


Re: [SMW-devel] Classes vs. Categories

2008-03-06 Thread Harold Solbrig
See http://biomedgt.org/  or http://www.wiktolog.org/agrowiki/ (the
latter is slightly out of date at the moment, but it includes an idea of
what might be done with basic OWL/Dublin Core and the like).  These
examples are slightly different cases, however, as we are importing classes
*into* the wiki rather than creating them.  As we are developing
ontologies and classification schemes, there is very little use of
"instances".

It is our hope that ontology developers will be able to use these resources
(or at least the first one) to comment on and propose changes to the
ontology contents.  We export these proposals into a Protege-OWL editor
through the RDF export mechanism, although we have to do a goodly amount
of transformation to accomplish this.  Ideally, we would like to reach
the point where we can generate real OWL through the RDF when
applicable.

As an aside, we have to tweak the auto-completion code in SemanticForms
to enumerate Categories rather than just articles.  As no one else seems
to have this requirement, we are assuming that our use case is somewhat
non-standard.

The Mayo Clinic is also developing a similar mechanism to curate the
contents of the International Classification of Diseases version 10
(ICD-10), but this isn't publicly available at this time.

Harold Solbrig
Apelon, Inc

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Jeff Thompson
Sent: Wednesday, March 05, 2008 12:24 PM
To: Semantic MediaWiki devs
Subject: Re: [SMW-devel] Classes vs. Categories

Jon Lang wrote:
> Jeff Thompson wrote:
>>  Good points about the difference between OWL DL and OWL Full.
>>  So if you only want to export OWL DL, what would you do with a page
>>  like the President or Dog or Wine pages on Wikipedia, which are
>>  pages about a class.
> 
> Place articles about classes in the Category namespace.  

This is indeed the logical answer.  I asked the question trying to be
provocative since I haven't seen a wiki where main articles like Dog are
put in the Category namespace.
Have you seen this in practice?






Re: [SMW-devel] Date range patch

2008-03-06 Thread Markus Krötzsch
First of all: thanks a lot. Comments inline.


On Thursday, 6 March 2008, Louis Gerbarg wrote:
> I was getting errors trying to store dates in the mid 2290s...

I thought my schedule was packed with meetings, and you are already planning 
until the 2290s. What kind of site is that?

> here is 
> a patch that remedies this. It was tested under php 5.2.5 on Mac OS X
> 10.5.2 (32 bit). The patch should run correctly on php >= 5.1.3,
> though I suspect it will not actually give people the extended ranges
> unless you are on more recent releases due to fixes in the DateTime
> class. If I did everything correctly it should still accept all the
> same input formats it previously did, as well as store everything in
> the database the same way, so no data conversion should be necessary,
> but I did not test that extensively.

How does that work? Can the DB handle large 64-bit numbers? Is there a
performance hit associated with that?

I now already have two patches for extending data ranges: the other one is the
Historical Date datatype by Terry A. Hurlbut. I wonder whether this extended
datatype could also use your method (AFAIK the historical dates use days
instead of seconds internally, right?). I am also considering switching to a
format that stores the parts of a date in separate DB fields, so that SMW can
distinguish dates without an exact day or time of day from those dates which
just happen to fall on a year's 1st of Jan 00:00. This would probably also
solve the 64-bit issue, since the components would fit into 32 bits.
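
To make the separate-fields idea concrete, here is a minimal Python sketch. It is not SMW code and not a proposed schema: the class name PartialDate, its field names, and the sort_key helper are all assumptions made up for this illustration. Unknown components are stored as None, so "2008" and "2008-01-01 00:00" stay distinguishable even though they sort to the same instant.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PartialDate:
    """Hypothetical date value with one field per component.

    A component that was not given in the input stays None instead of
    being silently filled in with a default.
    """
    year: int
    month: Optional[int] = None
    day: Optional[int] = None
    hour: Optional[int] = None
    minute: Optional[int] = None

    def sort_key(self) -> Tuple[int, int, int, int, int]:
        # For ordering only: missing components fall back to the
        # earliest possible value (1 for month and day, 0 for time).
        return (
            self.year,
            self.month if self.month is not None else 1,
            self.day if self.day is not None else 1,
            self.hour if self.hour is not None else 0,
            self.minute if self.minute is not None else 0,
        )


year_only = PartialDate(2008)
new_year_midnight = PartialDate(2008, 1, 1, 0, 0)
far_future = PartialDate(2295, 6, 15)
```

Here year_only and new_year_midnight compare as different values but share the same sort key, and every component of far_future is a small number that fits easily into a 32-bit field.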

>
> This patch should not be necessary if you are running a php 5.2
> snapshot build on a 64 bit platform, but I think it is still
> worthwhile since limiting extended ranges to 64 bit machines is
> probably not optimal ;-)
>
> There was a bit of patch fuzz because of changes in
> exportData/exportRDF after I ran the tests, but it looks like it should
> be a non-issue.

Yes, I changed the export code substantially, but that should not affect 
anything else.

Markus




-- 
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362  fax +49 (0)721 608 5998
[EMAIL PROTECTED]  www  http://korrekt.org




Re: [SMW-devel] Date range patch

2008-03-06 Thread Louis Gerbarg
On Thu, Mar 6, 2008 at 4:38 PM, Markus Krötzsch
<[EMAIL PROTECTED]> wrote:
> First of all: thanks a lot. Comments inline.
>
>
>
>  On Thursday, 6 March 2008, Louis Gerbarg wrote:
>  > I was getting errors trying to store dates in the mid 2290s...
>
>  I thought my schedule was packed with meetings, and you are already planning
>  until the 2290s. What kind of site is that?

I have been experimenting with annotating information about some
science fiction series. Some of that takes place far in the future.
Nothing is on a public facing server, and I have no idea if I will
actually do anything useful with this, but it is more about me working
through some issues with categorization and templating properties than
this particular data set.

>  > here is
>  > a patch that remedies this. It was tested under php 5.2.5 on Mac OS X
>  > 10.5.2 (32 bit). The patch should run correctly on php >= 5.1.3,
>  > though I suspect it will not actually give people the extended ranges
>  > unless you are on more recent releases due to fixes in the DateTime
>  > class. If I did everything correctly it should still accept all the
>  > same input formats it previously did, as well as store everything in
>  > the database the same way, so no data conversion should be necessary,
>  > but I did not test that extensively.
>
>  How does that work? Can the DB handle large 64-bit numbers? Is there a
>  performance hit associated with that?

The existing date implementation stores it in the DB as an XSD
formatted string and a double precision float, so there is no particular
efficiency issue one way or another. Altering it to save it simply as an
integer (32 or 64 bit) would probably be a bit of a performance win, but
extending the range should not negatively impact what is currently
being done. Since the numeric value in the DB is a double it cannot
actually store a full 64 bit range, but it can safely store well past
the value of a 32 bit integer (~50ish bits if I recall correctly).
Depending on exactly how you are using the numeric representation vs
the XSD representation there could be an interesting failure mode when
the numeric value overflows but the XSD is still within 64 bits (100+
million years).
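
As a rough illustration of the double-precision point above, here is a small Python sketch. It is not code from the patch, and the helper name survives_double is made up for this example. It relies only on the fact that an IEEE 754 double has a 53-bit significand, so every integer up to 2**53 is stored exactly, while larger integers may be rounded.

```python
import struct
from datetime import datetime

INT32_MAX = 2**31 - 1      # upper limit of a signed 32-bit timestamp
DOUBLE_EXACT_MAX = 2**53   # integers up to here are exact in a double


def survives_double(seconds: int) -> bool:
    """Return True if an integer second count round-trips unchanged
    through an 8-byte IEEE 754 double, as a DB double column would hold it."""
    packed = struct.pack("<d", float(seconds))
    return int(struct.unpack("<d", packed)[0]) == seconds


# Seconds from the Unix epoch to 1 January 2295, using integer arithmetic only.
year_2295 = (datetime(2295, 1, 1) - datetime(1970, 1, 1)).days * 86400

print(year_2295 > INT32_MAX)                  # beyond the 32-bit range
print(survives_double(year_2295))             # still exact as a double
print(survives_double(DOUBLE_EXACT_MAX + 1))  # first integer that is rounded
```

At one-second resolution, 2**53 seconds is roughly 285 million years, which is in line with the "100+ million years" estimate above.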

Strictly speaking, whether or not using a double field like that is
clean is sort of moot, because anyone running a 64 bit build of php
5.2.6-snap is going to end up with strtotime returning 64 bit values,
which should cause the same values as I am generating to end up in the
DB, even without my code changes. If it is a problem it is going to
need to be fixed regardless of whether the patch is applied.

>
>  I now already have two patches for extending data ranges: the other one is the
>  Historical Date datatype by Terry A. Hurlbut. I wonder whether this extended
>  datatype could also use your method (AFAIK the historical dates use days
>  instead of seconds internally, right?). I am also considering switching to a
>  format that stores the parts of a date in separate DB fields, so that SMW can
>  distinguish dates without an exact day or time of day from those dates which
>  just happen to fall on a year's 1st of Jan 00:00. This would probably also
>  solve the 64-bit issue, since the components would fit into 32 bits.

When you say parts of dates I presume you mean month, day, year, etc.
So long as the parser keeps working with the same wikitext I am not
partial to what I did; I just needed something that worked, and I was
more than happy to share it in case it was useful. I don't really need
anything more than an extension of the current time tracking mechanism
for what I am currently interested in, though the problem you mention
is certainly interesting. The one issue I can foresee with the
representation you are describing is that the boundaries people find
interesting are different. Imagine tracking comic books. Some are
published quarterly, monthly, biweekly, etc. Ideally you want some way
to distinguish between "Winter," "January," and "Week 1," all of which
start at the same time.

I think a more general way to solve it would be to make a new data type,
DateRange, which would be stored as two 64 bit integers. It would
effectively allow you to diminish the precision of a particular date
by separating the two points. Having said that, I don't need that
functionality, and doing it right seems like a lot of work.
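
A rough Python sketch of what such a DateRange could look like follows. Nothing like this exists in the patch. The field names, the half-open interval convention, and the pack/unpack helpers are assumptions for illustration only, and the "Winter" quarter is arbitrarily taken to start on 1 January to match the comic book example.

```python
import struct
from dataclasses import dataclass
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def to_seconds(moment: datetime) -> int:
    """Whole seconds between the Unix epoch and the given naive UTC datetime."""
    return (moment - EPOCH) // timedelta(seconds=1)


@dataclass(frozen=True)
class DateRange:
    """Hypothetical range value: start is inclusive, end is exclusive."""
    start: int
    end: int

    def pack(self) -> bytes:
        # Two signed 64-bit integers, 16 bytes in total.
        return struct.pack("<qq", self.start, self.end)

    @classmethod
    def unpack(cls, raw: bytes) -> "DateRange":
        return cls(*struct.unpack("<qq", raw))

    def width_seconds(self) -> int:
        # The wider the range, the lower the precision of the date.
        return self.end - self.start


jan_1 = to_seconds(datetime(2008, 1, 1))
winter = DateRange(jan_1, to_seconds(datetime(2008, 4, 1)))
january = DateRange(jan_1, to_seconds(datetime(2008, 2, 1)))
week_1 = DateRange(jan_1, to_seconds(datetime(2008, 1, 8)))
```

All three ranges start at the same second, yet they remain distinct values because their end points differ, which is the distinction a single timestamp cannot express.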

Louis



Re: [SMW-devel] Classes vs. Categories

2008-03-06 Thread Jeff Thompson
Thanks for the links. On biomedgt, I see that the Category inheritance
is used to make a class hierarchy:
http://biomedgt.org/index.php?title=Special:CategoryTree&target=BGT_TopThing(%40)&mode=0

Notice that each category is the subject not only of a subClassOf
relation, but also of many other relations that describe the class,
which is not allowed by OWL DL.  See the rich fact box for BGT Antigen Gene:
http://biomedgt.org/index.php/Category:BGT_Antigen_Gene(B54432)

The category BGT Antigen Gene is in the category of BGT Gene as a subclass
of it:
http://biomedgt.org/index.php/Category:BGT_Gene(B16785)
But notice that BGT Gene is in the category BGT Gene Kind.  This is not a
subclass relation, but rather the class BGT Gene is an instance of the
second-order class BGT Gene Kind.  Indeed, the RDF export wrongly uses
subClassOf:
http://BiomedGT.org/index.php/Special:URIResolver/Category:BGT_Gene-28B16785-29
   http://BiomedGT.org/index.php/Special:URIResolver/Category:BGT_Gene_Kind-28B179-29


So once again, even in this semantically aware application, category is
improperly used for both "subClassOf" and "instance of", and many
properties other than subClassOf are applied to a class, so we need OWL
Full anyway.  This is a case for not relying on the wiki category system,
and needing explicit "subClassOf" properties.
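
To show why the distinction matters, here is a small Python sketch. It is not SMW export code: the link list, the shortened names, and both helper functions are assumptions made up for this illustration. It compares what a reasoner would conclude when each category link carries an explicit relation with what it would conclude when every link is exported as subClassOf.

```python
SUBCLASS_OF = "rdfs:subClassOf"
INSTANCE_OF = "rdf:type"

# Hypothetical category links, each tagged with the relation it really means.
LINKS = [
    ("BGT_Antigen_Gene", SUBCLASS_OF, "BGT_Gene"),
    ("BGT_Gene", INSTANCE_OF, "BGT_Gene_Kind"),
]


def superclasses(name, links):
    """Collect all classes reachable from name by following subClassOf only."""
    found = set()
    frontier = [name]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in links:
            if subject == current and relation == SUBCLASS_OF and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def flatten_to_subclass(links):
    """Mimic an export that turns every category link into subClassOf."""
    return [(subject, SUBCLASS_OF, obj) for subject, _, obj in links]


explicit = superclasses("BGT_Antigen_Gene", LINKS)
flattened = superclasses("BGT_Antigen_Gene", flatten_to_subclass(LINKS))
```

With explicit relations, BGT_Antigen_Gene has only BGT_Gene as a superclass. With everything flattened to subClassOf, it also becomes a subclass of BGT_Gene_Kind, which is the wrong inference described above.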
