Hi,

Well, the pan-EU data portals do use extra services, although technically they 
are not part of the crawler.
For instance, a machine translator for translating metadata titles and short 
descriptions, and maybe a geo-coding service for mapping regions to 
geo-coordinates.

But these services build on metadata that is already available (a title in 
another language, or the name of a region).

If your data is stored in a CMS, the crawler can probably harvest some 
trivial metadata (update time, name...).
File format can be derived from file extension.
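
As a rough illustration of that last point, a crawler could derive the format 
along the lines of the Python sketch below. The function name guess_format and 
the example URL are made up for illustration; this is not taken from any 
existing portal or crawler.

```python
import mimetypes
from pathlib import PurePosixPath
from typing import Optional, Tuple
from urllib.parse import urlparse


def guess_format(url: str) -> Tuple[str, Optional[str]]:
    """Guess (extension, media type) from a download URL (hypothetical helper)."""
    # Only look at the path, so query strings like "?download=1" are ignored.
    path = urlparse(url).path
    # Normalise the extension: lowercase, without the leading dot.
    extension = PurePosixPath(path).suffix.lower().lstrip(".")
    # The standard library maps well-known extensions to media types;
    # it returns None when the extension is unknown or missing.
    media_type, _encoding = mimetypes.guess_type(path)
    return extension, media_type


# Example URL is invented for illustration.
print(guess_format("https://example.org/files/budget_2016.CSV?download=1"))
```

This only works when the extension is trustworthy; a link without an extension 
yields an empty string and no media type, so it would still need a manual check.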


There are also services that try to extract semantic info from (meta)data, but 
I don't think they are used by open data portals.
They are targeted more at news items, documents, etc. For instance 
http://www.opencalais.com/about-open-calais/

I assume that, if you have enough datasets that are already labeled 
correctly (category / theme from a limited list),
it would be possible for a crawler to classify new datasets automatically 
(using a machine learning library).
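
To make the idea a bit more concrete, here is a minimal sketch in Python of 
such a classifier: a small Naive Bayes model trained on titles that already 
have a theme. The class name ThemeClassifier, the two themes and the training 
titles are all invented for illustration. In practice one would more likely use 
an existing machine learning library and far more labeled datasets; plain 
Python is used here only to keep the sketch self-contained.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only runs of the letters a-z."""
    return re.findall(r"[a-z]+", text.lower())


class ThemeClassifier:
    """Tiny multinomial Naive Bayes over dataset titles (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # theme -> word -> count
        self.doc_counts = Counter()              # theme -> number of datasets
        self.vocabulary = set()

    def train(self, labeled):
        """labeled: iterable of (text, theme) pairs that were labeled by hand."""
        for text, theme in labeled:
            words = tokenize(text)
            self.word_counts[theme].update(words)
            self.doc_counts[theme] += 1
            self.vocabulary.update(words)

    def classify(self, text):
        """Return the theme with the highest log-probability for the text."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        words = tokenize(text)
        best_theme, best_score = None, -math.inf
        for theme, n_docs in self.doc_counts.items():
            counts = self.word_counts[theme]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)
            for word in words:
                # Add-one smoothing, so an unseen word does not rule out a theme.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_theme, best_score = theme, score
        return best_theme


# Made-up training data: titles that were already given a theme manually.
training = [
    ("Bus and tram timetables", "transport"),
    ("Road traffic counts per municipality", "transport"),
    ("Railway stations and train lines", "transport"),
    ("Air quality measurements", "environment"),
    ("Protected forest and nature areas", "environment"),
    ("River water quality samples", "environment"),
]

clf = ThemeClassifier()
clf.train(training)
print(clf.classify("Train timetables 2016"))   # transport
print(clf.classify("Water quality of lakes"))  # environment
```

With only a handful of examples the result is of course fragile, so a suggested 
theme would probably still need to be reviewed by the official registering the 
dataset.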


Best regards

Bart

-----Original Message-----
From: okfn-discuss [mailto:okfn-discuss-boun...@lists.okfn.org] On Behalf Of 
Masahiko SHOJI
Sent: Monday 1 August 2016 10:59
To: Open Knowledge Foundation discussion list <okfn-discuss@lists.okfn.org>
Subject: Re: [okfn-discuss] How to add metadata to data sets on the portal 
?(Request for cooperation)

Hi Bart,

Thank you for your kind reply.  I should clarify my question, but your comment 
is very useful for me.

The Japanese government wants to find efficient ways for officials in the 
various ministries to register new data sets, which is part of their daily 
work. They seem to find adding metadata manually a burden.

I have heard that some countries may be using a crawler which automatically 
adds metadata, but I do not know what kind of metadata that is. Do you have any 
information about this?


Best regards,

Masa SHOJI
Representative Director
Open Knowledge Japan




2016-07-29 5:47 GMT+09:00 Bart Hanssens <bart.hanss...@fedict.be>:
> Hi,
>
> It probably depends on what you mean by adding metadata on the portal, and 
> how the portal is maintained.
> Is this about adding extra metadata, after the datasets are published on the 
> portal ?
> Are the datasets on the Japanese portal maintained manually, or pushed to the 
> portal by an automated process ?
>
> E.g. data.gov.be, the national portal in Belgium, gets its 
> (meta)data from various other (regional) portals, and from different 
> websites, by scraping (HTML sites) or using an API (CKAN, OpenDataSoft or 
> other software).
> The metadata from all portals is transformed to a DCAT-AP file, that is used 
> to update the data.gov.be site.
>
> We don't add metadata on data.gov.be itself anymore; everything is 
> done before it gets uploaded.
>
> Sometimes extra metadata is added (e.g. mapping of free text keywords 
> to themes/categories, or "missing" language tags, or a default email 
> contact...), or data is corrected etc
>
> Everything is command line (basically a small Java program that runs 
> various SPARQL files), and some mapping files are to be created 
> manually (typically a SKOS file)
>
> It's not "pretty", and not very advanced, but it works for our purposes...
>
> See https://github.com/Fedict/dcattools
>
>
> Best regards,
>
> Bart Hanssens
> Interoperability expert
> Federal Public Service ICT Belgium
>
> -----Original Message-----
> From: okfn-discuss [mailto:okfn-discuss-boun...@lists.okfn.org] On 
> Behalf Of Masahiko SHOJI
> Sent: Thursday 28 July 2016 11:49
> To: Open Knowledge Foundation discussion list 
> <okfn-discuss@lists.okfn.org>
> Subject: [okfn-discuss] How to add metadata to data sets on the portal 
> ?(Request for cooperation)
>
> Hi all,
>
> The Japanese government is asking me about efficient ways to add metadata
> to the enormous number of data sets on the government open data portal site.
> I would appreciate it if you could help with my questions.
>
> 1. How do government officials add metadata to each data set on a 
> government data portal site like "data.gov"?
>
> 2. Does the government have any tools (crawlers, etc.) to add metadata,
> or any plans to develop them?
>
> 3. Who knows this issue?  What department is in charge of this?
>
> 4. Do you have any related information or any outlook about this issue?
>
> Thank you.
>
> Masa Shoji
> Representative Director
> Open Knowledge Japan
> _______________________________________________
> okfn-discuss mailing list
> okfn-discuss@lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/okfn-discuss
> Unsubscribe: https://lists.okfn.org/mailman/options/okfn-discuss