[ 
https://issues.apache.org/jira/browse/ANY23-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133771#comment-14133771
 ] 

Timothy Potter commented on ANY23-238:
--------------------------------------

Hi Lewis.
  The BNode ids will be of the same format.  The problem was that the MD5 of 
the string "0" was being used in some cases where the source contained 
'itemid=""'.  For example, in one of our extractions this led to over 140,000 
type relations to the BNode _:nodecfcd208495d565ef66e7dff9f98764da, as 
'cfcd208495d565ef66e7dff9f98764da' is the MD5 of "0".  I'm not objecting to the 
use of an MD5 hash as the BNode id as long as it has an extremely low 
probability of collisions.  In Any23 the MD5 is often generated directly on the 
Java hashcode, which can lead to collisions when extracting billions of tuples, 
especially if there is a problem with the hashcode implementation.
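
To make the two points above concrete, here is a minimal Java sketch.  It is 
not the Any23 code: the class BNodeIdSketch and the helpers md5Hex and bnodeFor 
are hypothetical names, and the "_:node" + MD5 hex layout is only assumed from 
the id quoted above.  It shows that the MD5 of "0" yields exactly that id, and 
that hashing a 32-bit Java hashcode instead of the original string carries any 
hashcode collision straight through to the BNode id.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; names and id layout are assumptions.
class BNodeIdSketch {

    // Hex-encoded MD5 of a string, zero-padded to 32 characters.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Assumed id layout: "_:node" followed by the MD5 hex digest of the key.
    static String bnodeFor(String key) {
        return "_:node" + md5Hex(key);
    }

    public static void main(String[] args) {
        // Every item that falls back to the key "0" shares this one id.
        System.out.println(bnodeFor("0"));

        // "Aa" and "BB" are distinct strings with the same String.hashCode().
        String viaHashA = bnodeFor(String.valueOf("Aa".hashCode()));
        String viaHashB = bnodeFor(String.valueOf("BB".hashCode()));
        System.out.println(viaHashA.equals(viaHashB));   // ids collide

        // Hashing the strings themselves keeps the ids apart.
        System.out.println(bnodeFor("Aa").equals(bnodeFor("BB")));
    }
}
```

Taking the MD5 of the full source string (or of some other value that is 
unique per item) rather than of its hashcode avoids inheriting the 32-bit 
limit, though whether that fits the Any23 code paths is for the patch to decide.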

> Fix generation of BNode name for microdata when 'itemid' is given without a 
> value.
> ----------------------------------------------------------------------------------
>
>                 Key: ANY23-238
>                 URL: https://issues.apache.org/jira/browse/ANY23-238
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: microdata
>    Affects Versions: 1.0
>            Reporter: Lewis John McGibbney
>             Fix For: 1.1
>
>
> Linking this issue to the relevant Github issue
> https://github.com/apache/any23/pull/9



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
