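You could use REPLACE instead of ADD (or db:replace instead of db:add) and
use the tweet's JSON id as its document path. Storing the same tweet again
will then overwrite the existing document instead of adding a duplicate.
For more details, have a look at our documentation [1].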
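A minimal sketch of the db:replace approach. It assumes the database is called "twitter", as in the query quoted below. It also assumes each tweet arrives as a JSON string in a hypothetical external variable $json. The ".xml" path suffix is an arbitrary choice.

```xquery
(: Sketch: store each tweet under a path derived from its id, so that
   importing the same tweet twice overwrites the existing document
   instead of adding a duplicate.
   Assumptions: the database "twitter" exists, and $json is a
   hypothetical external variable holding one tweet as a JSON string. :)
declare variable $json as xs:string external;

(: json:parse converts the JSON string to XML; the key "id_str" is
   escaped to the element name id__str, as seen in the quoted query. :)
let $tweet := json:parse($json)
let $id := string($tweet/json/id__str)
return db:replace("twitter", $id || ".xml", $tweet)
```

Because the path is derived from the tweet itself, running the import repeatedly keeps exactly one document per tweet id.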

Deleting duplicates after the insertion would be another approach, but it
will probably be too slow if you plan to store thousands or millions of tweets.
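To find the existing duplicates, and to count how often each id occurs as asked below, a group by clause should do. This is a sketch that assumes the same "twitter" database and json/id__str structure as in the quoted query:

```xquery
(: Sketch: count the occurrences of each id__str value and report only
   the ids that occur more than once, most frequent first. :)
for $id in db:open("twitter")/json/id__str
group by $value := string($id)
(: after grouping, $id is bound to all id__str elements of the group :)
let $count := count($id)
where $count > 1
order by $count descending
return <tweet id="{ $value }" count="{ $count }"/>
```

Dropping the where clause lists the count for every id. Inside the return clause, $id/.. refers to the json elements of all copies of that tweet. This also gives you the nodes of a duplicate entity.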

[1] http://docs.basex.org/wiki/Database_Module#db:replace



thufir <[email protected]> wrote on Tue, 4 Feb 2020, 07:41:

> Not sure of the correct lingo, but I'm building a database of tweets.
> As I run it, duplicate tweets are added to the database.  I can see the
> duplicates with:
>
> for $tweets  in db:open("twitter")
> return <tweet>{$tweets/json/id__str}</tweet>
>
> Firstly, how would I select the json node for a duplicate entity? But
> before even selecting that node, how would I check whether there's more
> than one result for that id__str value?
>
> How would I even generate a count of each occurrence for the data of a
> specific id__str?
>
>
> thanks,
>
> Thufir
>
