Yes, this worked. Kinda lengthy, but this is the code I came up with:


private void replace(JSONArray tweets) throws JSONException, BaseXException, IOException {
        log.fine(tweets.toString());
        // open the target database and parse incoming documents as JSON
        new Open(databaseName).execute(context);
        new Set("parser", "json").execute(context);

        for (int i = 0; i < tweets.length(); i++) {
            JSONObject tweet = tweets.getJSONObject(i);
            // the tweet id becomes the document path, so a tweet that is fetched
            // again overwrites its earlier copy instead of being added a second time
            long id = Long.parseLong(tweet.getString("id_str"));
            Command replace = new Replace(id + ".xml");
            replace.setInput(new ArrayInput(tweet.toString()));
            replace.execute(context);
        }
        // dump the database contents for debugging
        log.fine(new XQuery(".").execute(context));
    }

What I don't really understand is this: when creating the Replace command, the "primary key" seems to be the id_str from the tweet, which is fine. But how does that relate to a filename like xxx.xml?

thanks,

Thufir

On 2020-02-03 11:05 p.m., Christian Grün wrote:
You could use REPLACE instead of ADD (or db:replace instead of db:add) and name your tweet by the JSON id. For more details, have a look at our documentation [1].
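To make the path question concrete: in BaseX, each document in a database is addressed by a path, and the REPLACE command overwrites the document stored at that path, or adds it if none exists yet. So `1224.xml` is the document's name inside the database rather than a file on disk. The sketch below is a hypothetical, self-contained Java model (not the BaseX API; `PathKeyedStore` and its methods are invented for illustration) that contrasts ADD-like and REPLACE-like storage when the same tweet arrives twice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory model of a database whose documents are addressed by path.
 * This is NOT the BaseX API; it only mimics the ADD vs. REPLACE behaviour.
 */
public class PathKeyedStore {
    // ADD-like storage: every call appends, so the same path can appear more than once
    private final List<String[]> added = new ArrayList<>();
    // REPLACE-like storage: at most one document per path; a later call overwrites it
    private final Map<String, String> replaced = new LinkedHashMap<>();

    public void add(String path, String doc) {
        added.add(new String[] {path, doc});
    }

    public void replace(String path, String doc) {
        replaced.put(path, doc);
    }

    public String get(String path) {
        return replaced.get(path);
    }

    public int addedCount() {
        return added.size();
    }

    public int replacedCount() {
        return replaced.size();
    }

    /** Builds the document path from the tweet id, mirroring id + ".xml" in the Replace call above. */
    public static String pathFor(String idStr) {
        return Long.parseLong(idStr) + ".xml";
    }

    public static void main(String[] args) {
        PathKeyedStore store = new PathKeyedStore();
        // the same tweet (id 1224) is fetched twice
        for (String id : new String[] {"1224", "1225", "1224"}) {
            String doc = "{\"id_str\":\"" + id + "\"}";
            store.add(pathFor(id), doc);
            store.replace(pathFor(id), doc);
        }
        System.out.println(store.addedCount());    // 3
        System.out.println(store.replacedCount()); // 2
    }
}
```

Because the path is derived from the id, storing a tweet is idempotent under REPLACE: re-running the import leaves one document per tweet.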

Deleting duplicates after the insertion would be another approach, but it is surely too slow if you plan to store thousands or millions of tweets.
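For comparison, a delete-afterwards pass has to walk over every stored document and remember every id seen so far before it knows what to remove, which is presumably why it scales poorly compared with replacing on insert. The following is a hypothetical plain-Java sketch (the `Doc` type and `pathsToDelete` are illustrative names, not BaseX calls) that keeps the first document per id and lists the later copies for deletion.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: pick the duplicate documents to delete after they were stored. */
public class PostHocDedup {
    /** A stored document: its path and the tweet id it contains (illustrative type). */
    public static final class Doc {
        final String path;
        final String idStr;

        public Doc(String path, String idStr) {
            this.path = path;
            this.idStr = idStr;
        }
    }

    /** Returns the paths of all documents whose id already occurred earlier in the list. */
    public static List<String> pathsToDelete(List<Doc> docs) {
        Set<String> seen = new HashSet<>(); // grows with the number of distinct ids
        List<String> toDelete = new ArrayList<>();
        for (Doc d : docs) {
            // Set.add returns false if the id was already present, i.e. this is a duplicate
            if (!seen.add(d.idStr)) {
                toDelete.add(d.path);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("a.xml", "42"), new Doc("b.xml", "7"), new Doc("c.xml", "42"));
        System.out.println(pathsToDelete(docs)); // [c.xml]
    }
}
```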

[1] http://docs.basex.org/wiki/Database_Module#db:replace



thufir <[email protected]> wrote on Tue, 4 Feb 2020, 07:41:

    Not sure of the correct lingo, but I'm building a database of tweets.
    As I run it, duplicate tweets are added to the database.  I can see the
    duplicates with:

    for $tweets in db:open("twitter")
    return <tweet>{$tweets/json/id__str}</tweet>

    First, how would I select the json node for a duplicate entity? And
    before even selecting that node, how would I check whether there is
    more than one result for a given id__str value?

    How would I even generate a count of the occurrences of each
    id__str value?


    thanks,

    Thufir
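The counting question can be sketched in plain Java, assuming the id__str values have already been read out of the database (for example with the query above). `DuplicateIds`, `countById` and `duplicates` are hypothetical names chosen for illustration. Inside BaseX itself, the same grouping would more naturally be written in XQuery with a `group by` clause and `count()`.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical helper: count how often each tweet id occurs and pick out the duplicates. */
public class DuplicateIds {
    /** Counts the occurrences of each id value; TreeMap keeps the keys sorted. */
    public static Map<String, Long> countById(List<String> ids) {
        return ids.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    /** Keeps only the ids that occur more than once, with their counts. */
    public static Map<String, Long> duplicates(List<String> ids) {
        Map<String, Long> result = new TreeMap<>();
        countById(ids).forEach((id, n) -> {
            if (n > 1) {
                result.put(id, n);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("42", "7", "42", "99", "7", "42");
        System.out.println(countById(ids));  // {42=3, 7=2, 99=1}
        System.out.println(duplicates(ids)); // {42=3, 7=2}
    }
}
```

Sorting the keys with a TreeMap only makes the output readable; a HashMap would count just as well. Note that the ids are compared as strings, so `"7"` sorts after `"42"`.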
