The question marks are actual question marks. I'm not sure how to find the
"duplicate" keys in the in-memory map. As far as I can tell, there is only
one "? 5" key in it.
I thought maybe computing the frequencies of the hash values of the keys
and looking for any with more than one would find them, but this code:
read-notes> (def dupes (filter #(> (second %) 1)
                               (frequencies (map hash (keys phrases)))))
#'read-notes/dupes
read-notes> (count dupes)
8911
seems to indicate 8,911 hash values that are each shared by more than one key.
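Shared hash values do not by themselves point to duplicated keys. Clojure's `hash` returns a 32-bit int, so distinct strings can and do collide, and with about eight million keys a few thousand collisions are expected. The sketch below shows a well-known pair of colliding strings plus a birthday-problem estimate; it assumes hashes are spread uniformly over 2^32 values (real string hashes can be less uniform, so the observed count may run higher). `expected-colliding-pairs` is a hypothetical helper, not part of Clojure.

```clojure
(ns hash-collision-sketch)

;; "Aa" and "BB" are distinct strings with the same Java String.hashCode
;; (2112). Clojure's `hash` for strings is derived from String.hashCode,
;; so they collide under `hash` as well.
(println (= (hash "Aa") (hash "BB")))   ; true
(println (= "Aa" "BB"))                 ; false
(println (count {"Aa" 1 "BB" 2}))       ; 2: colliding keys still coexist

(defn expected-colliding-pairs
  "Birthday-problem estimate of how many pairs among n distinct keys share
  a hash value, assuming hashes are uniform over 2^32 values."
  [n]
  (/ (* (double n) (dec n))
     (* 2.0 (Math/pow 2.0 32.0))))

;; With the 8,054,160 keys reported in this thread:
(println (Math/round (double (expected-colliding-pairs 8054160))))   ; 7552
```

An estimate of roughly 7,550 colliding pairs is the same order of magnitude as the 8,911 shared hash values found above, so that count is consistent with every key being unique.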
On Wednesday, November 25, 2015 at 10:27:29 PM UTC-6, Ghadi Shayban wrote:
>
> While in memory before writing, are the hash codes for the "duplicate"
> keys the same? You can call (hash) on the keys. I'm thinking there is
> perhaps an issue with Unicode string serialization... Are the question
> marks a particular character?
>
> If you can find the similar strings in memory, before they are written,
> call:
> (map int the-string)
> to see the actual Unicode characters behind the question marks.
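Building on that suggestion, here is a sketch for locating the candidates before anything is written: encode every key, group keys by what comes back, and print the code units of any group with more than one member. It assumes, without confirmation from the thread, that the file is written as UTF-8 (the `clojure.java.io` default that `spit` uses when no `:encoding` is given) and that the culprits are characters UTF-8 cannot encode, such as unpaired UTF-16 surrogates, which Java replaces with `?`. `round-trip`, `collapsed-keys`, and the sample keys are hypothetical.

```clojure
(ns find-collapsed-keys-sketch
  (:import (java.nio.charset Charset StandardCharsets)))

(defn round-trip
  "Encodes s with cs and decodes the bytes again. String.getBytes(Charset)
  replaces anything cs cannot encode with the charset's replacement bytes,
  which is \"?\" for UTF-8."
  ^String [^String s ^Charset cs]
  (String. (.getBytes s cs) cs))

(defn collapsed-keys
  "Returns {written-form [key ...]} for every written form that more than
  one distinct key of m turns into."
  [m ^Charset cs]
  (->> (keys m)
       (group-by #(round-trip % cs))
       (filter (fn [[_ ks]] (> (count ks) 1)))
       (into {})))

;; Hypothetical sample: two keys starting with different unpaired high
;; surrogates (U+D83D and U+D83E), plus one ordinary key.
(def sample {(str (char 0xD83D) " 5") 32
             (str (char 0xD83E) " 5") 352
             "plain" 1})

(doseq [[written ks] (collapsed-keys sample StandardCharsets/UTF_8)
        k ks]
  ;; (map int k) lists the UTF-16 code units of the original key.
  (println (pr-str written) (vec (map int k))))
```

For the sample, both surrogate keys come back as `"? 5"`, with 55357 and 55358 as their first code units. On the real map, `(collapsed-keys phrases StandardCharsets/UTF_8)` would list any keys that collapse the same way.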
>
> On Wednesday, November 25, 2015 at 11:07:34 PM UTC-5, Dave Kincaid wrote:
>>
>> The number of keys in the map is 8,054,160.
>>
>> On Wednesday, November 25, 2015 at 10:04:11 PM UTC-6, Dave Kincaid wrote:
>>>
>>> I have something very strange going on when I try to write a map out to
>>> a file and read it back in. It's a perfectly fine hash-map with 8,054,160
>>> key/value pairs (so it's pretty big). When I write the map out to a file using
>>>
>>> (spit "/tmp/mednotes6153968756847768349/repl-write.edn" (pr-str phrases))
>>>
>>> and then read it back in with
>>>
>>> (edn/read (PushbackReader. (io/reader "/tmp/mednotes6153968756847768349/repl-write.edn")))
>>>
>>> I am getting a duplicate key exception indicating that "? 5" is
>>> duplicated. phrases is a clojure.lang.PersistentHashMap. The keys of the
>>> map are strings and the values are numbers. When I get the value for "? 5"
>>> from the map it returns 352.
>>>
>>> I tried to grep the file to find the occurrences of the key "? 5" (and
>>> the 30 characters before and after it) and it seems to return 4 of them.
>>> The second one is the right one from the map, but I have no idea where the
>>> other 3 are coming from.
>>>
>>> [/tmp/mednotes6153968756847768349]> egrep -o ".{30}\"\? 5\" .{30}" repl-write.edn
>>> hasing a toothbrush for" 160, "? 5" 32, ". ) during his /" 32, "to
>>> "is intact with sutures" 32, "? 5" 352, "4.81 pounds" 128, "ceren
>>> udden" 32, "being up all" 32, "? 5" 32, "limited financial means"
>>> , "count , everytime she" 32, "? 5" 32, "had a partial mandibulect
>>>
>>> Does anyone have an idea what might be happening when the map is written
>>> out to the file? How is that key getting duplicated?
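A hypothesis consistent with these symptoms, though not confirmed anywhere in the thread: some keys contain characters that cannot be encoded at all, for example unpaired UTF-16 surrogates left behind when text is split in the middle of a surrogate pair. Such keys are distinct in memory, but `spit` (through `clojure.java.io`, which uses UTF-8 unless `:encoding` says otherwise) writes each bad character as `?`, so several keys can land in the file as the same text. The sketch below reproduces that on a two-entry map; the sample keys and `write-then-read` are hypothetical.

```clojure
(ns duplicate-key-repro-sketch
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io])
  (:import (java.io File PushbackReader)))

;; Hypothetical keys: each starts with a different unpaired high surrogate
;; (U+D83D, U+D83E), so they are two distinct strings in memory.
(def sample {(str (char 0xD83D) " 5") 32
             (str (char 0xD83E) " 5") 352})

(defn write-then-read
  "Writes m the same way as in the post above, then reads it back with edn."
  [m]
  (let [f (File/createTempFile "repl-write" ".edn")]
    (try
      ;; spit goes through clojure.java.io, which encodes as UTF-8 when no
      ;; :encoding is given; the UTF-8 encoder writes each unpaired
      ;; surrogate as "?".
      (spit f (pr-str m))
      (with-open [r (PushbackReader. (io/reader f))]
        (edn/read r))
      (finally
        (.delete f)))))

(println (count sample))   ; 2
(try
  (write-then-read sample)
  (catch Exception e
    ;; Both keys were written as "? 5", so the reader rejects the map.
    (println (.getMessage e))))
```

If this turns out to be the cause, switching to a different `:encoding` would likely not help, since an unpaired surrogate is malformed input for Java's standard encoders; the keys themselves would need repairing or filtering before writing.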
>>>
>>> I have tried a few slightly different ways of writing to the file
>>> including
>>>
>>> (spit "/tmp/mednotes6153968756847768349/repl-write.edn" (binding [*print-dup* true] (pr-str phrases)))
>>>
>>> and
>>>
>>> (spit "/tmp/mednotes6153968756847768349/repl-write.edn" (.toString phrases))
>>>
>>> based on some StackOverflow answers I found. They all seem to do the
>>> same thing.
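That all three variants behave the same is what you would expect if the problem lies in encoding rather than printing: each variant only builds a string in memory, and all three then hand that string to `spit`, where it is encoded to bytes. A small check, using a hypothetical key that begins with an unpaired surrogate (an assumption, not something the thread confirms), shows the character surviving every printing variant and turning into `?` only at the encoding step.

```clojure
(ns print-variants-sketch
  (:import (java.nio.charset StandardCharsets)))

;; Hypothetical key beginning with an unpaired high surrogate (U+D83D).
(def lone (str (char 0xD83D)))
(def sample {(str lone " 5") 32})

;; The three ways of producing the text tried in the post. Each one only
;; builds a String in memory; nothing has been encoded to bytes yet.
(def printed
  {:pr-str    (pr-str sample)
   :print-dup (binding [*print-dup* true] (pr-str sample))
   :to-string (.toString ^Object sample)})

;; The surrogate is still present in every variant.
(doseq [[variant s] printed]
  (println variant (.contains ^String s ^String lone)))

;; It only becomes "?" once the text is encoded, as spit does with UTF-8.
(let [^String text (:pr-str printed)]
  (println (String. (.getBytes text StandardCharsets/UTF_8)
                    StandardCharsets/UTF_8)))   ; {"? 5" 32}
```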
>>>
>>> Here is the exception stack trace.
>>>
>>> 1. Caused by java.lang.IllegalArgumentException
>>> Duplicate key: ? 5
>>>
>>> PersistentHashMap.java: 67 clojure.lang.PersistentHashMap/createWithCheck
>>> RT.java: 1538 clojure.lang.RT/map
>>> EdnReader.java: 631 clojure.lang.EdnReader$MapReader/invoke
>>> EdnReader.java: 142 clojure.lang.EdnReader/read
>>> EdnReader.java: 108 clojure.lang.EdnReader/read
>>> edn.clj: 35 clojure.edn/read
>>> edn.clj: 33 clojure.edn/read
>>> AFn.java: 154 clojure.lang.AFn/applyToHelper
>>> AFn.java: 144 clojure.lang.AFn/applyTo
>>> Compiler.java: 3623 clojure.lang.Compiler$InvokeExpr/eval
>>> Compiler.java: 439 clojure.lang.Compiler$DefExpr/eval
>>> Compiler.java: 6787 clojure.lang.Compiler/eval
>>> Compiler.java: 6745 clojure.lang.Compiler/eval
>>> core.clj: 3081 clojure.core/eval
>>> main.clj: 240 clojure.main/repl/read-eval-print/fn
>>> main.clj: 240 clojure.main/repl/read-eval-print
>>> main.clj: 258 clojure.main/repl/fn
>>> main.clj: 258 clojure.main/repl
>>> RestFn.java: 1523 clojure.lang.RestFn/invoke
>>> interruptible_eval.clj: 58 clojure.tools.nrepl.middleware.interruptible-eval/evaluate/fn
>>> AFn.java: 152 clojure.lang.AFn/applyToHelper
>>> AFn.java: 144 clojure.lang.AFn/applyTo
>>> core.clj: 630 clojure.core/apply
>>> core.clj: 1868 clojure.core/with-bindings*
>>> RestFn.java: 425 clojure.lang.RestFn/invoke
>>> interruptible_eval.clj: 56 clojure.tools.nrepl.middleware.interruptible-eval/evaluate
>>> interruptible_eval.clj: 191 clojure.tools.nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
>>> interruptible_eval.clj: 159 clojure.tools.nrepl.middleware.interruptible-eval/run-next/fn
>>> AFn.java: 22 clojure.lang.AFn/run
>>> ThreadPoolExecutor.java: 1142 java.util.concurrent.ThreadPoolExecutor/runWorker
>>> ThreadPoolExecutor.java: 617 java.util.concurrent.ThreadPoolExecutor$Worker/run
>>> Thread.java: 745 java.lang.Thread/run
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en