[jira] [Comment Edited] (AVRO-2051) Thread contention accessing JsonProperties props
[ https://issues.apache.org/jira/browse/AVRO-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090911#comment-16090911 ] Daniel Kulp edited comment on AVRO-2051 at 7/18/17 1:04 AM: Of course another option is to just surround the access to the props with a ReentrantReadWriteLock. Bunch of ideas to test and benchmark. was (Author: dkulp): Of course another option is to just surround the access to the props with a ReentrantReadWriteLock. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (AVRO-2051) Thread contention accessing JsonProperties props
[ https://issues.apache.org/jira/browse/AVRO-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090911#comment-16090911 ] Daniel Kulp commented on AVRO-2051: --- Of course another option is to just surround the access to the props with a ReentrantReadWriteLock.
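A minimal sketch of the ReentrantReadWriteLock option, to benchmark against the other ideas. It assumes an insertion-ordered, String-keyed map; `RwLockedProps` and its methods are hypothetical simplifications (plain `Object` values instead of `JsonNode`, no check against overwriting an existing property), not Avro's actual `JsonProperties` code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the props handling in JsonProperties (not Avro's
// real class). Values are plain Objects rather than Jackson JsonNodes.
class RwLockedProps {
  private final Map<String, Object> props = new LinkedHashMap<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  // Any number of threads may hold the read lock at once, so lookups on a
  // shared schema no longer queue behind a single monitor.
  Object getProp(String name) {
    lock.readLock().lock();
    try {
      return props.get(name);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Writers take the exclusive write lock; each add stays O(1).
  void addProp(String name, Object value) {
    lock.writeLock().lock();
    try {
      props.put(name, value);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Hand out a copy so callers can iterate without holding the lock.
  Map<String, Object> getProps() {
    lock.readLock().lock();
    try {
      return new LinkedHashMap<>(props);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Readers no longer exclude each other, but every read still updates the lock's shared state, so under heavy contention this may still lose to lock-free reads; that trade-off is what a benchmark would need to show.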
[jira] [Commented] (AVRO-2051) Thread contention accessing JsonProperties props
[ https://issues.apache.org/jira/browse/AVRO-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090896#comment-16090896 ] Daniel Kulp commented on AVRO-2051: --- I'm trying to find something that will work for Avro 1.8.x, as that's what we'll need. Thus, removing all of that is likely not an option. That said, I just discovered that we already have parts of Guava shaded in as a dependency. Thus, I believe I can use the CacheBuilder to create the equivalent of a "ConcurrentLinkedHashMap" (some Google results mention this) that would work for this and not have the quadratic issue. I'll investigate more tomorrow. Another option would be to either add a dependency on something else (like Caffeine) that has a ConcurrentLinkedHashMap, or copy/shade an Apache-licensed version (like https://github.com/ben-manes/concurrentlinkedhashmap/blob/master/src/main/java/com/googlecode/concurrentlinkedhashmap/ConcurrentLinkedHashMap.java) into the source tree and use it.
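The goal described here is a map that is safe for concurrent reads, cheap to append to, and still iterates in insertion order. Guava's CacheBuilder and ConcurrentLinkedHashMap are not shown; as a swapped-in, JDK-only alternative, the sketch below pairs a `ConcurrentHashMap` with a `ConcurrentLinkedQueue` of keys, assuming props are append-only (never overwritten or removed). `OrderedConcurrentProps` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical append-only property store: lock-free reads, O(1) adds, and
// iteration in (approximate) insertion order. Uses only java.util.concurrent.
class OrderedConcurrentProps {
  private final ConcurrentHashMap<String, Object> values = new ConcurrentHashMap<>();
  private final ConcurrentLinkedQueue<String> order = new ConcurrentLinkedQueue<>();

  // No locking on the read path.
  Object get(String name) {
    return values.get(name);
  }

  // Returns the value already stored under name, or null if this call added it.
  // Null values are rejected by ConcurrentHashMap (NullPointerException).
  Object addIfAbsent(String name, Object value) {
    Object prior = values.putIfAbsent(name, value);
    if (prior == null) {
      order.add(name); // only the thread that won putIfAbsent records the key
    }
    return prior;
  }

  // Keys in insertion order. A key whose add is still in flight may be
  // missing from this list momentarily even though get() already sees it.
  List<String> keys() {
    return new ArrayList<>(order);
  }
}
```

Each add is O(1), so building a schema stays linear in the number of properties. When two threads add at the same moment, the recorded order reflects which `order.add` ran first, which can differ from the `putIfAbsent` order.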
[jira] [Updated] (AVRO-2052) Remove org.apache.avro.file.DataFileWriter Double Buffering
[ https://issues.apache.org/jira/browse/AVRO-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated AVRO-2052: -- Attachment: AVRO-2052.1.patch Call {{directBinaryEncoder}} instead of the buffered {{binaryEncoder}}
[jira] [Created] (AVRO-2052) Remove org.apache.avro.file.DataFileWriter Double Buffering
BELUGA BEHR created AVRO-2052: - Summary: Remove org.apache.avro.file.DataFileWriter Double Buffering Key: AVRO-2052 URL: https://issues.apache.org/jira/browse/AVRO-2052 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.8.2, 1.7.7 Reporter: BELUGA BEHR Priority: Trivial
{code:title=org.apache.avro.file.DataFileWriter}
private void init(OutputStream outs) throws IOException {
  this.underlyingStream = outs;
  this.out = new BufferedFileOutputStream(outs);
  EncoderFactory efactory = new EncoderFactory();
  this.vout = efactory.binaryEncoder(out, null);
  dout.setSchema(schema);
  buffer = new NonCopyingByteArrayOutputStream(
      Math.min((int)(syncInterval * 1.25), Integer.MAX_VALUE/2 -1));
  this.bufOut = efactory.binaryEncoder(buffer, null);
  if (this.codec == null) {
    this.codec = CodecFactory.nullCodec().createInstance();
  }
  this.isOpen = true;
}
{code}
It's clear here that both encoders write to a buffered destination ({{BufferedFileOutputStream}} and {{ByteArrayOutputStream}}), so there is no need for a buffered encoder; instead, write directly to the buffered streams with {{directBinaryEncoder}}.
[jira] [Commented] (AVRO-2051) Thread contention accessing JsonProperties props
[ https://issues.apache.org/jira/browse/AVRO-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090654#comment-16090654 ] Doug Cutting commented on AVRO-2051: This can make building schemas quadratic in the number of properties, no? While for most schemas this is probably not an issue, for some it might significantly impact performance. I think instead we should just bite the bullet and make Schema immutable, eliminating the addProp method altogether. At the same time, we should stop exposing JsonNode in the public API, instead using only Object, as intended in AVRO-1585.
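The quadratic concern can be checked with a little arithmetic: if every addProp copies the current map before adding, then adding n properties one at a time copies 0 + 1 + ... + (n-1) = n(n-1)/2 entries. The hypothetical harness below (not Avro code) counts those copies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical harness: counts the entries a copy+add+swap addProp copies
// while a schema's properties are added one at a time.
class CopyOnWriteCost {
  static long entriesCopied(int n) {
    Map<String, Object> props = new LinkedHashMap<>();
    long copied = 0;
    for (int i = 0; i < n; i++) {
      Map<String, Object> next = new LinkedHashMap<>(props); // copy all i entries
      copied += props.size();
      next.put("p" + i, i); // add
      props = next;         // swap
    }
    return copied;
  }
}
```

For n = 1,000 this copies 499,500 entries, compared with 1,000 puts into a single mutable map.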
[GitHub] avro pull request #236: [AVRO-2051] Remove synchronization for JsonPropertie...
GitHub user dkulp opened a pull request: https://github.com/apache/avro/pull/236 [AVRO-2051] Remove synchronization for JsonProperties.getJsonProp This change does two basic things: 1) Makes "props" a private field and requires subclasses to access it via the additional methods. This makes changing the underlying implementation a bit easier. 2) Changes props to an AtomicReference and makes it act like an immutable map. The addProp method makes a full copy of the map, adds the new value, and then atomically swaps in the new map, so threads still using the map that was "current" when they called the get method are not affected. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dkulp/avro master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/avro/pull/236.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #236 commit ad14635fa3af97b90282a79b7e04a0b8753e45b5 Author: Daniel Kulp Date: 2017-07-17T19:08:10Z [AVRO-2051] Remove synchronization for JsonProperties.getJsonProp --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
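A minimal sketch of the copy+add+swap idea, assuming a compare-and-set retry loop so that concurrent writers cannot lose each other's additions (the PR text only says the new map is swapped in atomically). `CopyOnWriteProps` is hypothetical and simplified, not the code in the pull request.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical copy-on-write property holder: readers dereference the current
// immutable snapshot without locking; writers copy, add, and swap.
class CopyOnWriteProps {
  private final AtomicReference<Map<String, Object>> props =
      new AtomicReference<>(Collections.<String, Object>emptyMap());

  // Lock-free read of whatever snapshot is current.
  Object getProp(String name) {
    return props.get().get(name);
  }

  void addProp(String name, Object value) {
    while (true) {
      Map<String, Object> current = props.get();
      Map<String, Object> copy = new LinkedHashMap<>(current); // O(size) copy
      copy.put(name, value);
      // Publish only if no other writer swapped in a map since we read it;
      // otherwise retry against the newer snapshot.
      if (props.compareAndSet(current, Collections.unmodifiableMap(copy))) {
        return;
      }
    }
  }

  // Callers get a read-only snapshot that later adds never modify.
  Map<String, Object> getProps() {
    return props.get();
  }
}
```

Reads cost one volatile load plus a map lookup; the price is the full copy on every write.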
[jira] [Commented] (AVRO-2051) Thread contention accessing JsonProperties props
[ https://issues.apache.org/jira/browse/AVRO-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090365#comment-16090365 ] ASF GitHub Bot commented on AVRO-2051: -- GitHub user dkulp opened a pull request: https://github.com/apache/avro/pull/236 [AVRO-2051] Remove synchronization for JsonProperties.getJsonProp
[jira] [Created] (AVRO-2051) Thread contention accessing JsonProperties props
Daniel Kulp created AVRO-2051: - Summary: Thread contention accessing JsonProperties props Key: AVRO-2051 URL: https://issues.apache.org/jira/browse/AVRO-2051 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.8.2 Reporter: Daniel Kulp See https://lists.apache.org/thread.html/dd34ab8439137a81a9de29ad4161f37b17638394cea0806765689976@%3Cuser.avro.apache.org%3E Basically, the getJsonProp method, being synchronized, is causing thread contention issues when trying to share schemas between threads. My proposal (pull request forthcoming) is to treat "props" as an immutable map and do a copy+add+swap in the addProp method. This will make the addProp call slower (particularly for large maps of props), but would make reads significantly faster, as no locking will be needed.
[jira] [Commented] (AVRO-1786) Strange IndexOutofBoundException in GenericDatumReader.readString
[ https://issues.apache.org/jira/browse/AVRO-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089823#comment-16089823 ] BELUGA BEHR commented on AVRO-1786: --- May be experiencing this issue as well; trying to collect more information...
> Strange IndexOutofBoundException in GenericDatumReader.readString
> Key: AVRO-1786
> URL: https://issues.apache.org/jira/browse/AVRO-1786
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.4, 1.7.7
> Environment: CentOS 6.5 Linux x64, 2.6.32-358.14.1.el6.x86_64
> Use IBM JVM:
> IBM J9 VM (build 2.7, JRE 1.7.0 Linux amd64-64 Compressed References 20140515_199835 (JIT enabled, AOT enabled))
> Reporter: Yong Zhang
> Priority: Minor
>
> Our production cluster is CentOS 6.5 (2.6.32-358.14.1.el6.x86_64), running IBM BigInsight V3.0.0.2. In Apache terms, it is Hadoop 2.2.0 with MRv1 (no YARN), comes with Avro 1.7.4, and runs on IBM J9 VM (build 2.7, JRE 1.7.0 Linux amd64-64 Compressed References 20140515_199835 (JIT enabled, AOT enabled)). Not sure if the JDK matters, but it is NOT the Oracle JVM.
> We have an ETL implemented as a chain of MR jobs. One MR job merges 2 sets of Avro data. Dataset1 is in HDFS location A and Dataset2 is in HDFS location B, and both contain Avro records bound to the same Avro schema. The record contains a unique id field and a timestamp field. The MR job merges the records based on the ID, uses the record with the later timestamp to replace the one with the earlier timestamp, and emits the final Avro record. Very straightforward.
> Now we faced a problem: one reducer keeps failing with the following stack trace on the JobTracker:
> {code}
> java.lang.IndexOutOfBoundsException
>   at java.io.ByteArrayInputStream.read(ByteArrayInputStream.java:191)
>   at java.io.DataInputStream.read(DataInputStream.java:160)
>   at org.apache.avro.io.DirectBinaryDecoder.doReadBytes(DirectBinaryDecoder.java:184)
>   at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
>   at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
>   at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:348)
>   at org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:143)
>   at org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:125)
>   at org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:121)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
>   at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>   at org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:108)
>   at org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:48)
>   at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142)
>   at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:117)
>   at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:297)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:165)
>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:652)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(AccessController.java:366)
>   at javax.security.auth.Subject.doAs(Subject.java:572)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {code}
> Here are my Mapper and Reducer methods:
> Mapper:
> public void map(AvroKey key, NullWritable value, Context context) throws IOException, InterruptedException
> Reducer:
> protected void reduce(CustomPartitionKeyClass key, Iterable> values, Context context) throws IOException, InterruptedException
> What bothers me are the following facts:
> 1) All the mappers finish without error.
> 2) Most of the reducers finish without error, but one reducer keeps failing with the above error.
> 3) It looks like it is caused by the data? But keep in mind that all the Avro records passed the mapper side, but failed in one reducer.
> 4) From the stack trace, it looks like our reducer code was NOT invoked yet, but f
[jira] [Updated] (AVRO-2050) Clear Array To Allow GC
[ https://issues.apache.org/jira/browse/AVRO-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated AVRO-2050: -- Attachment: AVRO-2050.2.patch [~nkollar] This implementation is essentially an {{ArrayList}}. {{ArrayList}} overrides the {{clear}} method because the default {{AbstractList}} implementation instantiates an Iterator and then deletes the items through it one at a time. This performs poorly because of the constant stack manipulation (calls per element), but it also amounts to draining the array from the head of the list. Draining from the head requires an array copy for each item removed, to shift the remaining elements down. It is much better to override the method as {{ArrayList}} does. However, I did see some overlap with the {{toString}} and {{add}} methods, which can be leveraged. Changed the patch to remove those two overrides.
> Clear Array To Allow GC
> Key: AVRO-2050
> URL: https://issues.apache.org/jira/browse/AVRO-2050
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.7.7, 1.8.2
> Reporter: BELUGA BEHR
> Priority: Minor
> Attachments: AVRO-2050.1.patch, AVRO-2050.2.patch
>
> Java's {{ArrayList}} implementation clears all Objects from the internal buffer when the {{clear()}} method is called. This allows the Objects to be freed by GC. We should do the same in Avro's {{org.apache.avro.generic.GenericData}}.
> [ArrayList Source|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/ArrayList.java#ArrayList.clear%28%29]
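To make the cost argument concrete, here is a minimal sketch using a hypothetical array-backed `SimpleArray` (a stand-in, not `GenericData.Array`'s real code). Per the JDK documentation, `AbstractList.clear()` calls `removeRange(0, size())`, which walks a list iterator and removes one element at a time through the list's `remove(int)`. With only `remove(int)` implemented, clearing n elements therefore removes index 0 repeatedly and shifts n(n-1)/2 elements in total; the `ArrayList`-style override nulls the n slots in one pass, which also releases the references for GC.

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical array-backed list used to compare AbstractList's inherited
// clear() with an ArrayList-style override.
class SimpleArray<T> extends AbstractList<T> {
  private Object[] elements = new Object[16];
  private int size;
  long shifted; // element moves performed by remove(int), for illustration
  private final boolean overrideClear;

  SimpleArray(boolean overrideClear) {
    this.overrideClear = overrideClear;
  }

  @Override public boolean add(T t) {
    if (size == elements.length) elements = Arrays.copyOf(elements, size * 2);
    elements[size++] = t;
    modCount++;
    return true;
  }

  @SuppressWarnings("unchecked")
  @Override public T get(int i) {
    if (i < 0 || i >= size) throw new IndexOutOfBoundsException(String.valueOf(i));
    return (T) elements[i];
  }

  @Override public int size() {
    return size;
  }

  // Used by AbstractList's iterator-based clear(); shifts the tail down by one.
  @Override public T remove(int i) {
    T old = get(i);
    int tail = size - i - 1;
    System.arraycopy(elements, i + 1, elements, i, tail);
    shifted += tail;
    elements[--size] = null; // drop the stale reference
    modCount++;
    return old;
  }

  @Override public void clear() {
    if (!overrideClear) {
      super.clear(); // removeRange(0, size()): one remove(0) per element, O(n^2) shifts
      return;
    }
    Arrays.fill(elements, 0, size, null); // O(n); lets the GC reclaim the elements
    size = 0;
    modCount++;
  }
}
```

Clearing 1,000 elements through the inherited `clear()` performs 499,500 element shifts, while the override performs none.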
[jira] [Commented] (AVRO-2050) Clear Array To Allow GC
[ https://issues.apache.org/jira/browse/AVRO-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089518#comment-16089518 ] Nandor Kollar commented on AVRO-2050: - I'm wondering why the {{clear()}} method is overridden. It looks like the base class is AbstractList, which already implements clear correctly, so couldn't we instead implement the {{Array}} iterator's {{remove()}} method?
[jira] [Commented] (AVRO-2046) avro-python3: Very restricted set of data types which are allowed in AvroSchemaFromJSONData
[ https://issues.apache.org/jira/browse/AVRO-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089511#comment-16089511 ] ASF GitHub Bot commented on AVRO-2046: -- GitHub user manu-chroma opened a pull request: https://github.com/apache/avro/pull/235 schema.py: No sys traceback in parse exception In the ``SchemaParseException``, do not provide sys traceback. For our project CWL Tool, we're using `avro/py` in our Python 3 builds. More on this has been discussed here: https://issues.apache.org/jira/browse/AVRO-2046 To do this, we use the `autotranslate` tool, which converts `avro/py` code to Python 2 and 3 compatible code at runtime. The problem arises when it tries to convert this `raise Exception` command. There is no way to achieve this in a cross-compatible way without using an external library. Thus, I've created this PR. This is a very minimal change and really solves our problem for the time being. We really hope you'll consider this or at least give us your feedback on it. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manu-chroma/avro patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/avro/pull/235.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #235 commit 92525fda5cbae1ea7b9e5e255a52ad7e8f0ff71f Author: Manvendra Singh Date: 2017-07-17T08:53:28Z schema.py: No sys traceback in parse exception
> avro-python3: Very restricted set of data types which are allowed in AvroSchemaFromJSONData
> Key: AVRO-2046
> URL: https://issues.apache.org/jira/browse/AVRO-2046
> Project: Avro
> Issue Type: Bug
> Components: python
> Affects Versions: 1.8.2
> Environment: avro-python3 (1.8.2)
> Reporter: Manvendra Singh
>
> Hey, I come from the CWL project: https://github.com/common-workflow-language/cwltool and as part of my GSoC project, I'm working on adding Python 3 compatibility to the *cwltool* codebase. We've been using avro-python2 for a long time now and it has worked great for us in our projects: schema_salad and cwltool.
> In the process of porting cwltool, I'm facing issues with the avro-python3 library. I've found the following bug:
> Minimal reproducible example:
> {code:none}
> from collections import OrderedDict
> import avro.schema
>
> AvroSchemaFromJSONData = avro.schema.SchemaFromJSONData
>
> a = {
>     "fields": [
>         {"name": "name", "type": "string"},
>         {"name": "favorite_number", "type": ["int", "null"]},
>         {"name": "favorite_color", "type": ["string", "null"]}
>     ],
>     "name": "User",
>     "namespace": "example.avro",
>     "type": "record"
> }
> b = OrderedDict(a)
> AvroSchemaFromJSONData(a)
> AvroSchemaFromJSONData(b)
> {code}
> Output:
> {code}
> ~/Desktop/test/venv3/lib/python3.5/site-packages/avro/schema.py in SchemaFromJSONData(json_data, names)
>    1252   if parser is None:
>    1253     raise SchemaParseException(
> -> 1254         'Invalid JSON descriptor for an Avro schema: %r.' % json_data)
>    1255   return parser(json_data, names=names)
>    1256
>
> SchemaParseException: Invalid JSON descriptor for an Avro schema: OrderedDict([('namespace', 'example.avro'), ('type', 'record'), ('name', 'User'), ('fields', [{'type': 'string', 'name': 'name'}, {'type': ['int', 'null'], 'name': 'favorite_number'}, {'type': ['string', 'null'], 'name': 'favorite_color'}])]).
> {code}
> h5. The current implementation of this function does not allow for *any dict-like data type*. It does, however, work in avro-python2.
> Relevant line of code: https://github.com/apache/avro/blob/master/lang/py3/avro/schema.py#L1250
> Apart from this, I've tried using ``2t
[GitHub] avro pull request #235: schema.py: No sys traceback in parse exception
GitHub user manu-chroma opened a pull request: https://github.com/apache/avro/pull/235 schema.py: No sys traceback in parse exception