Regarding logging, GraphFrames makes a simple wrapper this way:
https://github.com/graphframes/graphframes/blob/master/src/main/scala/org/graphframes/Logging.scala
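A wrapper along those lines can live entirely in your own namespace. The sketch below is illustrative only (all names are hypothetical, and it uses java.util.logging to stay dependency-free, whereas Spark's internal trait is SLF4J-based):

```scala
import java.util.logging.{Level, Logger}

// Minimal stand-in for the now-private Logging trait, kept in your own
// namespace. A real version would likely wrap SLF4J as Spark's does.
trait Logging {
  @transient private lazy val log: Logger =
    Logger.getLogger(getClass.getName)

  // By-name message parameters avoid building the string when the
  // level is disabled.
  protected def logInfo(msg: => String): Unit =
    if (log.isLoggable(Level.INFO)) log.info(msg)
  protected def logWarning(msg: => String): Unit =
    if (log.isLoggable(Level.WARNING)) log.warning(msg)
}

// Example use from a hypothetical transformer.
class MyTransformer extends Logging {
  def run(): String = { logInfo("transforming"); "done" }
}

object LoggingDemo {
  def main(args: Array[String]): Unit =
    println(new MyTransformer().run()) // prints "done"
}
```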
Regarding the UDTs, they have been hidden to be reworked for Datasets, the
reasons being detailed here [1]. Can you describe your use case in more
detail? You may be better off copy/pasting the UDT code outside of Spark,
depending on your use case.

[1] https://issues.apache.org/jira/browse/SPARK-14155

On Thu, Feb 23, 2017 at 3:42 PM, Joseph Bradley <jos...@databricks.com> wrote:
> +1 for Nick's comment about discussing APIs which need to be made public
> in https://issues.apache.org/jira/browse/SPARK-19498 !
>
> On Thu, Feb 23, 2017 at 2:36 AM, Steve Loughran <ste...@hortonworks.com> wrote:
>>
>> On 22 Feb 2017, at 20:51, Shouheng Yi <sho...@microsoft.com.INVALID> wrote:
>>
>> Hi Spark developers,
>>
>> Currently my team at Microsoft is extending Spark's machine learning
>> functionalities to include new learners and transformers. We would like
>> users to use these within Spark pipelines so that they can mix and match
>> with existing Spark learners/transformers, and overall have a native Spark
>> experience. We cannot accomplish this using a non-"org.apache" namespace
>> with the current implementation, and we don't want to release code inside
>> the Apache namespace because it's confusing and there could be naming
>> rights issues.
>>
>> This isn't actually something the ASF has a strong stance against; it's
>> more left to the projects themselves. After all: the source is licensed by
>> the ASF, and the license doesn't say you can't.
>>
>> Indeed, there's a bit of org.apache.hive in the Spark codebase where the
>> Hive team kept stuff package private. Though that's really a sign that
>> things could be improved there.
>>
>> Where it is problematic is that stack traces end up blaming the wrong
>> group; nobody likes getting a bug report for a bug which doesn't actually
>> exist in their codebase, not least because they have to waste time to even
>> work that out.
>>
>> You also have to expect absolutely no stability guarantees, so you'd
>> better set your nightly build to work against trunk.
>>
>> Apache Bahir does put some stuff into org.apache.spark.stream, but
>> they've sort of inherited that right when they picked up the code from
>> Spark. New stuff is going into org.apache.bahir.
>>
>> We need to extend several classes from Spark which happen to be
>> "private[spark]". For example, one of our classes extends VectorUDT [0],
>> which is declared as private[spark] class VectorUDT. This unfortunately
>> puts us in a strange scenario that forces us to work under the namespace
>> org.apache.spark.
>>
>> To be specific, currently the private classes/traits we need to use to
>> create new Spark learners & transformers are HasInputCol, VectorUDT and
>> Logging. We will expand this list as we develop more.
>>
>> I do think it's a shame that Logging went from public to private.
>>
>> One thing that could be done there is to copy the logging into Bahir,
>> under an org.apache.bahir package, for yourself and others to use. That'd
>> be beneficial to me too.
>>
>> For the ML stuff, that might be a place to work too, if you are going to
>> open source the code.
>>
>> Is there a way to avoid this namespace issue? What do other
>> people/companies do in this scenario? Thank you for your help!
>>
>> I've hit this problem in the past. Scala code tends to force your hand
>> here precisely because of that (very nice) private feature. While it
>> offers a project the ability to guarantee that implementation details
>> aren't picked up where they weren't intended to be, in OSS dev all that
>> implementation is visible, and tempting to use for lower-level
>> integration.
>>
>> What I tend to do is keep my own code in its own package and try to do as
>> thin a bridge over to it from the [private] scope as I can.
>> It's also important to name things obviously, say,
>> org.apache.spark.microsoft, so stack traces in bug reports can be dealt
>> with more easily.
>>
>> [0]: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/linalg/VectorUDT.scala
>>
>> Best,
>> Shouheng
>
> --
> Joseph Bradley
> Software Engineer - Machine Learning
> Databricks, Inc.
> http://databricks.com
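For reference, the "thin bridge" Steve describes might be sketched as below. Everything here is hypothetical (the stand-in class merely plays the role of a private[spark] class like VectorUDT; package and class names are invented), and it is plain Scala with no Spark dependency:

```scala
// A stand-in for a package-private Spark class: visible only to code
// whose package is under org.apache.spark.
package org.apache.spark.ml {
  private[spark] class HypotheticalUDT {
    def typeName: String = "vector"
  }
}

// The thin bridge: the only code that has to live in Spark's namespace.
// A vendor-tagged package name keeps stack traces attributable.
package org.apache.spark.msbridge {
  class BridgedUDT extends org.apache.spark.ml.HypotheticalUDT
}

// Everything else stays in your own namespace and reaches the
// package-private internals only through the bridge.
package com.example.ml {
  object MyTransformer {
    def udtTypeName: String =
      new org.apache.spark.msbridge.BridgedUDT().typeName
  }
}

object BridgeDemo {
  def main(args: Array[String]): Unit =
    println(com.example.ml.MyTransformer.udtTypeName) // prints "vector"
}
```

Keeping the bridge to a single public subclass (or a handful of forwarders) limits how much of your code is exposed to Spark's internal churn: when an internal API moves, only the bridge needs updating.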