>> We currently have code in the o.a.spark namespace. I don't think there is a
>> JIRA for it yet, but this seems like cross-project trouble waiting to
>> happen.
>> https://github.com/apache/hbase/tree/master/hbase-spark/src/main/scala/org/apache/spark
>>
>
> IIRC, this was something we had to do because of how Spark architected
> their stuff. So long as we're marking all of that stuff IA.Private I
> think we're good, since we can fix it later if/when Spark changes.
>
Yes. IIRC, the trick is needed because we use a package-private construct from
Spark SQL for Spark 1.6 (rough sketch of why at the end of this mail). The
trick is no longer needed if we only support Spark 2.x.

> >> The way I see it, the options are a) ship both 1.6 and 2.y support, b)
> >> ship just 2.y support, c) ship 1.6 in branch-1 and ship 2.y in
> >> branch-2. Does anyone have preferences here?
> >
> > I think I prefer option B here as well. It sounds like Spark 2.2 will be
> > out Very Soon, so we should almost certainly have a story for that. If
> > there are no compatibility issues, then we can support >= 2.0 or 2.1,
> > otherwise there's no reason to try and hit the moving target and we can
> > focus on supporting the newest release. Like you said earlier, there's been
> > no official release of this module yet, so I have to imagine that the
> > current consumers are knowingly bleeding edge and can handle an upgrade or
> > recompile on their own.
> >
>
> Yeah, the bleeding-edge bit sounds fair. (someone please shout if it ain't)

I am for option b) as well! Even better, I am for shipping support for Scala
2.11 only. Start clean?

>>> 4) Packaging all this probably will be a pain no matter what we do
>>
>> Do we have to package this in our assembly at all? Currently, we include
>> the hbase-spark module in the branch-2 and master assembly, but I'm not
>> convinced this needs to be the case. Is it too much to ask users to build a
>> jar with dependencies (which I think we already do) and include the
>> appropriate spark/scala/hbase jars in it (pulled from maven)? I think this
>> problem can be better solved through docs and client tooling rather than
>> going through awkward gymnastics to package m*n versions in our tarball
>> _and_ making sure that we get all the classpaths right.
>>
>
> Even if we don't put it in the assembly, we still have to package m*n
> versions to put up in Maven, right?
>
> I'm not sure on the jar-with-deps bit. It's super nice to just include
> one known-deployed jar in your spark classpath instead of putting that
> size into each application jar you run. Of course, to your point on
> classpaths, right now they'd need to grab things besides that jar.
> Maybe these should be shaded jars sooner rather than later?

There is a Filter class from the hbase-spark module that needs to be on the
server classpath (second sketch below, on why that is). If we don't have the
whole jar there, we have to do some trick to separate it out.

Great write-up from Sean.

Thanks,
Jerry
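
P.S. For anyone who hasn't looked at that code: a rough, hypothetical sketch
(made-up names, not the actual hbase-spark classes) of why a private[sql]
member in Spark forces our code into the org.apache.spark.sql package tree.
Scala package-private visibility only extends to code compiled inside the
named package:

package org.apache.spark.sql {
  // Stand-in for a Spark-internal helper that is private[sql].
  private[sql] object InternalHelper {
    def describe(): String = "only visible inside org.apache.spark.sql"
  }

  // Declaring our bridge in the same package tree lets it call the helper
  // and re-expose what we need through a public API.
  object HBaseSideBridge {
    def use(): String = InternalHelper.describe()
  }
}

package org.apache.hadoop.hbase.example {
  object Outside {
    def main(args: Array[String]): Unit = {
      // Fine: HBaseSideBridge and use() are public.
      println(org.apache.spark.sql.HBaseSideBridge.use())
      // Would not compile from here:
      // org.apache.spark.sql.InternalHelper.describe()
    }
  }
}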

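P.P.S. On the Filter point: the class has to be on the RegionServer classpath
because the server rebuilds the filter from the serialized Scan by loading the
class and calling its static parseFrom. A stripped-down, hypothetical sketch of
that pattern (not the actual hbase-spark filter, and the exact Filter API
differs a bit between HBase versions):

import org.apache.hadoop.hbase.Cell
import org.apache.hadoop.hbase.filter.Filter.ReturnCode
import org.apache.hadoop.hbase.filter.FilterBase

// Stand-in for the push-down filter that hbase-spark ships.
class ExamplePushDownFilter(pushedPredicate: Array[Byte]) extends FilterBase {

  // Decides per-cell whether the server keeps the cell; a real filter would
  // evaluate the pushed-down predicate here instead of including everything.
  override def filterKeyValue(cell: Cell): ReturnCode = ReturnCode.INCLUDE

  // Client side: serialize the filter so it can travel with the Scan.
  override def toByteArray: Array[Byte] = pushedPredicate
}

object ExamplePushDownFilter {
  // Server side: the RegionServer calls parseFrom (via reflection) to rebuild
  // the filter, which only works if this class is on the server classpath.
  def parseFrom(bytes: Array[Byte]): ExamplePushDownFilter =
    new ExamplePushDownFilter(bytes)
}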