ShubhamChaurasia commented on a change in pull request #778: HIVE-22221: Llap
external client - Need to reduce LlapBaseInputFormat#getSplits() footprint
URL: https://github.com/apache/hive/pull/778#discussion_r327915751
##########
File path:
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java
##########
@@ -192,9 +199,31 @@ public PlanFragment(TezWork work, Schema schema, JobConf jc) {
@Override
public void process(Object[] arguments) throws HiveException {
+ initArgs(arguments);
+ try {
+ SplitResult splitResult = getSplitResult(false);
+ InputSplit[] splits = schemaSplitOnly ? new InputSplit[]{splitResult.schemaSplit} : splitResult.actualSplits;
+ for (InputSplit s : splits) {
+ Object[] os = new Object[1];
+ bos.reset();
+ s.write(dos);
+ byte[] frozen = bos.toByteArray();
+ os[0] = frozen;
+ forward(os);
+ }
+ } catch (Exception e) {
+ throw new HiveException(e);
+ }
+ }
- String query = stringOI.getPrimitiveJavaObject(arguments[0]);
- int num = intOI.get(arguments[1]);
+ protected void initArgs(Object[] arguments) {
+ inputArgQuery = stringOI.getPrimitiveJavaObject(arguments[0]);
+ inputArgNumSplits = intOI.get(arguments[1]);
+ schemaSplitOnly = inputArgNumSplits == 0;
Review comment:
Sometimes the client needs only the schema, not the full split
information.
For example, when a DataFrame is defined in Spark, Spark needs to know
only the schema.
Passing 0 as the number of splits selects this schema-only mode
(schemaSplitOnly = inputArgNumSplits == 0).
This option was introduced in HIVE-19682:
https://issues.apache.org/jira/browse/HIVE-19682
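
To make the schema-only path concrete, here is a minimal, self-contained sketch of the selection logic from the diff. It is not the Hive implementation: `SchemaSplitSketch`, its `SplitResult` stand-in, and `selectSplits` are hypothetical names, and plain strings take the place of the real `InputSplit` objects. Only the rule "0 requested splits means schema only" is taken from the change itself.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SchemaSplitSketch {

    // Hypothetical stand-in for the SplitResult used in the diff.
    // The real class holds InputSplit objects; strings keep the sketch runnable.
    static final class SplitResult {
        final String schemaSplit;
        final List<String> actualSplits;

        SplitResult(String schemaSplit, List<String> actualSplits) {
            this.schemaSplit = schemaSplit;
            this.actualSplits = actualSplits;
        }
    }

    // Mirrors the ternary in process(): a request for 0 splits returns
    // only the schema split, anything else returns the actual splits.
    static List<String> selectSplits(SplitResult result, int numSplits) {
        boolean schemaSplitOnly = numSplits == 0;
        return schemaSplitOnly
                ? Collections.singletonList(result.schemaSplit)
                : result.actualSplits;
    }

    public static void main(String[] args) {
        SplitResult result =
                new SplitResult("schema", Arrays.asList("split-0", "split-1"));

        System.out.println(selectSplits(result, 0)); // prints [schema]
        System.out.println(selectSplits(result, 2)); // prints [split-0, split-1]
    }
}
```

A client that only needs to describe a result set, as in the Spark case above, would take the first path and never receive the actual splits.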
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services