[jira] [Created] (FLINK-25660) Failed to build flink-runtime-web

2022-01-15 Thread Aitozi (Jira)
Aitozi created FLINK-25660:
--

 Summary: Failed to build flink-runtime-web
 Key: FLINK-25660
 URL: https://issues.apache.org/jira/browse/FLINK-25660
 Project: Flink
  Issue Type: Bug
Reporter: Aitozi


I clean up the local repo for org.apache.flink and rebuild the project. It 
fails at {{flink-runtime-web}}
{code:java}
Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.11.0:npm 
(npm install) on project flink-runtime-web: Failed to run task: 'npm ci 
--cache-max=0 --no-save' failed. org.apache.commons.exec.ExecuteException: 
Process exited with an error: 1 (Exit value: 1) -> [Help 1]{code}
The debug log for it :

 
{code:java}
166 verbose stack TypeError: Cannot read property 'resolve' of undefined166 
verbose stack     at regFetch 
(/Users/aitozi/ant/work/code/community/flink-aitozi/flink/flink-runtime-web/web-dashboard/node/node_modules/npm/node_modules/npm-registry-fetch/index.js:76:23)166
 verbose stack     at fromManifest 
(/Users/aitozi/ant/work/code/community/flink-aitozi/flink/flink-runtime-web/web-dashboard/node/node_modules/npm/node_modules/pacote/lib/fetchers/registry/tarball.js:55:3)166
 verbose stack     at 
/Users/aitozi/ant/work/code/community/flink-aitozi/flink/flink-runtime-web/web-dashboard/node/node_modules/npm/node_modules/pacote/lib/fetchers/registry/tarball.js:38:25166
 verbose stack     at tryCatcher 
(/Users/aitozi/ant/work/code/community/flink-aitozi/flink/flink-runtime-web/web-dashboard/node/node_modules/npm/node_modules/bluebird/js/release/util.js:16:23)166
 verbose stack     at Promise._settlePromiseFromHandler 
(/Users/aitozi/ant/work/code/community/flink-aitozi/flink/flink-runtime-web/web-dashboard/node/node_modules/npm/node_modules/bluebird/js/release/promise.js:517:31)166
 verbose stack     at Promise._settlePromise 
(/Users/aitozi/ant/work/code/community/flink-aitozi/flink/flink-runtime-web/web-dashboard/node/node_modules/npm/node_modules/bluebird/js/release/promise.js:574:18)166
 verbose stack     at Promise._settlePromiseCtx 
(/Users/aitozi/ant/work/code/community/flink-aitozi/flink/flink-runtime-web/web-dashboard/node/node_modules/npm/node_modules/bluebird/js/release/promise.js:611:10)166
 verbose stack     at _drainQueueStep 
(/Users/aitozi/ant/work/code/community/flink-aitozi/flink/flink-runtime-web/web-dashboard/node/node_modules/npm/node_modules/bluebird/js/release/async.js:142:12)166
 verbose stack     at _drainQueue 
(/Users/aitozi/ant/work/code/community/flink-aitozi/flink/flink-runtime-web/web-dashboard/node/node_modules/npm/node_modules/bluebird/js/release/async.js:131:9)166
 verbose stack     at Async._drainQueues 
(/Users/aitozi/ant/work/code/community/flink-aitozi/flink/flink-runtime-web/web-dashboard/node/node_modules/npm/node_modules/bluebird/js/release/async.js:147:5)166
 verbose stack     at Immediate.Async.drainQueues [as _onImmediate] 
(/Users/aitozi/ant/work/code/community/flink-aitozi/flink/flink-runtime-web/web-dashboard/node/node_modules/npm/node_modules/bluebird/js/release/async.js:17:14)166
 verbose stack     at processImmediate (internal/timers.js:439:21)167 verbose 
cwd 
/Users/aitozi/ant/work/code/community/flink-aitozi/flink/flink-runtime-web/web-dashboard168
 verbose Darwin 20.6.0169 verbose argv 
"/Users/aitozi/ant/work/code/community/flink-aitozi/flink/flink-runtime-web/web-dashboard/node/node"
 
"/Users/aitozi/ant/work/code/community/flink-aitozi/flink/flink-runtime-web/web-dashboard/node/node_modules/npm/bin/npm-cli.js"
 "ci" "--cache-max=0" "--no-save"170 verbose node v12.14.1171 verbose npm  
v6.13.4172 error Cannot read property 'resolve' of undefined173 verbose exit [ 
1, true ] {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: Re: [DISCUSS] Introduce Hash Lookup Join

2022-01-15 Thread Jing Zhang
Hi all,
Thanks for all the feedback so far.
If there is no more suggestions, I would like to drive a vote in Tuesday
next week (18 Jan).

Best,
Jing Zhang

Jing Zhang  于2022年1月5日周三 11:33写道:

> Hi Francesco,
> Thanks a lot for the feedback.
>
> > does it makes sense for a lookup join to use hash distribution whenever
> is possible by default?
> I prefer to enable the hash lookup join only find the hint in the query
> for the following reason:
> 1. Plan compatibility
> Add a hash shuffle by default would leads to the change of plan after
> users upgrade the flink version.
> Besides, lookup join is commonly used feature in flink SQL.
> 2. Not all flink jobs could benefit from this improvement.
> It is a trade off for the lookup join with dimension connectors which
> has cache inside.
> We hope the raise the cache hit ratio by Hash Lookup Join, however it
> would leads to an extra shuffle at the same time.
> It is not always a positive optimization, especially for the
> connectors which does not have cache inside.
>
> > Shouldn't the hint take the table alias as the "table name"?  What if
> you do two lookup joins in cascade within the same query with the same
> table (once
> on a key, then on another one), where you use two different aliases for
> the table?
> In theory, it's better to support both table names and alias names.
> But in calcite, the alias name of subquery or table would not be lost in
> the sql conversion phase and sql optimization phase.
> So here we only support table names.
>
> Best,
> Jing Zhang
>
>
> Francesco Guardiani  于2022年1月3日周一 18:38写道:
>
>> Hi Jing,
>>
>> Thanks for the FLIP. I'm not very knowledgeable about the topic, but going
>> through both the FLIP and the discussion here, I wonder, does it makes
>> sense for a lookup join to use hash distribution whenever is possible by
>> default?
>>
>> The point you're explaining here:
>>
>> > Many Lookup table sources introduce cache in order
>> to reduce the RPC call, such as JDBC, CSV, HBase connectors.
>> For those connectors, we could raise cache hit ratio by routing the same
>> lookup keys to the same task instance
>>
>> Seems something we can infer automatically, rather than manually asking
>> the
>> user to add this hint to the query. Note that I'm not talking against the
>> hint syntax, which might still make sense to be introduced, but I feel
>> like
>> this optimization makes sense in the general case when using the
>> connectors
>> you have quoted. Perhaps there is some downside I'm not aware of?
>>
>> Talking about the hint themselves, taking this example as reference:
>>
>> SELECT /*+ SHUFFLE_HASH('Orders', 'Customers') */ o.order_id, o.total,
>> c.country, c.zip
>> FROM Orders AS o
>> JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
>> ON o.customer_id = c.id;
>>
>> Shouldn't the hint take the table alias as the "table name"? What If you
>> do
>> two lookup joins in cascade within the same query with the same table
>> (once
>> on a key, then on another one), where you use two different aliases for
>> the
>> table?
>>
>>
>> On Fri, Dec 31, 2021 at 9:56 AM Jing Zhang  wrote:
>>
>> > Hi Lincoln,
>> > Thanks for the feedback.
>> >
>> > > 1. For the hint name, +1 for WenLong's proposal.
>> >
>> > I've added add 'SHUFFLE_HASH' to other alternatives in FLIP. Let's
>> waiting
>> > for more voices here.
>> >
>> > > Regarding the `SKEW` hint, agree with you that it can be used widely,
>> and
>> > I
>> > prefer to treat it as a metadata hint, a new category differs from a
>> join
>> > hint.
>> > For your example:
>> > ```
>> > SELECT /*+ USE_HASH('Orders', 'Customers'), SKEW('Orders') */
>> o.order_id,
>> > o.total, c.country, c.zip
>> > FROM Orders AS o
>> > JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
>> > ON o.customer_id = c.id;
>> > ```
>> > I would prefer another form:
>> > ```
>> > -- provide the skew info to let the engine choose the optimal plan
>> > SELECT /*+ SKEW('Orders') */ o.order_id, ...
>> >
>> > -- or introduce a new hint for the join case, e.g.,
>> > SELECT /*+ REPLICATED_SHUFFLE_HASH('Orders') */ o.order_id, ...
>> > ```
>> >
>> > Maybe there is misunderstanding here.
>> > I just use a syntax sugar here.
>> >
>> > SELECT /*+ USE_HASH('Orders', 'Customers'), SKEW('Orders') */
>> o.order_id,
>> > 
>> >
>> > is just a syntax with
>> >
>> > SELECT /*+ USE_HASH('Orders', 'Customers') */ /*+SKEW('Orders') */
>> > o.order_id,
>> > 
>> >
>> > Although I list 'USE_HASH' and 'SKEW' hint in a query hint clause, it
>> does
>> > not mean they must appear together as a whole.
>> > Based on calcite syntax doc [1], you could list more than one hint in
>> > a /*+' hint [, hint ]* '*/ clause.
>> >
>> > Each hint has different function.
>> > The'USE_HASH' hint suggests the optimizer use hash partitioner for
>> Lookup
>> > Join for table 'Orders' and table 'Customers' while the 'SKEW' hint
>> tells
>> > the optimizer the skew metadata about the table 'Orders'.
>> >
>> > Best,
>> > Jing Zhang
>>